Serendeputy - your personal news assistant.

Welcome to Serendeputy!

Serendeputy is your personal news assistant.

Your deputy:
- learns what you like and don't like,
- lovingly compiles a list of news and blogs for you.

You can help your deputy learn by searching, clicking links and pressing the little smiley faces.

What to do:
  1. Click links to teach your deputy
  2. Click smileys and frownies
  3. Find favorite topics and sources
  4. See how much better your deputy is getting at finding you good stuff.
I work in bioinformatics and I'm completely new to machine learning, so I don't know if this idea is viable. There are several predictors that try to predict the molecular nature of a set of given genes. They use different ML methods (SVM, logistic...
From: Stats Stack Exchange | By: apcamargo | Friday, January 20, 2017
The full question is: Assume we fit the following quadratic function: $f(x) = w_0+w_1x+w_2(x^2)$ to the dataset shown (blue circles). The fitted function is shown by the green curve in the picture below. Out of the 3 parameters of the fitted function...
From: Stats Stack Exchange | By: Manan | Thursday, January 19, 2017
Running the following code in R: library(datasets) boxplot(ChickWeight$weight, ChickWeight$Diet) It produces a boxplot looking like this: Only two Diet groups are displayed in the x-axis. Using the formula structure and executing the call: boxplot(weight...
From: Stats Stack Exchange | By: Judy | Thursday, January 19, 2017
I am confused about using CV to evaluate model performance. The setting is this: I have 1000 data points, which I split into training and test sets; I then use the training set to perform k-fold CV, and I will use the model that...
From: Stats Stack Exchange | By: Wendy Huang | Saturday, January 21, 2017
I generated artificial data using y=a+ax1+ax2+e, where x1 is drawn from a normal distribution and e from Cauchy and normal distributions. The models I want to compare are ANN and SVM. When using Cauchy as the disturbance in the artificial data, the model...
From: Stats Stack Exchange | By: bbadyalina | Saturday, January 21, 2017
I've got this model: model <- lm (time~radius_mean+texture_mean+perimeter_mean+area_mean +smoothness_mean+compactness_mean+concavity_mean +concave_points_mean+symmetry_mean+fractal_dimension_mean+radius_se +texture_se+perimeter_se+area_se+smoothness_se+compactness_se...
From: Stats Stack Exchange | By: user1068980 | Saturday, January 21, 2017
I wish to plot error bars on a bar chart that represents the answers of 200 respondents, randomly selected from a population of 50,000,000, to a simple multiple-choice question. Here are the proportions of answers: Option A: 83% Option B: 4% Option C:...
From: Stats Stack Exchange | By: Remster | Saturday, January 21, 2017
I want to know how to interpret this coefficient table. Basically, I am examining the impact of firm size and audit experience on audit pressure, where small firm, medium firm and large firm are the dummy variables for audit firm size and the reference...
From: Stats Stack Exchange | By: manahil | Friday, January 20, 2017
Call: pgls(formula = log10(numericlifespan) ~ numerictrophic * log10(numericsize), data = mycdat, lambda = "ML") Residuals: Min 1Q Median 3Q Max -0.43856 -0.18641 -0.00481 0.16407 0.89648 Branch length transformations: kappa [Fix] : 1.000 lambda [ ML]...
From: Stats Stack Exchange | By: pren | Saturday, January 21, 2017
Suppose one doesn't know exactly which individual is moving from one state to the other but can only observe the aggregate number of individuals in each state. How can one estimate the hidden Markov model (let's make it as simple as possible)? Can someone propose...
From: Stats Stack Exchange | By: ZHU | Thursday, January 19, 2017
I'm looking to estimate two spatial models: $y = \rho Wy + X\beta + WX \tau + \epsilon$ and $y = X\beta + WX \tau + \epsilon$, where $\epsilon = \lambda W\epsilon + u$. This is possible to estimate in R using maximum likelihood, but my data...
From: Stats Stack Exchange | By: Eri Farias | Saturday, January 21, 2017
I am using the following algorithms: 1) logistic regression, 2) decision trees, 3) SVM. Besides getting more data and tuning parameters, how can I aim to improve performance?
From: Stats Stack Exchange | By: tominariBoy | Saturday, January 21, 2017
I am going to build a GLMM with R (lme4). I found that several papers summarised their results in this way (the independent variables are categorical variables): I have tried to use the anova function, but I could only get the chi-square and p-value. And it is...
From: Stats Stack Exchange | By: qinli Deng | Saturday, January 21, 2017
What would cause a regression model to always underpredict? For over a year now an associate of mine has been producing a linear model for a client which predicts trends with reasonable accuracy but always underpredicts the magnitude. This has bothered...
From: Stats Stack Exchange | By: sten | Friday, January 20, 2017
This seems like a simple thing but it dawned on me I did not really know how to answer this. What are the criteria for being able to estimate an effect? For example, with the data set: dat<-structure(list(Y = c(100L, 556L, 25L, 32L, 15L, 56L, 95L,...
From: Stats Stack Exchange | By: B_Miner | Friday, January 20, 2017
I am using a GLM to model my data. The response variable is binary and I have three predictors of which two are continuous variables and one is binary. Would the distribution of predictors be important when I am fitting the model? That is, would it be...
From: Stats Stack Exchange | By: Mina | Friday, January 20, 2017
I'm trying to create a very specific folder tree using Python that will create predefined folders and subfolders. I can create the top and second level folders, but the program messes up when it creates the third and fourth level folders. It either doesn't...
From: Stats Stack Exchange | By: user131935 | Friday, January 20, 2017
There's an old rule of thumb in multivariate statistics that recommends a minimum of 10 cases for each independent variable. But that's often where there is one parameter to fit for each variable. Why I'm asking: I'm working through a textbook example that...
From: Stats Stack Exchange | By: Mike Kruger | Friday, January 20, 2017
Let's say I were to impute values for a variable (using multiple imputation). Then I wanted to use that variable in a regression. Can I use the same variables I used to impute in my new regression? So, for example, if I were to impute math test score...
From: Stats Stack Exchange | By: user146004 | Friday, January 20, 2017
I am using VIF to remove multicollinearity. Prior to applying VIF, I made all my variables stationary. Once I have defined the set of variables from the VIF, is it okay if I revert to the original time series, which is not necessarily stationary, before random...
From: Stats Stack Exchange | By: python novice | Friday, January 20, 2017
I'm a student and I have been given the above assumption to validate, along with checks to avoid violations. My question is: according to this assumption, how do we validate it and check to avoid violations?
From: Stats Stack Exchange | By: Sharanya | Friday, January 20, 2017
I have the following problem scenario. Assume a system with two groups: P and R. Group P has 3 samples (p1, p2, p3) and group R has 4 samples (r1, r2, r3, r4). All samples can belong to either of the two classes (c0 or c1). And each sample has 3 features:...
From: Stats Stack Exchange | By: Sumaiya Iqbal | Friday, January 20, 2017
Let's say I have the following OLS model: Y = X1 + X2 + X3 + e. The variable of interest is X1, while X2 and X3 are control variables. Based on theory, I know X4 (e.g. firm size) should have an effect on Y. I also know I should standardize X1 since one unit...
From: Stats Stack Exchange | By: user3210369 | Friday, January 20, 2017
My question grew out of a discussion with @whuber in the comments of a different question. Specifically, @whuber's comment was as follows: One reason it might surprise you is that the assumptions underlying a correlation test and a regression slope test...
From: Stats Stack Exchange | By: Stefan | Thursday, January 19, 2017
I have a mixed linear model with a random variable, so an lmer model. Q1) I would like to know which assumptions need to be checked for this kind of model. Is it enough to look at the residuals (studentized residuals, normality) and the extreme values...
From: Stats Stack Exchange | By: J. Du | Thursday, January 19, 2017
Suppose we have a hotel and we know all the reservations of the last five years. We would like to forecast/estimate the room demand day by day for the next year. I'm a mathematician but not a statistician, I'm sorry if I'm saying something trivial or...
From: Stats Stack Exchange | By: fdesmond | Thursday, January 19, 2017
From what I have read, we generally use 3 sets: 1) a set for training classifiers, 2) a set for testing the classifiers during development, and 3) an untouched test set that is only used after development. So my question is why we really...
From: Stats Stack Exchange | By: noMatter | Thursday, January 19, 2017
What will be the effect on the solution of a least squares analysis if we apply the following transformations to the training set: add a real number $k$ to the output value of each datapoint. And the model is $y = \phi w$ where $w$ is the weight attached...
From: Stats Stack Exchange | By: Shraddheya Shendre | Thursday, January 19, 2017
I am trying to eliminate seasonality from my data using Fourier analysis in MATLAB. Following this post http://stackoverflow.com/questions/19285684/fast-fourier-transform-for-deasonalizing-data-in-matlab?answertab=oldest#tab-top I came up with this code:...
From: Stats Stack Exchange | By: Datanalyst | Thursday, January 19, 2017
Given a time series with events, I want to test whether events in two time series are occurring differently. See for example the attached image. There are 12 events (orange) between 2000 and 2007 with different lengths. Let's pretend these are drought...
From: Stats Stack Exchange | By: kn1g | Thursday, January 19, 2017
I am conducting a CFA on a questionnaire with 4 factors. I know that the exploratory factor analysis to obtain these 4 factors was done using oblimin rotation. I am now wondering if this affects the model I have to build with the lavaan package in R. Following...
From: Stats Stack Exchange | By: chonasson | Thursday, January 19, 2017
In my data I have $n_1$ people who had an "event" and $n_2$ people who did not. Cases (those with an event) were oversampled substantially (the true prevalence is probably more like 1 in 10000). If it matters, $n_1 = 70$ and $n_2 = 250$. For all of those...
From: Stats Stack Exchange | By: Bernedette_Sanders | Thursday, January 19, 2017
Very short question: are there tools (by preference in R or Stata) to solve a simultaneous equation model, without needing instrumental variables? In my case, I would like to model irrigation and croptype. A farmer needs more or less irrigation depending...
From: Stats Stack Exchange | By: user33125 | Thursday, January 19, 2017
I am trying to calculate modification indices in a Structural Equation Model (SEM) with an endogenous categorical variable. I am using the R package 'lavaan'. I am running what seems to be the correct code, but I cannot compute them. Reproducible example...
From: Stats Stack Exchange | By: Filipe Dias | Thursday, January 19, 2017
I have $200$ observations of a time series $X_t$ and they have been fitted using a $SARIMA(1, 0, 0)(0, 1, 1)_4$ model which is $y_t= y_{t-4} + \phi_1 (y_{t-1} - y_{t-5}) + Z_t + \Theta Z_{t-4 }$. This is the output of the program made in R: Coefficients:...
From: Stats Stack Exchange | By: user137201 | Thursday, January 19, 2017
We are trying to predict a customer's total recharge value for the next week. We came up with 8 recharge bands, so it's now a classification problem. We have tried using the history of the last 8 weeks. We have tried all the well-known techniques like feature selection,...
From: Stats Stack Exchange | By: Alex | Thursday, January 19, 2017
Other than inconsistent prediction performance, what are the other disadvantages and weaknesses of ANNs?
From: Stats Stack Exchange | By: bbadyalina | Thursday, January 19, 2017
I have a very large longitudinal dataset consisting of a variable Y measured in time (10 million datapoints in 30,000 samples). I would like to assess a large number of predictors (e.g. gender/age) as fixed effects on the slope of Y in time or Y at a...
From: Stats Stack Exchange | By: tafelplankje | Thursday, January 19, 2017
I have two samples: one has 1436 observations with sd=0.0405, mean=0.7776 and skewness=0.032, and the other has 4956 observations with sd=0.0416, mean=0.7716 and skewness=-0.0897. Now I am doing a Welch Two Sample t-test in R, but I am wondering...
From: Stats Stack Exchange | By: Meeldurb | Thursday, January 19, 2017
I have weekly data. I would like to perform time series analysis on it. According to Rob J Hyndman, the period of weekly data can be approximated as 365.25/7 ≈ 52. How can I define the period in SPSS for weekly data? The following are data examples which...
From: Stats Stack Exchange | By: No-Encrypt | Thursday, January 19, 2017