Serendeputy is your personal news assistant.

Your deputy:

- learns what you like and don't like,

- lovingly compiles a list of news and blogs for you.

You can help your deputy learn by searching, clicking links and pressing the little smiley faces.

How it works.

- Click links to teach your deputy
- Click smileys and frownies
- Find favorite topics and sources
- See how much better your deputy is getting at finding you good stuff.
- Sign in for free to save your profile, or please tell me why you won't.

I'm a novice in machine learning. I'm trying to make predictions with classification methods. My class has 3 possible states, so a baseline probability of 33% for each. I can't get beyond 45% accuracy. To improve my accuracy, I changed my features....

From: Stats Stack Exchange | By: Yoann boyere | Sunday, March 26, 2017

I've read many threads on this website trying to understand why we need to break the data at hand into 3 parts: the training, validation and test data sets. I still think it is enough to break the data set into 2 pieces, i.e., the training...

From: Stats Stack Exchange | By: KevinKim | Sunday, March 26, 2017
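
A minimal sketch of the three-way split the question asks about, assuming a simple random split of row indices; the `three_way_split` helper and its default fractions are hypothetical, not from the question. The point of the extra split: the validation part is used to pick among models or hyperparameters, so its error is optimistically biased by that choice, while the test part is used once, at the end, for an unbiased estimate of the chosen model.

```python
import random

def three_way_split(n, val_frac=0.2, test_frac=0.2, seed=0):
    """Shuffle row indices 0..n-1 and cut them into disjoint train/val/test lists.

    Hypothetical helper; the default fractions are illustrative only.
    """
    indices = list(range(n))
    random.Random(seed).shuffle(indices)  # reproducible shuffle
    n_test = round(n * test_frac)
    n_val = round(n * val_frac)
    test = indices[:n_test]
    val = indices[n_test:n_test + n_val]
    train = indices[n_test + n_val:]
    return train, val, test

train, val, test = three_way_split(100)
# Fit candidate models on `train`, choose one by its error on `val`,
# then report the chosen model's error on `test` exactly once.
print(len(train), len(val), len(test))  # 60 20 20
```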

The article plots, for every 100 women who use a certain type of contraception, the number of unplanned pregnancies over time. https://www.nytimes.com/interactive/2014/09/14/sunday-review/unplanned-pregnancies.html?_r=0 In particular, at the end...

From: Stats Stack Exchange | By: user103341 | Sunday, March 26, 2017

I'm provided with the following dataset: Dataset. I'm meant to use sklearn to create a Support Vector Machine that can predict it. I load A and B from my dataset into a 2 dimensional array called input_data and load the label from my dataset into an...

From: Stats Stack Exchange | By: patrickdamery | Wednesday, March 29, 2017

I have devised a new clustering algorithm that is domain-agnostic and makes several assumptions. I can't seem to find real-world data to test it, so I have generated some synthetic data. The idea: generate pre-clustered data, "forget" the clusters and unify...

From: Stats Stack Exchange | By: Jack Stevens | Tuesday, March 28, 2017

In the sarima function in the astsa package in R, we can add external regressors to a SARIMA model, so I assume that we obtain a SARIMAX model? If we add regressors to a SARIMA(p, 0, 0) x (0, 0, 0)o model, is this equivalent to fitting an ARX model...

From: Stats Stack Exchange | By: Xavier | Tuesday, March 28, 2017

Edit: I apologize if the question is considered too broad. In fact, it concerns a very specific task in the bioinformatic analysis of high-throughput data sets, and in my opinion the problem presented here, albeit not formulated formally, is very specific....

From: Stats Stack Exchange | By: January | Monday, March 27, 2017

I'm trying model stacking in a Kaggle competition; what the competition is trying to do is irrelevant here. I think my approach to model stacking is not correct. I have 4 different models: an xgboost model with dense features (numbers, that can...

From: Stats Stack Exchange | By: user1157751 | Wednesday, March 29, 2017

I need to analyse participants' reaction time data from a 3x2x2 (Face: A, B, C; Visual field: 1, 2; Prime: ?, !) repeated-measures design. Each trial manipulated all IVs, and multiple trials were completed per combination. As for data transformation, my approach...

From: Stats Stack Exchange | By: cel12345 | Tuesday, March 28, 2017

The regression between y and x gives the equation y-hat = -1.2 + 3.4x. The R-squared value for this regression is 0.64. What is the correlation value? Input your answer in decimal format, rounded to 2 decimal places.

From: Stats Stack Exchange | By: courtney.b | Wednesday, March 29, 2017
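
In simple linear regression with a single predictor, $R^2$ equals the square of Pearson's correlation $r$, and $r$ has the same sign as the fitted slope. A minimal sketch of that arithmetic using the question's numbers (slope 3.4, $R^2 = 0.64$); the function name is illustrative:

```python
import math

def correlation_from_r_squared(r_squared, slope):
    """Recover Pearson's r from R^2 in a one-predictor linear regression.

    |r| = sqrt(R^2), and r takes the sign of the fitted slope.
    """
    return math.copysign(math.sqrt(r_squared), slope)

r = correlation_from_r_squared(0.64, 3.4)
print(round(r, 2))  # 0.8
```

This only holds with one predictor; with several predictors, $\sqrt{R^2}$ is the multiple correlation between $y$ and $\hat{y}$, not the correlation with any single $x$.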

I have read the article "Feature Selection Based on Mutual Information: Criteria of Max-Dependency, Max-Relevance, and Min-Redundancy". In section 2.3 of this article, the theorem about the equivalence of first-order incremental search for mRMR and Max-Dependency...

From: Stats Stack Exchange | By: Peter Bugata | Tuesday, March 28, 2017

A, B, C and D are the input nodes and Quality is the output node. The value of each input node can vary from 0 to 1....

From: Stats Stack Exchange | By: user3447215 | Wednesday, March 29, 2017

Using logistic regression (lrm) from rms, is there a way to use the Predict command to compute (and plot) predicted probabilities, not log odds?

From: Stats Stack Exchange | By: Paul gronke | Wednesday, March 29, 2017

From what I know, recurrent NNs perform very well on sequential data. However, I have also read in many places that they can be used for non-sequential data as well. For instance, in the article 'The unreasonable effectiveness of Recurrent Neural...

From: Stats Stack Exchange | By: darthy | Wednesday, March 29, 2017

I have to classify executable files as malicious or non-malicious. I have created my own corpus, Train. I have explained the errors below, and the input file format is also given below. How can I get the presence of the features with their names...

From: Stats Stack Exchange | By: banu | Wednesday, March 29, 2017

I started with 9 independent variables in my linear regression. However, I found that the overall F test does not have a significant p-value, and some variables are highly correlated. Thus I discarded five independent variables and get both significant...

From: Stats Stack Exchange | By: Eric | Tuesday, March 28, 2017

I ran an experiment with 4 factors: A, B, C and D. Factor C is nested within B. The results of the ANOVA show that C had a significant effect, AxB had a significant interaction, and D had no significant effect; but in this case D is species, and I want to...

From: Stats Stack Exchange | By: Nathan Haag | Tuesday, March 28, 2017

I have a network that is an "ensemble" of text data and linear data that feed into a concat layer and then into another feed-forward network. I understand that the gradient tells you how much an input/node affects the output of the network. Would it be possible/useful...

From: Stats Stack Exchange | By: Camron_Godbout | Tuesday, March 28, 2017

In the book Time Series Analysis by R, the author mentions the use of a moving average to smooth out white noise.
Can moving averages be used to remove white noise or are there better methods?

From: Stats Stack Exchange | By: jeffy abraham | Tuesday, March 28, 2017

I am looking at the following PDF, and on page 4 it mentions that if $x,y$ are jointly distributed variables which bear the linear relationship $$E(y|x) = \alpha + B^T x$$ then $$Var(y|x) = Var(y) - Cov(y,x)[Var(x)]^{-1} Cov(x,y)$$ Note that $E(y|x)...

From: Stats Stack Exchange | By: user1769197 | Tuesday, March 28, 2017

My first question was: how does the sample size affect t-test results? I found the answer in this post. Now I understand that the "unbalanced" situation does not necessarily affect the results of a t-test. I also tried power200200 and power20050 in the...

From: Stats Stack Exchange | By: Yan | Tuesday, March 28, 2017

Consider the random variables $X$ and $Y$ defined on the same probability space $(\Omega, \mathcal{F}, \mathbb{P})$ taking values respectively in $\mathbb{R}^d$ and $\mathbb{R}^p$. Let $f:\mathbb{R}^d\rightarrow \mathbb{R}^m$. Let $g:\mathbb{R}^m\rightarrow...

From: Stats Stack Exchange | By: user3285148 | Tuesday, March 28, 2017

I have an estimator $\theta$ for the mean $\mu$. I understand consistency as $\theta$ converging in probability to $\mu$ as $n$ goes to infinity. Now I have encountered another concept, consistency in mean square. $\theta$ is consistent in mean square...

From: Stats Stack Exchange | By: cecefuss | Tuesday, March 28, 2017
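
One standard link between the two notions, written as a short derivation with $\theta = \theta_n$ depending on the sample size $n$: Markov's inequality applied to $(\theta_n - \mu)^2$ bounds the probability of a large deviation by the mean squared error, which itself splits into variance plus squared bias.

$$
P(|\theta_n - \mu| > \varepsilon) \le \frac{E[(\theta_n - \mu)^2]}{\varepsilon^2},
\qquad
E[(\theta_n - \mu)^2] = \operatorname{Var}(\theta_n) + \bigl(E[\theta_n] - \mu\bigr)^2 .
$$

So if the mean squared error tends to zero (equivalently, both the variance and the bias tend to zero), then $P(|\theta_n - \mu| > \varepsilon) \to 0$ for every $\varepsilon > 0$: consistency in mean square implies consistency in probability. The converse does not hold in general.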

I have the following model $y_i=\beta_1+\beta_2x_i+\epsilon_i$ with $E(\epsilon_i^2)=\sigma^2\exp(x_i)$, and I have to use the proper transformation to obtain a model where the variance of the error is $\sigma^2$. My guess: I do not have any guesses because...

From: Stats Stack Exchange | By: plr | Tuesday, March 28, 2017
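
A sketch of the usual weighted-least-squares transformation for this variance structure, assuming $E(\epsilon_i) = 0$ so that $\operatorname{Var}(\epsilon_i) = \sigma^2 \exp(x_i)$: divide every term of the model by $\sqrt{\exp(x_i)} = e^{x_i/2}$.

$$
\frac{y_i}{e^{x_i/2}} = \beta_1 \frac{1}{e^{x_i/2}} + \beta_2 \frac{x_i}{e^{x_i/2}} + \epsilon_i^*,
\qquad
\epsilon_i^* = \frac{\epsilon_i}{e^{x_i/2}},
\qquad
\operatorname{Var}(\epsilon_i^*) = \frac{\sigma^2 e^{x_i}}{e^{x_i}} = \sigma^2 .
$$

Note that the transformed model has no separate intercept: $\beta_1$ becomes the coefficient of the regressor $e^{-x_i/2}$, so it is fitted by OLS through the origin.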

I am working with linear Gaussian Bayesian networks and trying to recover the joint multivariate distribution from the conditionals. This is described in Probabilistic Graphical Models by Koller (p. 251). (A summary can be found in the linked PDF.)...

From: Stats Stack Exchange | By: user2957945 | Tuesday, March 28, 2017

I have to extract data from our databases, on which I then perform various aggregation and recoding steps before handing it to analysts. I would like to document the definitions of the variables carefully so that everything is transparent. For example, I want the...

From: Stats Stack Exchange | By: Heisenberg | Tuesday, March 28, 2017

Old scheme: 57 103 59 75 84 73 35 110 44 82 67 64 78 53 41 39 80 87 73 65 28 62 49 84 63 77 67 101 91 50
New scheme: 62 122 54 82 84 86 32 104 38 107 84 85 99 39 34 58 73 53 66 78 41 71 38 95 81 58 75 94 100 68
The sales output (in £000) before and after...

From: Stats Stack Exchange | By: Ruma Sinha | Tuesday, March 28, 2017
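
The excerpt is cut off before the actual question, so this is only a sketch under an assumption: that the same 30 units were observed under both schemes, making a paired comparison of the differences (new minus old) appropriate. It computes the paired t statistic with the standard library; a p-value would then come from a t distribution with $n - 1 = 29$ degrees of freedom.

```python
import math
import statistics

# Sales output in £000, copied from the question.
old = [57, 103, 59, 75, 84, 73, 35, 110, 44, 82, 67, 64, 78, 53, 41,
       39, 80, 87, 73, 65, 28, 62, 49, 84, 63, 77, 67, 101, 91, 50]
new = [62, 122, 54, 82, 84, 86, 32, 104, 38, 107, 84, 85, 99, 39, 34,
       58, 73, 53, 66, 78, 41, 71, 38, 95, 81, 58, 75, 94, 100, 68]

def paired_t(before, after):
    """Return (mean difference, paired t statistic, degrees of freedom).

    Assumes the two lists are paired observations on the same units.
    """
    diffs = [a - b for a, b in zip(after, before)]
    n = len(diffs)
    mean_d = statistics.mean(diffs)
    sd_d = statistics.stdev(diffs)  # sample SD with n - 1 denominator
    return mean_d, mean_d / (sd_d / math.sqrt(n)), n - 1

mean_d, t, df = paired_t(old, new)
print(mean_d, round(t, 3), df)  # mean difference is 4 (£000) on 29 df
```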

I have been working on a project in which I have to collect the following data over a period of time: the temperature of a room, the humidity inside the room, the amount of CO2 present in the room, and the number of persons in the room. I have already conducted the experiment and...

From: Stats Stack Exchange | By: somdeep acharyya | Tuesday, March 28, 2017

I am stuck on the following homework assignment: $$X'X=\begin{bmatrix} 10 & 1.2980 & -2.4641 & 0.7716 \\ 1.2980 & 4.8676 & -3.0048 & -1.6154 \\ -2.4641 & -3.0048 & 5.3561 & -0.4576 \\ 0.7716 & -1.6154 & -0.4576...

From: Stats Stack Exchange | By: user429134 | Tuesday, March 28, 2017

This is a follow-up to the answers here and here. I have not seen this term in any of my textbooks or in many online resources; it is not, for example, present on the SVM Wikipedia page. What is a hypothesis class in the context of SVM? How do the support...

From: Stats Stack Exchange | By: kingledion | Tuesday, March 28, 2017

I would like to know how to handle missing data in predictive analysis. In my case, it has been decided that missing information will not be omitted; however, certain predictive models, such as logistic regression and random forest, couldn't handle missing...

From: Stats Stack Exchange | By: user95902 | Tuesday, March 28, 2017

I have some data representing time series of house costs in specific areas. Some of the values along the time series (30 points = 30 months) are missing or totally wrong (huge spikes). What I am doing right now is to calculate the average and...

From: Stats Stack Exchange | By: Randomize | Tuesday, March 28, 2017

I am building a recommender system using collaborative filtering. I have implemented the Alternating Least Squares method following this tutorial. Now I want my algorithm to adapt to new ratings for movies that were previously not rated. Should the algorithm...

From: Stats Stack Exchange | By: Jatin Bhola | Tuesday, March 28, 2017

Let $M$ be an $n \times k$ matrix which is the outcome of a subjective test, where $n$ is the number of samples and $k$ is the number of raters. Values in $M$ range from 0 to 1. Since the number of samples is high and the evaluation procedure is long, each rater...

From: Stats Stack Exchange | By: Francesco Setragno | Monday, March 27, 2017

I have an observed data set; denote it data set A. I simulated a multivariate normal data set; denote it data set B. When I plot the ACF and PACF of both data sets, I get very similar results. I applied all joint tests of multivariate normal...

From: Stats Stack Exchange | By: rsc05 | Tuesday, March 28, 2017

I am trying to forecast time-series in a very "applied" sense. Ideally, what I would be looking for was a time-series model (à la ARIMA models), which could capture the dynamics of the growth data of an index. However, I have strong a priori ideas about...

From: Stats Stack Exchange | By: pApaAPPApapapa | Tuesday, March 28, 2017

Here is the statement that I have read: Since we are selecting the furthest outlier, it is not legitimate to use a simple t-test (for studentized residuals) for detecting outliers. To remedy this we can make a Bonferroni adjustment to the p-value I have...

From: Stats Stack Exchange | By: Daniel Yefimov | Tuesday, March 28, 2017

I wish to forecast Y for year 2018 but I only have two data points of Y in years 2006 and 2012. I already did multiple linear regression (since I have a lot of predictors) but multiple linear regression does not consider the time aspect so my predictions...

From: Stats Stack Exchange | By: Katherine | Tuesday, March 28, 2017

I'm new at learning random variables and stuck in this example. Can anyone help me solve this?
"The RV x is N(5,2) and y=2x+4. Find mean, standard deviation and density function of y."

From: Stats Stack Exchange | By: Kubilay Can DEMİR | Tuesday, March 28, 2017
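
A worked sketch of the linear-transformation rule for a normal variable. The notation $N(5, 2)$ is ambiguous; this assumes the second parameter is the standard deviation, $\sigma_x = 2$:

$$
E[y] = 2E[x] + 4 = 2 \cdot 5 + 4 = 14, \qquad \sigma_y = |2|\,\sigma_x = 4,
$$

$$
f_y(y) = \frac{1}{2}\, f_x\!\left(\frac{y - 4}{2}\right) = \frac{1}{4\sqrt{2\pi}} \exp\!\left(-\frac{(y - 14)^2}{2 \cdot 4^2}\right),
$$

so $y$ is normal with mean 14 and standard deviation 4. If instead the 2 in $N(5, 2)$ is the variance, the mean is still 14 but $\operatorname{Var}(y) = 2^2 \cdot 2 = 8$, so $\sigma_y = 2\sqrt{2}$.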

I just estimated a Vector Autoregressive Model with 6 lags and 10 variables in R. My goal is to simulate the given original time series (on which the model parameters were estimated) to see how the model fits. As the simulation in R doesn't work (but...

From: Stats Stack Exchange | By: Blair92 | Tuesday, March 28, 2017

Let's say we have a random variable $Y$ defined as the sum of $N$ Bernoulli variables $X_i$, each with a different success probability $P_i$ and a different weight $W_i$. The weights are positive and range from 1 to 1,000. Formally, $Y = \sum X_i W_i$, where...

From: Stats Stack Exchange | By: Leon P | Tuesday, March 28, 2017
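
The excerpt stops before saying what is wanted, but the first two moments of $Y$ follow directly. Assuming the $X_i \sim \text{Bernoulli}(P_i)$ are independent and the weights $W_i$ are fixed:

$$
E[Y] = \sum_{i=1}^{N} W_i P_i,
\qquad
\operatorname{Var}(Y) = \sum_{i=1}^{N} W_i^2 P_i (1 - P_i).
$$

The mean needs no independence assumption; the variance does, and without it the covariance terms $2\sum_{i<j} W_i W_j \operatorname{Cov}(X_i, X_j)$ must be added.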

I would greatly appreciate it if you could let me know how to do discrete-time survival analysis with time-varying covariates. Part of my data set is as follows (d1-d12 are dummy variables for each time period): ID TIME EVENT x1 x2 x3 x4 x5 1 1 0...

From: Stats Stack Exchange | By: ebrahimi | Tuesday, March 28, 2017

When performing chi-squared independence tests, why do 2x2 tests always have every residual value (o-e) equal?
Why is this not true for tests with unequal amounts of rows and columns?

From: Stats Stack Exchange | By: Eric | Tuesday, March 28, 2017
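
A small sketch of what actually holds: in a 2x2 table the residuals $o - e$ are equal in absolute value with alternating signs, because every row and column of residuals must sum to zero, which leaves a single free value (one degree of freedom). An $r \times c$ table has $(r-1)(c-1)$ free values, so nothing forces equal magnitudes. The tables below are made-up illustrative counts.

```python
def residuals(table):
    """Return observed - expected for a contingency table given as a list of rows.

    Expected counts follow the independence model: row_total * col_total / grand_total.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand = sum(row_totals)
    return [
        [obs - r * c / grand for obs, c in zip(row, col_totals)]
        for row, r in zip(table, row_totals)
    ]

# Hypothetical 2x2 counts: all four residuals share the same magnitude.
print(residuals([[10, 20], [30, 40]]))  # [[-2.0, 2.0], [2.0, -2.0]]
# Hypothetical 2x3 counts: magnitudes differ, yet rows and columns still sum to zero.
print(residuals([[10, 20, 30], [30, 20, 10]]))  # [[-10.0, 0.0, 10.0], [10.0, 0.0, -10.0]]
```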

I am trying to implement Feature hashing in python. I plan to use the following command. preproc =Pipeline([('fh',FeatureHasher( n_features=2**27,input_type='string', non_negative=False))]) I have a dataframe that has int64, category, object data types....

From: Stats Stack Exchange | By: Aman | Tuesday, March 28, 2017

My question is are all ARIMA processes also unit root processes? My guess is yes because $\{X_t\}$ is ARIMA(p, d, q) if $(1-B)^dX_t = a(B)\epsilon_t$ is stationary ARMA(p, q). The characteristic function for $(1-B)^dX_t = a(B)\epsilon_t$ is $(1-B)^d$,...

From: Stats Stack Exchange | By: Student | Tuesday, March 28, 2017
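
A short way to see the answer, writing the ARIMA$(p, d, q)$ model in lag-polynomial form with a stationary autoregressive polynomial $\phi(B)$ of degree $p$ (all roots outside the unit circle, so $\phi(1) \neq 0$). The full autoregressive polynomial of $X_t$ is

$$
\Phi(z) = \phi(z)\,(1 - z)^d, \qquad \Phi(1) = 0 \quad \text{whenever } d \ge 1,
$$

so $z = 1$ is a root of multiplicity $d$. An ARIMA$(p, d, q)$ process therefore has a unit root exactly when $d \ge 1$. With $d = 0$ it is a stationary ARMA$(p, q)$ process and has none, so not every ARIMA process is a unit-root process.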

I need help determining the cut-off value for computing sensitivity and specificity for a biomarker. I have a continuous biomarker and an outcome with three diseased groups (e.g., stages I, II and III). I know many methods, such as: mean of the biomarker, bimodal...

From: Stats Stack Exchange | By: Mohamed Gomaa | Monday, March 27, 2017

I have some data and a model, e.g., $H_0: \xi \sim \mathcal N(\mu_1 > \mu_0, \sigma_0)$, where $\mu_0, \sigma_0$ are fixed; in other words, this is similar to the situation with "unequal means, alternative=greater". How can I calculate the likelihood...

From: Stats Stack Exchange | By: German Demidov | Monday, March 27, 2017

I'm working with a Negative binomial regression in STAN. I would like to make predictions on a test set, but looking at the reference I can't find a negative_binomial random number generator. Is there any way to do so without saving mean and overdispersion...

From: Stats Stack Exchange | By: Tommaso Guerrini | Monday, March 27, 2017

I'm trying to understand how loss_metric class in dlib calculates the gradient. Here is the code(full version): // It should be noted that the derivative of length(x-y) with respect // to the x vector is the unit vector (x-y)/length(x-y). If you stare...

From: Stats Stack Exchange | By: don-prog | Monday, March 27, 2017

I am learning how to use libsvm through sklearn.svm in Python. I read here about what happens, and why, when you change the C value of your model. My intuition, from what I've learned, would be that lower C values would use fewer support vectors...

From: Stats Stack Exchange | By: kingledion | Monday, March 27, 2017

- Westinghouse Electric is filing for bankruptcy
- Emily Deschanel and 'Bones' boss break down the final episode ...
- Photographer captures beauty of Europe's abandoned buildings ...
- Trump: US troops 'fighting like never before' in Iraq
- Hillary Clinton makes most political remarks since losing ...
- Paralyzed man uses experimental device to regain hand movements ...
- Ex-congressman Aaron Schock claims staffer dimed him out ...
- Obama's climate policies that Trump opposes
- After health care failure, Plan B suddenly more appealing ...
- Airline industry leader says laptop ban could hurt airlines ...

- Alternative Energy
- In Vitro Fertilization
- Artificial Intelligence
- Manolo Blahnik
- iPhone
- Glenn Beck
- Recipes
- Machine Learning
- Carbon Footprint
- Couture
- Green Energy
- Alicia Keys
- Mount Everest
- Supreme Court
- Weight Loss
- Scams
- Journalism
- Debt
- Afghanistan
- Healthcare
- Photography
- Pregnancy
- Advertising
- Parenting
- Wii