Serendeputy - your personal news assistant.

I'm currently using the train() function in the caret package to run 10-fold repeated cv on a random forest model. I would also like to explore other statistical and machine learning models for use with the same dataset, that is, with the same predictor...
From: Stats Stack Exchange | By: small_world | Thursday, July 30, 2015
I am a newbie in machine learning. I have been studying feature extraction and some classification approaches, and in the course of my study a question came to mind: why do we need to extract a lot of features for classification? Is it...
From: Stats Stack Exchange | By: user83655 | Thursday, July 30, 2015
I have a matrix where the rows are the data points (samples) and the columns are the features. It is a multiclass (4 classes) problem. On this data I want to apply machine learning classifiers. But first I want to do feature selection by using ANOVA....
From: Stats Stack Exchange | By: machinery | Wednesday, July 29, 2015
I have a data set that includes a DV (Richness or Abundance) and multiple continuous and categorical variables: Distances Richness Abundance Canopy Flower Veg Wood Trees Net Tray Vines Quality -300 1.083 1.945910149 0.886077124 0 0.397699415 0.321750554...
From: Stats Stack Exchange | By: tom91 | Thursday, July 30, 2015
I am dealing with an imbalanced dataset with the R package randomForest. Someone has suggested that I bootstrap the data while over-sampling the rare class and under-sampling the typical class. But I found that, as the resampling size increases, the...
From: Stats Stack Exchange | By: earclimate | Friday, July 31, 2015
I am new to MATLAB and neural networks, and I am doing a prediction with some data that I found on the internet to learn more about them. Here is my function to create a neural network: function net = createNet(hiddenSize, trainFcn, X_treinamento, T_treinamento)...
From: Stats Stack Exchange | By: X0R40 | Friday, July 31, 2015
I have a joint probability of a very specific form: $P(x_1,\cdots,x_n)=\phi(x_1)\psi(x_1,x_2)\phi(x_2)\cdots\psi(x_{n-1},x_n)\phi(x_n)=\prod_{i=1}^n \phi(x_i) \prod_{i=1}^{n-1} \psi(x_i,x_{i+1})$ I wonder if there is a closed form expression for $P(x_{i+1}|x_i)$,...
From: Stats Stack Exchange | By: maksay | Friday, July 31, 2015
I had a problem when I tried to test the fitting of my data with the generalized Pareto distribution. I used the MLE to estimate the two parameters 'shape' and 'scale' and I generated a vector of random variables GPD with them. Does this make sense if...
From: Stats Stack Exchange | By: Quang-Trung | Friday, July 31, 2015
I have performed an ANOVA (Linear model in SAS EG) to determine the role of country, farm, sex and year-season on performances of pigs and ran pairwise Bonferroni tests on country, farm and sex (LSM post-hoc test). Will I be able to use any of the results...
From: Stats Stack Exchange | By: Donné | Thursday, July 30, 2015
Basic setup: Unit of observation is the individual. Treatment (binary) is assigned on city level. Every state contains 4 cities, 2 get randomly chosen for treatment, 2 control. There are only 5 states. The outcome of interest is likely to be regionally...
From: Stats Stack Exchange | By: sheß | Wednesday, July 29, 2015
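The assignment scheme described in this question (5 states, 4 cities per state, 2 cities per state randomly treated) can be sketched as a blocked randomization. This is only an illustration of the design as stated in the snippet; the function name, the integer state and city labels, and the seed are hypothetical, and it says nothing about how the outcome should be analyzed.

```python
import random

def assign_treatment(n_states=5, cities_per_state=4, treated_per_state=2, seed=0):
    """Randomly pick `treated_per_state` cities within each state (block).

    Returns a dict mapping (state, city) -> 1 if treated, 0 if control.
    State and city labels are placeholder integers, not real data.
    """
    rng = random.Random(seed)
    assignment = {}
    for state in range(n_states):
        cities = [(state, city) for city in range(cities_per_state)]
        treated = set(rng.sample(cities, treated_per_state))
        for city in cities:
            assignment[city] = 1 if city in treated else 0
    return assignment

assignment = assign_treatment()
# 20 cities in total, 10 of them treated, exactly 2 per state.
print(len(assignment), sum(assignment.values()))
```

Because randomization happens within each state, every state contributes the same number of treated and control cities, so there are only 20 city-level clusters regardless of how many individuals are observed.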
I have a data set that consists of the information generated by a service call for a home appliance. The data set consists of a column with the sentence of the customer's complaint and a corresponding column of the part that the service technician replaced....
From: Stats Stack Exchange | By: Nick | Friday, July 31, 2015
I am attempting to conduct a logistic regression for a tennis analytics project, endeavoring to predict the probability of a player winning a point in which he is the server. My response variable (service points) is binary in the sense that it can have...
From: Stats Stack Exchange | By: Stevie Kvothe | Friday, July 31, 2015
I want to classify an image and I want to know how well I did, but I am not sure if I understand the workflow properly. I use scikit-learn. I first use cross_validate and GridSearchCV to find the optimal hyper parameter settings. Now I want to classify...
From: Stats Stack Exchange | By: JdeB | Friday, July 31, 2015
I am analyzing data from one study where participants had to choose (between two stimuli) the one with higher intensity. One way to look at the data is to fit the proportion of correct choices as a function of the absolute difference between the 2 intensities...
From: Stats Stack Exchange | By: Matteo Lisi | Thursday, July 30, 2015
For a given set of numbers $ Y = \{ .... \} $ and a given exponential distribution expressed by $ f(y; shape, scale, family )$, there should be a subset of numbers that can be derived from $ X \subset Y $, such that distribution of $ X $ represents $f$....
From: Stats Stack Exchange | By: Shark | Thursday, July 30, 2015
I am running a logistic regression on a data set containing Continuous, Ordinal, Categorical and Dichotomic variables. I would like to know how to calculate the correlation for all possible combinations (see matrix below - cases marked with an X do not...
From: Stats Stack Exchange | By: user2568648 | Thursday, July 30, 2015
I used multiple regression models to derive the outcomes after using AIC pairwise comparison and deleting the outliers and high-leverage points. The fit seems good: the adjusted R^2 reached 0.9543, see below: Call: lm(formula = P ~ V + EF + W + H, data...
From: Stats Stack Exchange | By: Samotht | Friday, July 31, 2015
I have a dataframe, 'datas', with 200 observations and a series of columns (some numeric, dummy, etc) and a binary class variable to be predicted that is called "bad_econ." I would like to get the model to predict whether bad_econ = is the case (1) or...
From: Stats Stack Exchange | By: Solutioneering | Wednesday, July 29, 2015
I want to perform a Poisson regression to explain Abundance (Counts of individuals) through a number of continuous and categorical explanatory variables. Some of the categorical variables have more than two levels so I will be performing some dummy coding...
From: Stats Stack Exchange | By: tom91 | Friday, July 31, 2015
There is a paper: Saralees Nadarajah and Samuel Kotz, A note on the product of normal and Laplace random variables, Brazilian Journal of Probability and Statistics, 2005. I have added an image version of it. The integral in (2.3) can be calculated by direct...
From: Stats Stack Exchange | By: sutsmart | Friday, July 31, 2015
I am pretty new to analytics and I have data received from a sensor about water usage. The readings are taken at 10-second intervals. I would like to know what kind of analytics methods can be applied to it to learn the usage pattern, detect abnormal behavior...
From: Stats Stack Exchange | By: mri | Friday, July 31, 2015
Let's assume the model: lm(VAR ~ A * B + (A : R), data) which produces this ANOVA: Analysis of Variance Table Response: VAR Df Sum Sq Mean Sq F value Pr(>F) A 2 2444.07 1222.04 71.4330 1.086e-14 *** B 3 2370.92 790.31 46.1966 8.675e-14 *** A:B 6 1376.40...
From: Stats Stack Exchange | By: Walter | Thursday, July 30, 2015
I am doing some work on functional data analysis, but I run into trouble when I try to analyze curves with significant fluctuations. I use the sample data "StatSciChinese" in the "fda" package as an example. library(fda) data("StatSciChinese")...
From: Stats Stack Exchange | By: NiubilityDiu | Friday, July 31, 2015
A 30-item questionnaire in accept-or-decline form, consisting of commonly encountered dysfunctional beliefs about psychiatric issues, was administered to medical students. The questionnaire is in YES/NO form and I have the data. I am interested in...
From: Stats Stack Exchange | By: Milan Amrut Joshi | Friday, July 31, 2015
I am a beginner with R, and I would like someone to walk me through this issue. I have a data set with three variables x, y, z and many rows. The variables repeat every year in columns for 45 years as follows: Country X1970 Y1970 Z1970 X1971 Y1971...
From: Stats Stack Exchange | By: user49017 | Friday, July 31, 2015
For a project for which there are multiple bidders, the following is known: Number of bidders: 24 Mean bid: 104 Highest bid: 356 Lowest bid: 20 Given the above, is it possible (however roughly) to estimate (i) the number of bids above and below the mean...
From: Stats Stack Exchange | By: Pyderman | Friday, July 31, 2015
I have a score $x$ on a scale $[0, \infty)$. I know that if $0\leq x \leq 1.3$ the fuzzy grade is "None". If $1.3\leq x \leq 2.1$ the fuzzy grade is "MILD". If $2.1\leq x \leq 3.5$ the fuzzy grade is "MODERATE". If $3.5\leq x < \infty$ the fuzzy grade...
From: Stats Stack Exchange | By: Schibo | Friday, July 31, 2015
Since RF can handle non-linearity but can't provide coefficients, would it be wise to use Random Forest to gather the most important Features and then plug those features into a Multiple Linear Regression model in order to explain their signs?...
From: Stats Stack Exchange | By: Hidden Markov Model | Thursday, July 30, 2015
For each day, I observe my variable, y(t), for a period of seven hours. In order to understand the data and make predictions, I want to combine these data into one long time series. Now, if I fit an AR(1) model to the data, or even do Kalman...
From: Stats Stack Exchange | By: Pep | Friday, July 31, 2015
This is something that had confused me for a while. I figured it out, then ran into the same problem again, so I wanted to be sure to post it to help out others. As RStudio has become more robust and powerful, I've tried to incorporate more of its hooks....
From: Stats Stack Exchange | By: Mike Williamson | Friday, July 31, 2015
I am new to time series and I am trying to figure out exactly what goes on behind the scenes in R. Say I have the MA process: $$y_t - \mu = a_t+\theta_1 a_{t-1} + \theta_2 a_{t-2}$$ where $a_t$ are i.i.d. standard normal. For concreteness let $\mu =...
From: Stats Stack Exchange | By: mb7744 | Friday, July 31, 2015
If I want to calculate the correlations among the components in a vector space using the MLE with a multivariate normal prior, which kind of data would be better: binary data or continuous data, such as the binary data {(1, 1,...
From: Stats Stack Exchange | By: Double Gray | Friday, July 31, 2015
I'm using the randomForest package in R to perform a binary classification. Model fit example: surv.rdf=randomForest(Question~., data=surv.train, mtry=surv.results$mtry[which.min(surv.results$ValError)], ntrees=100, classwt=c(table(surv.train$Question)[1]/length(surv.train$Question),...
From: Stats Stack Exchange | By: user83763 | Friday, July 31, 2015
(Spam content removed. See revision history for original.)
From: Stats Stack Exchange | By: Samantha | Friday, July 31, 2015
I have a probabilistic, binary classifier. Is there any principled way to select the threshold that maximizes the F1 score? Currently I simply choose many different thresholds, apply them on some validation data and pick the threshold that yields the...
From: Stats Stack Exchange | By: Franck Dernoncourt | Friday, July 31, 2015
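The brute-force approach the question describes (try thresholds on validation data and keep the one with the highest F1) can be sketched as follows. This is a minimal illustration, not an answer to whether a more principled method exists; the function names are hypothetical, and the only design choice added here is that the candidate thresholds are the distinct predicted scores, since the predictions only change at those values.

```python
def f1_score(y_true, y_pred):
    """F1 for binary labels in {0, 1}; returns 0.0 when there are no true positives."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

def best_f1_threshold(y_true, scores):
    """Sweep every distinct score as a threshold (predict 1 when score >= threshold)."""
    best_t, best_f1 = None, -1.0
    for t in sorted(set(scores)):
        y_pred = [1 if s >= t else 0 for s in scores]
        f1 = f1_score(y_true, y_pred)
        if f1 > best_f1:
            best_t, best_f1 = t, f1
    return best_t, best_f1

# Hypothetical validation labels and predicted probabilities.
y_val = [0, 0, 1, 1]
p_val = [0.1, 0.4, 0.35, 0.8]
print(best_f1_threshold(y_val, p_val))
```

The threshold chosen this way is tuned to the validation data, so its F1 should be re-estimated on separate held-out data before being reported.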
In this picture you can see the formula (red rectangle added by me for emphasis): $$ \textbf{V}^\intercal\textbf{HV} = \textbf{D} $$ Should not this rather be (eigenvalue decomposition): $$ \textbf{V}^\intercal\textbf{DV} = \textbf{H} $$ The first two...
From: Stats Stack Exchange | By: kahoon | Thursday, July 30, 2015
According to the glmnet vignette, a foldid can be set up by: foldid=sample(1:10,size=length(y),replace=TRUE) However, if you look at the number of observations in each of the folds: > table(foldid) foldid 1 2 3 4 5 6 7 8 9 10 10 12 8 7 12 12 8 7 14...
From: Stats Stack Exchange | By: fumikos | Thursday, July 30, 2015
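The unequal fold sizes in this question follow from sampling fold labels with replacement. A minimal sketch of the difference, written in Python rather than R: `random_fold_ids` mirrors the idea of `sample(1:10, size=n, replace=TRUE)`, while `balanced_fold_ids` repeats the labels 1..k and shuffles them. Both function names and the sample size are hypothetical, and this is one common way to balance folds, not necessarily what glmnet does internally.

```python
import random
from collections import Counter

def random_fold_ids(n, k=10, seed=0):
    """Draw each fold label independently; fold sizes vary by chance."""
    rng = random.Random(seed)
    return [rng.randint(1, k) for _ in range(n)]

def balanced_fold_ids(n, k=10, seed=0):
    """Cycle through labels 1..k, then shuffle; fold sizes differ by at most 1."""
    rng = random.Random(seed)
    ids = [(i % k) + 1 for i in range(n)]
    rng.shuffle(ids)
    return ids

n = 100  # hypothetical number of observations
print(sorted(Counter(random_fold_ids(n)).items()))    # sizes scatter around n / k
print(sorted(Counter(balanced_fold_ids(n)).items()))  # every fold has n / k = 10
```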
I have a dataset consisting of thousands of companies from the UK. For 300 of these companies, I collected several indicators which were extracted from social media (e.g. #likes). I have built a model using the WEKA data mining tool that achieves an...
From: Stats Stack Exchange | By: Alwin | Thursday, July 30, 2015
I'm trying to run the Kolmogorov test on a beta model in Stata; however, so far I find that in Stata I can only do this test against a normal distribution. So, does anyone know if the Kolmogorov test can be performed for a beta distribution in...
From: Stats Stack Exchange | By: Anna Molly | Thursday, July 30, 2015
This may be a stupid question but it's been bugging me for years. Can someone explain to me why would anyone choose a parametric over a nonparametric statistical method for hypothesis testing or regression analysis any day of the week? In my mind it's...
From: Stats Stack Exchange | By: en1 | Thursday, July 30, 2015
I'm looking to run a bunch of t-tests, and I'm trying to figure out the appropriate time to apply an FDR correction. I have four conditions and am doing pairwise comparisons amongst these conditions, so I have six pairwise comparisons/t-tests to run...
From: Stats Stack Exchange | By: hmg | Thursday, July 30, 2015
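For reference, four conditions give C(4, 2) = 6 pairwise comparisons, as the question states. The sketch below enumerates them and applies the Benjamini-Hochberg step-up procedure to one family of six p-values. The condition labels and p-values are made up for illustration, the function name is hypothetical, and the sketch does not settle when in the analysis the correction should be applied.

```python
from itertools import combinations

def benjamini_hochberg(p_values, q=0.05):
    """Return a list of booleans: True where the hypothesis is rejected at FDR level q."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k such that p_(k) <= (k / m) * q.
    cutoff = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * q:
            cutoff = rank
    # Reject every hypothesis whose rank is at or below that cutoff.
    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= cutoff:
            rejected[idx] = True
    return rejected

conditions = ["A", "B", "C", "D"]            # placeholder condition names
pairs = list(combinations(conditions, 2))    # the six pairwise comparisons
p_values = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06]  # made-up p-values
for pair, p, rej in zip(pairs, p_values, benjamini_hochberg(p_values)):
    print(pair, p, "reject" if rej else "keep")
```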
According to this question and answer, the sum of variances of all PLS components is normally less than 100%. Why do all the PLS components together explain only a part of the variance of the original data? Can somebody provide (further) evidence for...
From: Stats Stack Exchange | By: Iggy25 | Thursday, July 30, 2015
What conclusions can we draw if p>alpha? Does not rejecting the H0 mean anything? If p < alpha then: the mean is not the suggested one - difference in means (because we usually suggest the population mean) we have the strength of the evidence against...
From: Stats Stack Exchange | By: Anton Andreev | Thursday, July 30, 2015
I would like to know what are advantages and disadvantages of $R^2$ vs. correlation (e.g. cor() in R) vs. p-value of linear regression for two variables/features? What other ways exist to measure whether two variables/features correlate?...
From: Stats Stack Exchange | By: user49283 | Thursday, July 30, 2015
The use of atomless (continuous) distributions is ubiquitous in applied works. While the general idea is somewhat clear to me, I was looking for a formal definition or some useful references on the matter. Any help would be greatly appreciated....
From: Stats Stack Exchange | By: mrb | Thursday, July 30, 2015
I have a quick question I would humbly like to ask for your help to solve: let's assume that I am analyzing a series of events with different probabilities of success for which I have calculated the Expected Value. The results can be either "success"...
From: Stats Stack Exchange | By: sharpbounce | Thursday, July 30, 2015
I have a set of noisy data that can be described by a functional form. For each observation y(x), where x is an index that runs from 0-100, I know that y(x)=f(x+1)/f(x)-f(x+1). I would like to find a way of fitting f(x). I also know that f(x) must be...
From: Stats Stack Exchange | By: RonRich | Thursday, July 30, 2015
I do a lot of mixed ANOVA (both within and between subjects factors) and have found the ez package very helpful for this. However, lately, it seems to have stopped working for me half the time. I'm hoping someone can tell me what, if anything, I'm doing...
From: Stats Stack Exchange | By: P.N | Thursday, July 30, 2015
I am trying to calculate the PCA of my matrix in two ways. The PCA function: [coeff, score, eigenvalues] = pca(M); And, to compare and understand the PCA calculation, I try to compute the PCA step by step without the MATLAB function pca. %// first...
From: Stats Stack Exchange | By: sushi | Wednesday, July 29, 2015
I am currently a Math/Economics student and I am working with a local bank on modeling operational risk, and one of the models they propose we might use is the Metropolis-Hastings algorithm. This might be a stupid question, and it probably stems from...
From: Stats Stack Exchange | By: MathStudent | Thursday, July 30, 2015