## Welcome to Serendeputy!

Serendeputy is your personal news assistant.

- learns what you like and don't like,
- lovingly compiles a list of news and blogs for you.

What to do:
2. Click smileys and frownies
3. Find favorite topics and sources
4. See how much better your deputy is getting at finding you good stuff.

# Stats Stack Exchange

I've been trying to perform a binary classification using an SVM classifier (scikit-learn's SVC with RBF kernel). I have a sample size of about 100, with about 70 features each. The features are of approximately the same order of magnitude in their raw...
From: Stats Stack Exchange | By: Shovalt | Tuesday, May 31, 2016
I want to learn machine learning. I found tons of material on the internet but couldn't decide which book to get started with.
From: Stats Stack Exchange | By: stormshadow | Monday, May 30, 2016
I have been learning about the use of machine learning algorithms and their application to particle physics. Now, I have some doubts concerning what to do with the results. Let me explain: imagine that we have two theoretical models to explain the data....
From: Stats Stack Exchange | By: PML | Sunday, May 29, 2016
This question was asked on CV some years ago; it seems worth a repost in light of 1) order-of-magnitude better computing technology (e.g. parallel computing, HPC etc.) and 2) newer techniques, e.g. [3]. First, some context. Let's assume the goal is...
From: Stats Stack Exchange | By: horaceT | Saturday, May 28, 2016
I am trying to compute similarity scores between objects, to build a similarity score matrix based on multiple nominal/ordinal/continuous variables for each object. Example of what the data looks like: Object Var1 Var2 A 4.5 category1...
From: Stats Stack Exchange | By: Sheetanshu Gupta | Tuesday, May 31, 2016
Are there good heuristics for choosing: the number of filters in a convolutional layer, the size of the filters, and the number of convolutional layers? I have 250k small images (28x28), and I have 37 outputs. So I don't know if knowing this can help me to choose a reasonable...
From: Stats Stack Exchange | By: Ghilas BELHADJ | Tuesday, May 31, 2016
I have a dataset from a cross-sectional study (n=121) where people were asked about production characteristics in 2015 and how they recall their production in 2010. One set of example questions could be: "How many units of input A did you use in 2010?"...
From: Stats Stack Exchange | By: user117425 | Tuesday, May 31, 2016
I want to increase my sample to a thousand. Suppose the tree data.frame has 100 data points. Now I want to increase to 100. head(tree) Girth Height Volume 1 8.3 70 10.3 2 8.6 65 10.3 3 8.8 63 10.2 4 10.5 72 16.4 5 10.7 81 18.8 6 10.8 83 19.7 Can someone please...
From: Stats Stack Exchange | By: paramjeet | Tuesday, May 31, 2016
I was reading the section on k-statistics on Wolfram Alpha. It was known to me that for the sample variance $k_2 = \frac{1}{n-1}\sum_{i=1}^n (x_i - \overline{x})^2$ it holds that its variance equals $var(k_2) = \frac{\kappa_4}{n} + \frac{2 \kappa_2}{n-1}...
From: Stats Stack Exchange | By: Akkariz | Tuesday, May 31, 2016
Given two variables, X and Y, there is a way of obtaining a Mutual Information value between 0 and 1 by: MI_normalised=MI_original/sqrt(H(X)*H(Y)); where H(X) and H(Y) are entropies of X and Y respectively. Just wondering if there is a similar operation...
From: Stats Stack Exchange | By: Nitin | Tuesday, May 31, 2016
I am a research student. Now I am confused about the data I have collected through a questionnaire survey on the effectiveness of RBI policies. Please help me decide which test is applicable here in order to complete the analyses and...
From: Stats Stack Exchange | By: rechu | Tuesday, May 31, 2016
The probability of heads showing up upon tossing a certain coin is $p$. This coin is tossed $3$ times; let $X_i, i=1,2,3$ be $1$ or $-1$ depending on the outcome of the $i^{th}$ toss being heads or tails respectively. Then which of the following statements...
From: Stats Stack Exchange | By: priyanka | Saturday, May 28, 2016
MASS::mvrnorm() takes a mandatory Sigma argument which is a symmetric matrix specifying the covariance matrix of the variables. mvrnorm() is useful, say, for demonstration purposes. How would I create in R an $n\times n$ symmetric matrix with arbitrary...
From: Stats Stack Exchange | By: Moazzem Hossen | Tuesday, May 31, 2016
I was reading a few papers on experimental psychology. There I read about some experiments performed on humans regarding face recognition. They concluded that humans recognize faces when presented in a holistic form rather than a specific feature of part...
From: Stats Stack Exchange | By: user3371423 | Tuesday, May 31, 2016
I have data on an entire social network of individuals. I'd like to know whether a particular individual-level characteristic is more similar among individuals who are directly linked than among random pairs. (For now I'm not worried by the direction...
From: Stats Stack Exchange | By: dash2 | Monday, May 30, 2016
I have conducted a survey where all my questions are asked in a dichotomous manner (Yes/No). E.g. IV: "Are you a smoker?", "Are you obese", "Is your gender male/Female" etc. DV: "Have you ever had a stroke?" Therefore both my dependent variable and independent...
From: Stats Stack Exchange | By: Aiden | Tuesday, May 31, 2016
I am building a model whose outputs are between 0 and 1, and the goal is to minimize a cost function over the predicted values and labels. So far everything seems to be easy, but my labels are real-valued and therefore I cannot use the ordinary cross entropy...
From: Stats Stack Exchange | By: Amir | Tuesday, May 31, 2016
I was going over the derivation of Naive Bayes, and the following 3 lines were given: Suppose $X = \langle X_1, X_2 \rangle$. $$P(X|Y) = P(X_1, X_2 | Y)$$ $$= P(X_1 | X_2, Y)P(X_2 | Y)$$ $$= P(X_1 | Y)P(X_2 | Y)$$ So the third line comes from the fact that we have made...
From: Stats Stack Exchange | By: kevinzakka | Tuesday, May 31, 2016
I was reading a post that used score fusion to compare two scores from two different classifiers (after normalisation). I read another that suggested feeding the results of these two classifiers into a stacked approach. In what situation is each appropriate?...
From: Stats Stack Exchange | By: mino | Tuesday, May 31, 2016
I have a response variable that is 4 categories of behaviors (ly, rs, al and fd). I am trying to use a multinomial model with 7 habitat-related predictors as fixed factors and individuals ("bird.ID") as a random factor. The data looks like this: >...
From: Stats Stack Exchange | By: Emm. | Monday, May 30, 2016
So I have a dataset that contains both categorical and numerical data for each data point, and a class for each data point. My goal is to build an SVM model from the data to predict the class of data points I put into the model. Because SVMs...
From: Stats Stack Exchange | By: Ted | Tuesday, May 31, 2016
I was reading the batch normalization paper and it had one section where it goes through an example, trying to show why normalization has to be done carefully. I honestly can't understand how the example works and I am genuinely very curious to understand...
From: Stats Stack Exchange | By: Charlie Parker | Monday, May 30, 2016
When developing an instrument involving ordinal data (Likert scale with 5-6 response levels), how does one reduce the initial item pool before completing the exploratory factor analysis? I have seen that categorical PCA (in SPSS) has been recommended...
From: Stats Stack Exchange | By: user116948 | Monday, May 30, 2016
I have seen several definitions for weak stationarity of time series. One of the conditions is connected to the variance of the series. I have seen two definitions of this condition: (1) variance is constant over time; (2) variance is finite over time. The second...
From: Stats Stack Exchange | By: user44697 | Monday, May 30, 2016
In chapter 8 section 8.7.1 it tries to explain batch normalization. In the second paragraph of that section it tells us to consider the simple example: $$\hat{y} = x w_1 ... w_i ... w_l$$ and then claims: The output $\hat y$ is a linear function of...
From: Stats Stack Exchange | By: Charlie Parker | Monday, May 30, 2016
I'm currently using naiveBayes from {e1071}. My response is simply a prediction based on my independent variables. Is there a way to get the probability for each possible prediction as a response with naiveBayes? So if I was trying to predict the outcome...
From: Stats Stack Exchange | By: jgozal | Monday, May 30, 2016
When scientists are using mark-recapture models on an open population model to estimate the survival probability and the recapture probability (also known as "detection"), how can we be sure that the model is estimating the right thing between the two...
From: Stats Stack Exchange | By: M. Beausoleil | Monday, May 30, 2016
About convolution: Prof. Brad Osgood said during the course EE-261 that we cannot fully "visualize" convolution. E.g. https://see.stanford.edu/materials/lsoftaee261/book-fall-07.pdf , p.105: "Now, tell the truth, do you really think you could just...
From: Stats Stack Exchange | By: bruziuz | Monday, May 30, 2016
Well? Do all variables in a VAR/VEC need to be normally distributed, or only the target variable? It is very hard to get all of them to meet criteria of normality without deleting too many outliers.
From: Stats Stack Exchange | By: Lars Ahnland Nordfors | Monday, May 30, 2016
Does anyone know how to extract/get the corresponding standard deviation of the conditional variable importance in a cforest object? Would be great
From: Stats Stack Exchange | By: guest | Monday, May 30, 2016
I have a within-subject design and want to compare the means of two conditions. I first ran a test for normality using Shapiro, and the p-values are as follows: PARENT-Total play ,067; PARENT-Number of toys ,024; ALONE-Number of toys ,352; ALONE-Total...
From: Stats Stack Exchange | By: user117375 | Monday, May 30, 2016
Question: is minimizing test set mean validation error more important than the gap between train and test errors? Let's say I can tweak parameters in my model to give me mean validation error of 4500 RMSE on k-fold cross validation. When I use these...
From: Stats Stack Exchange | By: SpicyClubSauce | Monday, May 30, 2016
To estimate RRs for binary outcomes, sometimes Poisson regression can be used. Especially in epidemiology, when the incidence rate of the binary outcome variable is above 10%, it's necessary to use an alternative to logistic regression because...
From: Stats Stack Exchange | By: Indunil | Monday, May 30, 2016
Let $X_1$ and $X_2$ be a random sample from the geometric distribution with $Pr(X_i=j)=p(1-p)^{j-1},\ i=1,2;\ j=1,2,...;\ 0<p<1$. What will be an unbiased estimator for $\frac{p}{(1+p)}$? My attempt: Let $T(X)$ be the required unbiased estimator. Then,...
From: Stats Stack Exchange | By: priyanka | Monday, May 30, 2016
I need the function (sin(x)/x)^3 to be evaluated in R a huge number of times. What is the fastest way: (sin(x)/x)^3, (sin(x)/x)^3L, or { y=sin(x)/x; y*y*y } ?
From: Stats Stack Exchange | By: Viktor | Monday, May 30, 2016
In Python there are predefined functions for external indices (like Jaccard, Hamming, accuracy) in the package sklearn.metrics, to compare the ground truth with the clustering result. Are there packages in R that do the same?
From: Stats Stack Exchange | By: naya | Monday, May 30, 2016
I am attempting Attrition Analysis in R using the Survival & KMsurv Package. My question is more related to how to use the R package / functionality for my situation. Let us say the analysis is for Department B. I have the following dataset: All...
From: Stats Stack Exchange | By: Gaurav Chaturvedi | Monday, May 30, 2016
I am looking for a variant of Fleiss' Kappa to deal with interval data, rather than strictly nominal data. The context that I intend to use it in is as follows: There are several (5-8) graders grading a total of 16 exams. The exams are identical, and...
From: Stats Stack Exchange | By: artifaxiom | Monday, May 30, 2016
I'm trying to understand randomization in connection with sufficiency from the following text. I don't really get what they mean by "a random device such as random number table to generate $Y$". I suppose they mean that we know the distribution of $X \mid...
From: Stats Stack Exchange | By: User1 | Monday, May 30, 2016
I have completed the PCA, EFA, and confirmatory factor analysis (CFA), treating data from a Likert scale (5-level responses: none, a little, some,..) as continuous variables. Using Lavaan, I repeated the CFA defining the variables as categorical. I would...
From: Stats Stack Exchange | By: user116948 | Monday, May 30, 2016
It's known that ROC is overly optimistic in the case of imbalanced data sets. How big can this bias be? For example, if I read a paper where they report 0.75 ROC on a dataset with 5 percent of samples being from the minority class, how would the ROC change...
From: Stats Stack Exchange | By: user2173836 | Monday, May 30, 2016
Imagine that for the purpose of a study a sample size is computed with the following formula for a given power $1-\beta$, difference in means $\epsilon$, standard deviation $\sigma$ and significance level $\alpha$ n = \frac{2(z_{\alpha/2}...
From: Stats Stack Exchange | By: user3631369 | Monday, May 30, 2016
What distribution could represent a "flipped" (skewed left) lognormal distribution?
From: Stats Stack Exchange | By: gabboshow | Monday, May 30, 2016
I have data consisting of 0, 1, -1 values, something like this. datax:{1,1,1,-1,-1,0,0,0,0,0,1,1,1,1,1,1,1,1 ......} y:{1,2,3,4,5,6,7,8 ......} I need to confirm, with 90% confidence, that the data is related to a formula. How can I achieve that? Formula:...
From: Stats Stack Exchange | By: sakir | Monday, May 30, 2016
I have run 2 Bayesian regression models and would like to compare the posterior samples of a parameter that is common to both models. For example, if model A is $y=\alpha + \beta_1x_1$ and model B is $y=\alpha + \beta_1x_1 + \beta_2x_2$ (This is just...
From: Stats Stack Exchange | By: Guido Biele | Monday, May 30, 2016
I'm trying to understand the RMSprop optimization method but I haven't been able to figure out why line 30 is necessary in this implementation of the RMSprop. Could someone please explain it to me? Any help would be greatly appreciated!...
From: Stats Stack Exchange | By: Bence | Monday, May 30, 2016
I've had it asserted to me that any consistent estimator must necessarily also grow less variable with increased sample size. I felt that this couldn't be correct, since there was nothing in the definition of a consistent estimator that forced this to...
From: Stats Stack Exchange | By: user1205901 | Monday, May 30, 2016
Currently I'm graduating and I'm finishing up my master's thesis as we speak. It's a study about the effect of hand positioning on response time. I'm submitting my data to a 2x6 repeated measures ANOVA and have found no significant interaction effect,...
From: Stats Stack Exchange | By: RonaldSurfs | Monday, May 30, 2016
Can a linear SVM support more than 2 classes for classification?
From: Stats Stack Exchange | By: john | Monday, May 30, 2016
I'm pretty new to LDA, and I came across another term, Gaussian discriminant analysis, elsewhere. LDA assumes that the data are normally distributed, which is the same as a Gaussian distribution. Are they both referring to the same...
From: Stats Stack Exchange | By: yome | Monday, May 30, 2016