# Stats Stack Exchange

I have a complete dataset with input variables and response variables. I would like to run a simulation in which I supply the input variables and randomly generate the response variables. Is there a way to do that without using parametric models (like in machine...
From: Stats Stack Exchange | By: user53014 | Tuesday, July 29, 2014
I was using the Linear Discriminant Analysis (LDA) from the scikit-learn machine learning library (Python) for dimensionality reduction and was a little bit curious about the results. I am wondering now what the LDA in scikit-learn is doing so that the...
From: Stats Stack Exchange | By: Sebastian Raschka | Monday, July 28, 2014
I'm working on a project to estimate real estate and started with some classical techniques, such as linear regression. The results obtained are already going in the right direction, but to get more precise results, I've started to look up some...
From: Stats Stack Exchange | By: ABC | Tuesday, July 29, 2014
I'm a programmer (comfortable in Python and R) and I'm getting started with machine learning methods. I have a lot of data from the past year about users on my site. About 50% of the users converted to a particular goal. My problem is this: I'm interested...
From: Stats Stack Exchange | By: flossfan | Monday, July 28, 2014
I was planning to use a Tukey test for the following data. I regressed stock data for three time ranges (1 year daily data, 2 years weekly data, 8 years monthly data from today) and got the (beta and) alpha values. Dependent variable: stock returns of...
From: Stats Stack Exchange | By: Ana | Monday, July 28, 2014
I was wondering how to (or if it is even possible) find the continuous joint distribution between two random variables $x$ and $y$ when you know the continuous marginal density distributions of both $x$ and $y$, and we know there is a correlation between...
From: Stats Stack Exchange | By: Chris | Tuesday, July 29, 2014
This question is leading on from the following question. http://math.stackexchange.com/questions/360275/e1-1x2-under-a-normal-distribution Basically what is the $E\left(\frac{1}{1+x^2}\right)$ under a general Gaussian $\mathcal{N}(\mu,\sigma^2)$. I tried...
From: Stats Stack Exchange | By: Sachin_ruk | Tuesday, July 29, 2014
In this paper, I found a quite interesting approach concerning measurement of consistency in survey data (provided that each line=case represents a participant). Still, I cannot figure out exactly what is meant by "within-person correlation". Could anybody...
From: Stats Stack Exchange | By: nilsole | Tuesday, July 29, 2014
I want to know how variables affect travel mode for different trip purposes (i.e. leisure trips, work trips and shopping trips) in a specified region. I have 450 respondents in three different neighborhoods. My dependent variable is mode of travel (in...
From: Stats Stack Exchange | By: user52978 | Tuesday, July 29, 2014
I am trying to work out whether mobile, desktop or tablet users are more likely to perform an action on a site. The data I have is structured as follows:

| week | clicked | mobile | tablet | desktop |
|------|---------|--------|--------|---------|
| 1    | 1       | 104    | 97     | 205     |
| 1    | 0       | 204    | 214    | 348     |
| 2    | 1       | 128    | 108    | 257     |
| 2    | 0       | 207    | 222    | 360     |

...
From: Stats Stack Exchange | By: flossfan | Tuesday, July 29, 2014
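As a rough first look at the device question above, the per-device click rate can be computed from the four rows shown in the excerpt. This is only a sketch: it assumes `clicked = 1` rows count users who performed the action and `clicked = 0` rows count users who did not, and it pools the two visible weeks. The names `rows`, `devices` and `click_rates` are hypothetical.

```python
# Rows copied from the excerpt: (week, clicked, mobile, tablet, desktop).
rows = [
    (1, 1, 104, 97, 205),
    (1, 0, 204, 214, 348),
    (2, 1, 128, 108, 257),
    (2, 0, 207, 222, 360),
]
devices = ("mobile", "tablet", "desktop")


def click_rates(rows):
    """Return clicked / (clicked + not clicked) per device, pooled over weeks."""
    clicked = dict.fromkeys(devices, 0)
    total = dict.fromkeys(devices, 0)
    for _week, did_click, *counts in rows:
        for device, count in zip(devices, counts):
            total[device] += count
            # Assumption: clicked == 1 marks the users who performed the action.
            if did_click == 1:
                clicked[device] += count
    return {d: clicked[d] / total[d] for d in devices}


rates = click_rates(rows)
for device in devices:
    print(f"{device}: {rates[device]:.3f}")
```

Pooled rates only describe the sample; deciding whether the differences are more than noise would need a formal test or model, which this sketch does not attempt.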
I hope this is the correct SE site for this type of problem. I'm looking for known good algorithms for (fuzzy) clustering of similar file names found in a hierarchy of folders. File names are usually short character strings. I've read about two concepts:...
From: Stats Stack Exchange | By: mins | Tuesday, July 29, 2014
I have three techniques, called A, B and C. Each can be used independently when trying to perform four related tasks (Tasks 1, 2, 3 and 4). I have run lots of tests, and tried all combinations of each technique being on or off. My results look something...
From: Stats Stack Exchange | By: John Wickerson | Tuesday, July 29, 2014
I was curious what sort of time series models were the standard for doing this type of analysis. I have weekly sales data for the company - I could cook up my own time series model but would like to know what my options are.
From: Stats Stack Exchange | By: Andrew | Monday, July 28, 2014
I have some results where the tester claims the following values: Sensitivity: 0.525, Specificity: 0.925, Precision: 0.516, Accuracy: 0.907 Where Sensitivity=TP/(TP+FN), Specificity=TN/(TN+FP), Precision=TP/(TP+FP), Accuracy=(TP+TN)/(TP+TN+FP+FN) I'm...
From: Stats Stack Exchange | By: James Brown | Monday, July 28, 2014
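One way to sanity-check figures like these, assuming all four come from a single confusion matrix, is to use the definitions quoted in the excerpt: accuracy fixes the prevalence once sensitivity and specificity are known, and that prevalence in turn fixes the precision. The function names below are hypothetical, and the reported values are rounded, so small discrepancies are expected.

```python
def implied_prevalence(sensitivity, specificity, accuracy):
    """Solve accuracy = sens * p + spec * (1 - p) for the prevalence p."""
    return (specificity - accuracy) / (specificity - sensitivity)


def implied_precision(sensitivity, specificity, prevalence):
    """Precision = TP / (TP + FP), written in terms of rates and prevalence."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    return true_pos / (true_pos + false_pos)


# Values as claimed in the excerpt.
sens, spec, prec_claimed, acc = 0.525, 0.925, 0.516, 0.907

prevalence = implied_prevalence(sens, spec, acc)
precision = implied_precision(sens, spec, prevalence)
print(f"implied prevalence: {prevalence:.3f}")
print(f"implied precision:  {precision:.3f} (claimed {prec_claimed})")
```

Under that single-matrix assumption, a large gap between the implied and the claimed precision suggests the four numbers were not computed from the same counts.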
Dears, one assumption of Spearman's correlation is that the data have to be monotonic. I tried to make a scatterplot in SPSS, but I get the following graph and couldn't decide whether it is monotonic. What is the solution, please? Note: my data are ordinal...
From: Stats Stack Exchange | By: Hadi | Tuesday, July 29, 2014
I have run a stepwise regression in R; however, when I look at a summary of the final model, some of my factors are not significant. Why have these factors not been removed? Should I remove them from my model? The VIFs of these factors are all under 5. thank...
From: Stats Stack Exchange | By: Blair Outhwaite | Tuesday, July 29, 2014
Could somebody explain to me why the estimated coefficients of a multiple regression through GLS seem not to pass through the majority of observations? Here is an example: require(nlme) set.seed(1) df=data.frame(y=rank(rnorm(50,0)),x1=rank(y+rnorm(50,0))/2,x2=rank(y*2+rnorm(50,0)),x3=rank(rnorm(50,0)))...
From: Stats Stack Exchange | By: Agus camacho | Tuesday, July 29, 2014
I have a data set which is the exchange rate of USD and GBP from 2007_01_07 to 2014_06_06. I used the derivatives of a probability density to estimate the optimal bandwidth h=0.04071872. I am now thinking of using 'ksmooth' in R to add a fitted line but...
From: Stats Stack Exchange | By: user52897 | Monday, July 28, 2014
For an ARIMA (0,0,1) model, I understand that R follows the equation: xt = mu + e(t) + theta*e(t-1) (Please correct me if I am wrong) I assume e(t-1) is same as the residual of the last observation. But how is e(t) calculated? For example, here are the...
From: Stats Stack Exchange | By: nancy | Monday, July 28, 2014
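The equation in the excerpt can be rearranged to e(t) = x(t) - mu - theta*e(t-1), which gives the residuals one at a time once a starting value is chosen. The sketch below uses that conditional recursion with the assumption e(0) = 0. It is not necessarily what R computes: R's `arima` by default fits through a state-space likelihood, so its residuals can differ slightly, especially for the first observations. The function name `ma1_residuals` is hypothetical.

```python
def ma1_residuals(x, mu, theta, e0=0.0):
    """Residuals of x_t = mu + e_t + theta * e_{t-1}, conditional on e_0 = e0."""
    residuals = []
    previous = e0  # assumed pre-sample innovation
    for x_t in x:
        e_t = x_t - mu - theta * previous
        residuals.append(e_t)
        previous = e_t
    return residuals


# Toy series, chosen only to make the recursion easy to follow by hand.
example = ma1_residuals([1.0, 2.0, 0.5], mu=1.0, theta=0.5)
print(example)
```

Each residual depends on the one before it, so e(t) is the part of the current observation not explained by the mean and the previous residual.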
I don't understand this concept well and need help. I was choosing whether to use a linear model or apply a non-linear transformation in my model formula. To do a diagnostic, I quickly plotted my data: plotalldaily <- ggplot(amsd, aes(ImpressionsA,...
From: Stats Stack Exchange | By: vagabond | Monday, July 28, 2014
I want to know how variables affect travel mode for different trip purposes (leisure trips, work trips and shopping trips) in a specified region. I have 450 respondents in three different neighborhoods. My independent variable is mode of travel (in 5 categories...
From: Stats Stack Exchange | By: user52978 | Monday, July 28, 2014
I'm trying to formalize the probability density function for a rather simple process, but I'm having difficulty writing it precisely. Specifically, consider simulating a 1-D Gaussian random walk starting from X_0 until some stopping condition (which...
From: Stats Stack Exchange | By: anonymous_4322 | Monday, July 28, 2014
Hello I am running a Regression Tree experiment. I am new to Regression Trees, and I am using Mean Squared error to test my tree. I am confused because I am getting a large Mean Squared Error but I am not sure how to evaluate if it is too high. Should...
From: Stats Stack Exchange | By: user2475523 | Monday, July 28, 2014
I've seen the homoskedasticity assumption stated as the constant conditional variance of the error (i.e., Var(u|x)=constant). I was wondering if I can also state the homoskedasticity assumption as constant variance across values of the same dependent...
From: Stats Stack Exchange | By: StatsScared | Monday, July 28, 2014
I'm aware that you have a post on almost exactly the same topic, but after many trials I failed to run (or plot?) it successfully. When running time <- as.POSIXct(rownames(slp)), I get the error below: Error in as.POSIXlt.character(x, tz, ...) : character...
From: Stats Stack Exchange | By: andres | Monday, July 28, 2014
When making predictions with a random forest model, is it possible to associate the probability of a test case belonging to a class? For example, for a given test case, can we say that the probability of that test case belonging to the setosa class is...
From: Stats Stack Exchange | By: learner | Monday, July 28, 2014
I have heard of the "false discovery rate curve", but have never seen an example. If I recall correctly from a conversation with a colleague, the y-axis in the FDR curve measures the FDR itself, defined as $FDR = \frac{FP}{TP+FP}$ (i.e. $1 - \text{Precision}$),...
From: Stats Stack Exchange | By: user023472 | Monday, July 28, 2014
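To make the definition above concrete, here is a minimal sketch that traces FDR = FP/(TP+FP) as a decision threshold is lowered through a set of scored examples. The excerpt is cut off before it says what goes on the x-axis, so using the score threshold there is an assumption, as are the function name `fdr_curve` and the toy data. Tied scores are not handled specially.

```python
def fdr_curve(scores, labels):
    """Return (threshold, FDR) pairs, calling everything at or above the threshold positive."""
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    true_pos = false_pos = 0
    curve = []
    for score, is_positive in ranked:
        if is_positive:
            true_pos += 1
        else:
            false_pos += 1
        # FDR = FP / (TP + FP), i.e. 1 - precision at this threshold.
        curve.append((score, false_pos / (true_pos + false_pos)))
    return curve


# Hypothetical scores and ground-truth labels (1 = truly positive).
curve = fdr_curve([0.9, 0.8, 0.7, 0.6], [1, 0, 1, 0])
for threshold, fdr in curve:
    print(f"threshold {threshold:.1f}: FDR {fdr:.3f}")
```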
I'm using anova to test for differences between different values of the same factor for a mixed effects model which I produced. My model is: m2 <- lmer (ovsize ~ d.sheetratio + (1|nid), REML=FALSE) Following this I subsetted the data so that each...
From: Stats Stack Exchange | By: Dom Burns | Monday, July 28, 2014
I have already used the logit transform on my outcome variables (which are displayed in percentages). However, this obviously gives me -INF values and since my data includes a lot of zeros in some instances, this makes it hard to analyse. I have now...
From: Stats Stack Exchange | By: user3237820 | Monday, July 28, 2014
I am concerned with simulating data for a linear regression model. I need to control the means, variances, and correlations (covariances) between the predictors and the criterion variable. In addition, I need to be able to vary the explained variances...
From: Stats Stack Exchange | By: tomka | Monday, July 28, 2014
Statisticians are often interested in testing point null hypotheses, such as: $$H_0: \mu =0 \,\,\, \text{vs.} \,\,\, H_1:\mu\neq0.$$ As Jeffreys himself said, this actually corresponds to some presumption that $\mu$ is fairly small. My question is: has the alternative...
From: Stats Stack Exchange | By: Carlsberg | Monday, July 28, 2014
I would like to know if there is a test for the difference of two means m1 and m2 (continuous variables) if I only have the mean and the 2.5% and 97.5% quantiles. For example: m1 = 10.5 q1_025= 8.3 q1_975= 12.5 m2 = 15.5 q2_025= 12.7 q2_975= 17.3...
From: Stats Stack Exchange | By: giordano | Monday, July 28, 2014
I have a model where the Y is very skewed and I convert it to log and run a log lin model. But, I have doubts about the way to measure the error, because in the original variables the error would be much bigger than in log variables. Are there any other...
From: Stats Stack Exchange | By: Gaby P | Monday, July 28, 2014
I am having some problems interpreting the odds. I ran a logistic regression for an outcome of 'Yes' or 'No'. My reference category is 'No'. I have 2 variables, and these are the log(odds) and the odds: Variable A -> It is an integer with values 2 to 80....
From: Stats Stack Exchange | By: Sira RM | Monday, July 28, 2014
I need to be neat in measuring the success rate of a treatment. It is pretty high anyway. But as it is all about ecology, multiplying experiments is difficult. I have treated $N = 20$ individuals, of which $18$ succeeded. This is a $\tau = .9$ success rate. I...
From: Stats Stack Exchange | By: Iago-lito | Sunday, July 27, 2014
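With only 20 individuals, a point estimate of 0.9 carries a lot of uncertainty, and an interval makes that visible. The excerpt is cut off before the actual question, so the choice of method here is an assumption: the sketch below uses the Wilson score interval, a common option for binomial proportions at small sample sizes. The function name `wilson_interval` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def wilson_interval(successes, n, confidence=0.95):
    """Wilson score interval for a binomial proportion."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # two-sided normal quantile
    p = successes / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half_width = z * sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return centre - half_width, centre + half_width


# Figures from the excerpt: 18 successes out of N = 20.
low, high = wilson_interval(18, 20)
print(f"95% Wilson interval: ({low:.3f}, {high:.3f})")
```

Unlike the simple normal-approximation interval, the Wilson interval stays inside [0, 1] and is not centred on the observed rate, which matters when the rate is close to 1.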
Let $X_1, ..., X_n$ be iid with a distribution F. Let $\theta$ be the median of F. What is the value of $E(X_i \cdot I(X_j>\theta))$? If $i\neq j$, then $E(X_i \cdot I(X_j>\theta))= 1/2 \cdot \mu$, right? When $i=j$, I don't seem to find it......
From: Stats Stack Exchange | By: An old man in the sea. | Monday, July 28, 2014
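A short worked step for the two cases above, assuming $F$ has a finite mean $\mu$ and is continuous at its median, so that $P(X_j > \theta) = 1/2$. For $i \neq j$ the factors are independent; for $i = j$ they are not, and the expectation becomes a partial mean:

$$E\big(X_i \cdot I(X_j>\theta)\big) = E(X_i)\,P(X_j>\theta) = \frac{\mu}{2} \qquad (i \neq j),$$

$$E\big(X_i \cdot I(X_i>\theta)\big) = \int_{\theta}^{\infty} x \, dF(x) \qquad (i = j).$$

The second expression depends on the shape of $F$, not only on $\mu$ and $\theta$. As an illustration with a distribution chosen here (not taken from the question), for $F = \text{Exp}(1)$ the median is $\theta = \ln 2$ and $\int_{\ln 2}^{\infty} x e^{-x}\,dx = \frac{1 + \ln 2}{2} \approx 0.847$.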
I have a large panel (5000+ subjects, 4 variables over 182 periods), and I've identified a particular Granger-causal relationship in a large subset of those subjects (30% or so). I would like to somehow characterize the subjects that exhibit the Granger...
From: Stats Stack Exchange | By: ssdecontrol | Sunday, July 27, 2014
I am trying to conduct an EFA with a sample size of 150 respondents. I would also like to use cross-validation but my professor says that the sample is not big enough for that. Is that true?
From: Stats Stack Exchange | By: user52896 | Monday, July 28, 2014
I have a dataset of about 300 people. 200 test positive for a disease, and the rest test negative. I have data on different test scores and imaging results for these 300 participants. So my dataset would look something like this status test1 test2 test3...
From: Stats Stack Exchange | By: Adrian | Monday, July 28, 2014
I am looking into statistics to help support a project I am undertaking. The project scope concerns intelligent replenishment / refill of vending machines. During an onsite service, a technician must make decisions regarding machine refill to optimise sales...
From: Stats Stack Exchange | By: Barry_M | Monday, July 28, 2014
Frequently I see artificial neural networks compared by their "classification error rates" or "error rates", particularly for multi-class problems like CIFAR-10. What does this error rate actually refer to? Hamming loss? How is it calculated?
From: Stats Stack Exchange | By: gavinmh | Monday, July 28, 2014
I have a mixture model and the components are further parameterized by ~200 variables. Originally I used the EM algorithm to get an MLE of the parameters. The algorithm works quite well and converges quickly. However, when I scale up the problem...
From: Stats Stack Exchange | By: wonghang | Monday, July 28, 2014
For a given sample set $S$ with $N$ individual samples $x_i$, I can easily find the average distance from the maximum by doing something like this: $\sigma_{max_N}:=\sqrt{\frac{1}{N}\sum\limits_{i=1}^N {\left(x_i - \max{\left(S\right)}\right)}^2}$ Trying...
From: Stats Stack Exchange | By: kram1032 | Sunday, July 27, 2014
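The formula in the excerpt is a root-mean-square distance of the samples from their maximum. A direct transcription, with the hypothetical name `rms_distance_from_max` and a toy sample set, might look like this:

```python
import math


def rms_distance_from_max(samples):
    """sqrt( (1/N) * sum_i (x_i - max(S))^2 ), as written in the excerpt."""
    largest = max(samples)
    squared = sum((x - largest) ** 2 for x in samples)
    return math.sqrt(squared / len(samples))


# Toy data: distances from the maximum (3) are 2, 1 and 0.
value = rms_distance_from_max([1.0, 2.0, 3.0])
print(value)
```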
I have two normal distributions, and I want to test whether they have the same standard deviation; I really don't care about the mean. My idea is: de-mean both of them and then use Kolmogorov-Smirnov to test if the distributions are different, if they...
From: Stats Stack Exchange | By: Akavall | Sunday, July 27, 2014
I know that if one is trying to perform linear regression, multicollinearity can be an issue because it can "lead to unreliable and unstable estimates of regression coefficients." Suppose for a second that I have the following correlation matrix...
From: Stats Stack Exchange | By: user2801122 | Sunday, July 27, 2014
I originally posted this over at AskMetafilter, and a commenter suggested I ask it here. I work for a dietary supplement company that also makes skin care products, and some of those skin care products are tested clinically. Now they are talking about...
From: Stats Stack Exchange | By: Methylviolet | Sunday, July 27, 2014
The chisq.test function in R includes a y = argument, which is set to NULL by default. The help page doesn't explain what this argument does, and playing around with numbers doesn't give any clues; for example, all of these give exactly the same results: chisq.test(x=c(20,...
From: Stats Stack Exchange | By: luciano | Sunday, July 27, 2014
I'd like to fit a longitudinal model where multiple subjects experience binary outcomes over time. To accomplish that, I'd like to use an additive random effect for each subject and an autoregressive error process to model the temporal stability...
From: Stats Stack Exchange | By: Ben Ogorek | Sunday, July 27, 2014
I have a dataset with page view data for about 500,000 users, divided into two groups. Each user can visit up to 5 pages, each as many or as few times as they want. So for each user, I have the distribution of number of visits to each page. I would like...
From: Stats Stack Exchange | By: bsg | Sunday, July 27, 2014
I can't seem to find much info on the following: I have a dataset D at time t which I use to fit an ARIMA model. I forecast the value of the time series at time t+1. Now, when I'm in t+1, I would like to predict the value of my time series at t+2 using...
From: Stats Stack Exchange | By: Daniel | Sunday, July 27, 2014