Serendeputy - your personal news assistant.

Welcome to Serendeputy!

Serendeputy is your personal news assistant.

Your deputy:
- learns what you like and don't like,
- lovingly compiles a list of news and blogs for you.

You can help your deputy learn by searching, clicking links and pressing the little smiley faces.

How it works.

What to do:
  1. Click links to teach your deputy
  2. Click smileys and frownies
  3. Find favorite topics and sources
  4. See how much better your deputy is getting at finding you good stuff.
  5. Sign in for free to save your profile, or please tell me why you won't.

I'm working on my first independent project, which involves pattern classification. I'm using some datasets from the UCI machine learning repository, but am not sure how to start with data normalization. The data isn't that large (feature vectors of around 15-20 dimensions),...
From: Stats Stack Exchange | By: mcjoness | Saturday, April 19, 2014
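The excerpt stops before saying which normalization is in play; a common starting point is z-score standardization, where each feature column is shifted to mean 0 and scaled to standard deviation 1. A minimal sketch in Python (stdlib only), assuming each row is a list of numeric features; `fit_standardizer` and `apply_standardizer` are hypothetical helper names. The statistics are computed on the training rows only and then reused on test rows, so nothing leaks from the test set into the scaling.

```python
from statistics import mean, pstdev

def fit_standardizer(train_rows):
    """Return (mean, population std) for each feature column of the training rows."""
    columns = list(zip(*train_rows))
    return [(mean(col), pstdev(col)) for col in columns]

def apply_standardizer(rows, params):
    """Z-score every value; a constant column (std == 0) is mapped to 0.0."""
    return [
        [(x - m) / s if s > 0 else 0.0 for x, (m, s) in zip(row, params)]
        for row in rows
    ]

# Toy data (illustrative only): two features on very different scales.
train = [[1.0, 200.0], [2.0, 300.0], [3.0, 400.0]]
test = [[2.5, 250.0]]

params = fit_standardizer(train)           # fitted on training data only
train_z = apply_standardizer(train, params)
test_z = apply_standardizer(test, params)  # same means/stds reused for test data
```

After scaling, both features are on comparable scales even though the second one was originally about 100 times larger. Min-max scaling to [0, 1] is the other frequent choice; which one suits a given classifier is not settled by the excerpt.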
Assume a dataset with the following attributes: Date (truncated), f1 ... fn, #impressions, #goals. The problem: I want to grow $n$ trees that would find the optimal selection of features and their ranges in each, and that maximize the goal rate (goals...
From: Stats Stack Exchange | By: Haggai | Sunday, April 20, 2014
I need to correlate employee engagement (data gathered using the 9-item UWES questionnaire) and organizational commitment (data gathered using the 18-item Organizational Commitment Scale). Both of them can be divided into different components; UWES...
From: Stats Stack Exchange | By: naugri | Sunday, April 20, 2014
I am looking to predict groups of items that someone will purchase, i.e., I have multiple collinear dependent variables. Rather than building 7 or so independent models to predict the probability of someone buying each of the 7 items, and then combining...
From: Stats Stack Exchange | By: blast00 | Sunday, April 20, 2014
I am working on a project, and I am totally new to statistics. I have weekly sales data for the last two years, along with other variables like temperature and holiday (TRUE/FALSE), where holiday is a nominal variable. I have to do forecasting for the...
From: Stats Stack Exchange | By: Arushi | Saturday, April 19, 2014
I am working on analyzing poverty rates using census data. I have a huge dataset. I want to extract the likelihood from this dataset in order to create patterns for energy consumption. Let's say this: in a house where we have 3 members with average age...
From: Stats Stack Exchange | By: user3378649 | Sunday, April 20, 2014
I am trying to do experiments on classifying longitudinal systems. We're working on classifying the location where we sell items most. I don't have a lot of experience in statistics and modeling data beyond a high school statistics course so I'm kinda...
From: Stats Stack Exchange | By: user3378649 | Sunday, April 20, 2014
$X$ and $Y$ are uniformly distributed on the unit disk. Thus, $f_{X,Y}(x,y) = \begin{cases} \frac{1}{\pi}, & \text{if} ~ x^2+y^2 \leq 1,\\ 0, &\text{otherwise.}\end{cases}$ If $Z=X+Y$, find the pdf of $Z$.
From: Stats Stack Exchange | By: Someone | Sunday, April 20, 2014
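One route to the answer, sketched here rather than taken from the thread: the uniform distribution on the disk is rotation-invariant, so $U = (X+Y)/\sqrt{2}$ has the same distribution as $X$, whose marginal density is $\frac{2}{\pi}\sqrt{1-u^2}$ on $[-1,1]$. Since $Z = \sqrt{2}\,U$, this gives $f_Z(z) = \frac{1}{\pi}\sqrt{2 - z^2}$ for $|z| \leq \sqrt{2}$ and $0$ otherwise. The Python below (stdlib only; all function names are illustrative) checks that density numerically: it should integrate to 1 and agree with a Monte Carlo simulation of points drawn uniformly from the disk.

```python
import math
import random

def pdf_z(z):
    """Candidate density of Z = X + Y for (X, Y) uniform on the unit disk."""
    if abs(z) > math.sqrt(2):
        return 0.0
    return math.sqrt(2 - z * z) / math.pi

def sample_disk(rng):
    """Rejection-sample one point uniformly from the unit disk."""
    while True:
        x, y = rng.uniform(-1, 1), rng.uniform(-1, 1)
        if x * x + y * y <= 1:
            return x, y

def mc_prob(a, b, n=100_000, seed=0):
    """Monte Carlo estimate of P(a < X + Y <= b)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n):
        x, y = sample_disk(rng)
        if a < x + y <= b:
            hits += 1
    return hits / n

def integrate_pdf(a, b, steps=10_000):
    """Midpoint-rule integral of pdf_z over [a, b]."""
    h = (b - a) / steps
    return sum(pdf_z(a + (i + 0.5) * h) for i in range(steps)) * h
```

For example, `integrate_pdf(0, 0.5)` and `mc_prob(0, 0.5)` should agree to about two decimal places.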
I have a question about the prediction of volatility and returns of a time series. Basically it is a question about prediction in the fGarch package. The following code is from the book Analysis of Financial Time Series and it is an example of AR/GARCH...
From: Stats Stack Exchange | By: user8 | Sunday, April 20, 2014
I was doing some self-study and came across the following formulae for estimating standard errors: Formula 1: Formula 2: I understand that both can be used when the population standard deviation is unknown. But I don't really understand why...
From: Stats Stack Exchange | By: user1275515 | Sunday, April 20, 2014
I am looking for an introductory to intermediate level book on Generalized Linear Models. Ideally, in addition to the theory behind the models, I would want it to include applications and examples in R or another programming language - I hear SAS is...
From: Stats Stack Exchange | By: JohnK | Saturday, April 19, 2014
Can you please provide (1) one advantage of "k-Means" compared to "Hierarchical Clustering" and (2) one advantage of "Hierarchical Clustering" compared to "k-Means"? Thanks in advance!
From: Stats Stack Exchange | By: YevgenyM | Sunday, April 20, 2014
I am seeking opinions on how I should represent the data I have collected. I have to create a presentation focusing on the environment. I was told that a simple bar chart and a line graph are bad visualizations. I have picked the following data set...
From: Stats Stack Exchange | By: user2691544 | Sunday, April 20, 2014
I was wondering whether you could help me with this question. I am not sure whether I am doing it correctly, so any guidance from anyone would be most appreciated. I will post the full question, so please bear with me. Let X be a random variable...
From: Stats Stack Exchange | By: Ingrid | Sunday, April 20, 2014
I am working on a project where I have to do multi-label text classification. I want to understand whether my approach is correct or whether I am missing something. I am using R to do it. My steps: clean the text; create a corpus. While creating the corpus I am removing...
From: Stats Stack Exchange | By: tanay | Sunday, April 20, 2014
I have $n$ dice with $m$ sides. The $i^{th}$ die will show value $0 \leq x_i \leq m-1$ with probability $0 \leq D_i(x_i) \leq 1$. What is the probability that the sum of the dice equals $\alpha$? Is there some approximation for $P(\alpha)$...
From: Stats Stack Exchange | By: IndustryMinion | Sunday, April 20, 2014
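Assuming the dice are independent (the excerpt does not say so), the exact distribution of the sum is the convolution of the $n$ individual distributions $D_i$. A minimal Python sketch; `sum_distribution` is a hypothetical helper, and each die is passed as a list whose index $v$ holds $D_i(v)$ for $v = 0, \ldots, m-1$:

```python
def sum_distribution(dice):
    """Exact distribution of the sum of independent dice.

    dice: list of lists; dice[i][v] is the probability that die i shows v.
    Returns a list p where p[s] is the probability that the sum equals s.
    """
    dist = [1.0]  # the sum of zero dice is 0 with probability 1
    for die in dice:
        new = [0.0] * (len(dist) + len(die) - 1)
        # Convolve the running distribution with this die's distribution.
        for s, p_s in enumerate(dist):
            for v, p_v in enumerate(die):
                new[s + v] += p_s * p_v
        dist = new
    return dist

# Usage: two fair dice with faces 0..5.
fair = [1 / 6] * 6
p = sum_distribution([fair, fair])
# p[alpha] is P(sum == alpha); e.g. p[5] is 6/36.
```

This costs roughly $O(n^2 m^2)$ multiplications. For an approximation with many dice, the central limit theorem suggests a normal distribution with mean $\sum_i \mu_i$ and variance $\sum_i \sigma_i^2$, where $\mu_i$ and $\sigma_i^2$ are the mean and variance of die $i$; how accurate that is depends on $n$ and on the shapes of the $D_i$.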
I am using LibSVM (3.18) as an implementation of SVM, but every time I predict, the result is zero. I am following these instructions: I have a CSV file (50K+ lines); most of the data in the target column are zeros, the other values are between...
From: Stats Stack Exchange | By: user3378649 | Sunday, April 20, 2014
I am looking for a method or package in R that can remove heteroscedasticity from time series. Specifically, I have a number of time series $Z = (Z_1, \ldots, Z_p)$, where $Z_j = \{(Z_j)_t\}_{t=1}^{T}$, to which I want to fit a VAR model. Each time series...
From: Stats Stack Exchange | By: Stijn | Sunday, April 20, 2014
I have collected data from 88 human subjects. There are two subject groups, A (test) and B (control), with 44 subjects in each group. The subjects are paired between groups. There are two measurements from each subject, one before and one after...
From: Stats Stack Exchange | By: Guest | Sunday, April 20, 2014
I have a predictor with responses from 140 people in group A and 60 in group B. My mediator only uses responses from group A, and my outcome variable uses responses from Groups A, B, C $(n=31)$. What type of analysis do I need to run? What software would...
From: Stats Stack Exchange | By: Uriah07 | Sunday, April 20, 2014
I collected some data on a species of goose called Brent Goose over the winter. A csv file of the data can be downloaded from Dropbox or imported straight into R with this code: library(repmis) goose_behaviour <- repmis::source_DropboxData("goose_behaviour.csv",...
From: Stats Stack Exchange | By: luciano | Sunday, April 20, 2014
According to my understanding, when we have an unknown population mean and variance, we have to estimate the population variance through the sample variance and use the t distribution to estimate the potential range of the population mean using the estimated population variance...
From: Stats Stack Exchange | By: user3420399 | Sunday, April 20, 2014
Following Hofert et al.'s paper "Likelihood inference for Archimedean copulas in high dimensions under known margins" (http://dl.acm.org/citation.cfm?id=2263953), I wrote a script in MATLAB to produce estimates of Archimedean copulas in high dimensions....
From: Stats Stack Exchange | By: Sonntag | Sunday, April 20, 2014
I have the following hourly time series data and would like to fit a line of best fit to it: There seems to be periodicity on both a daily and a weekly basis. By this, I mean there are patterns that repeat every day (e.g. peaks at 7 PM) and patterns...
From: Stats Stack Exchange | By: mchangun | Sunday, April 20, 2014
What is the difference between compositional data model using additive log-ratio (alr) transformation and aggregated multinomial logit model?
From: Stats Stack Exchange | By: Surveyor | Sunday, April 20, 2014
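The additive log-ratio transform itself is short to write down, and it shows one point of contact between the two models: the inverse alr has exactly the form of multinomial-logit probabilities with the last component as the reference category. A minimal Python sketch (function names are illustrative); whether the two models coincide in practice depends on what each assumes about the data, e.g. continuous shares versus category counts, which the question does not spell out.

```python
import math

def alr(x):
    """Additive log-ratio transform of a composition x (all parts > 0).

    Uses the last part x[-1] as the reference: log(x_i / x_D), i = 1..D-1.
    """
    ref = x[-1]
    return [math.log(xi / ref) for xi in x[:-1]]

def alr_inverse(y):
    """Map D-1 log-ratios back to a composition summing to 1.

    This is the multinomial-logit (softmax with a reference category) form:
    x_i = exp(y_i) / (1 + sum_j exp(y_j)),  x_D = 1 / (1 + sum_j exp(y_j)).
    """
    exps = [math.exp(yi) for yi in y]
    denom = 1.0 + sum(exps)
    return [e / denom for e in exps] + [1.0 / denom]

# Usage: a three-part composition round-trips through the transform.
comp = [0.2, 0.3, 0.5]
log_ratios = alr(comp)           # two log-ratios relative to the last part
back = alr_inverse(log_ratios)   # recovers [0.2, 0.3, 0.5] up to rounding
```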
I have a question about the Arellano-Bond model in Stata (xtabond/xtabond2). Are the slopes I get for levels or for differences of values? My model to be estimated has the form (D is the first difference): DY=a+DX1+DX2+.... So should I use already differenced...
From: Stats Stack Exchange | By: Risto | Sunday, April 20, 2014
I've been reading the Wikipedia page for Levene's test, and it cites the degrees of freedom as (k - 1, N - k), where k is the number of different groups to which the sampled cases belong, and N is the total number of cases in all groups. However, it...
From: Stats Stack Exchange | By: Sasha | Sunday, April 20, 2014
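The degrees of freedom $(k - 1, N - k)$ come from the fact that Levene's test is a one-way ANOVA carried out on the absolute deviations $Z_{ij} = |Y_{ij} - \bar{Y}_i|$: $k - 1$ for the between-group term and $N - k$ for the within-group term. A minimal Python sketch of the mean-centred version (the name `levene_w` is illustrative); note that SciPy's `scipy.stats.levene` centres on the median by default (the Brown-Forsythe variant), so its statistic can differ from this one.

```python
from statistics import mean

def levene_w(groups):
    """Mean-centred Levene statistic W and its F-distribution degrees of freedom.

    groups: list of samples (lists of numbers), one per group.
    Assumes at least two groups and nonzero within-group spread of the deviations.
    """
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    # Absolute deviations from each group's own mean.
    z = [[abs(y - mean(g)) for y in g] for g in groups]
    z_bar_i = [mean(zi) for zi in z]
    z_bar = sum(sum(zi) for zi in z) / n_total
    # One-way ANOVA on the deviations: between-group and within-group sums of squares.
    between = sum(len(zi) * (zb - z_bar) ** 2 for zi, zb in zip(z, z_bar_i))
    within = sum((v - zb) ** 2 for zi, zb in zip(z, z_bar_i) for v in zi)
    w = (n_total - k) / (k - 1) * between / within
    return w, (k - 1, n_total - k)

# Usage: two groups of 3 give df = (2 - 1, 6 - 2) = (1, 4).
w, df = levene_w([[1, 2, 3], [2, 4, 6]])
```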
What is the difference between finite and infinite variance? My stats knowledge is rather basic; Wikipedia and Google weren't much help here.
From: Stats Stack Exchange | By: AfterWorkGuinness | Saturday, April 19, 2014
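A worked example may make the distinction concrete. A distribution has finite variance when $\operatorname{E}[X^2] < \infty$; the standard normal, for instance, has $\operatorname{E}[X^2] = 1$. Taking the standard Cauchy distribution as an illustration (it is not mentioned in the question), the same integral grows without bound, so its variance is infinite (its mean does not exist either, since $\int |x| f(x)\,dx$ also diverges):

```latex
\[ f(x) = \frac{1}{\pi(1+x^2)}, \qquad
   \int_{-A}^{A} x^2 f(x)\,dx
   = \frac{1}{\pi}\int_{-A}^{A}\Big(1 - \frac{1}{1+x^2}\Big)\,dx
   = \frac{2A - 2\arctan A}{\pi} \]
\[ \frac{2A - 2\arctan A}{\pi} \to \infty \ \text{as}\ A \to \infty
   \quad\Longrightarrow\quad \operatorname{E}[X^2] = \infty \]
```

In practice this shows up as sample variances that never settle down as more data are collected.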
I'm working on a review paper and need to collect the means and standard deviations of a given measure (such as a measure of depression) from papers of interest. However, some authors report means and standard deviations for each item on the measure,...
From: Stats Stack Exchange | By: user30295 | Saturday, April 19, 2014
I read an article that says the dependent variables in a regression model must be normally distributed. The way I understand it, the observations for the regression model must then be normally distributed. Or, in other words, if I choose sample...
From: Stats Stack Exchange | By: Jason Samuels | Saturday, April 19, 2014
From this video by Andrew Ng, around 5:00: how are $\delta_3$ and $\delta_2$ derived? In fact, what does $\delta_3$ even mean? $\delta_4$ is obtained by comparing with $y$; no such comparison is possible for the output of a hidden layer, right?...
From: Stats Stack Exchange | By: qed | Saturday, April 19, 2014
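A sketch of the standard chain-rule argument, assuming the notation commonly used in Ng's machine-learning course (a 4-layer sigmoid network with cross-entropy cost $J$, $a^{(l)} = g(z^{(l)})$, $z^{(l+1)} = \Theta^{(l)} a^{(l)}$, one training example, bias units left out of the $\delta$s); the video itself is not quoted here. The hidden-layer $\delta$ is not a comparison with a target: it is defined as the partial derivative of the cost with respect to that layer's pre-activation, and the chain rule passes it backwards through $\Theta^{(l)}$:

```latex
\[ \delta^{(l)}_j \equiv \frac{\partial J}{\partial z^{(l)}_j}, \qquad \delta^{(4)} = a^{(4)} - y \]
\[ \delta^{(l)}_j
   = \sum_i \frac{\partial J}{\partial z^{(l+1)}_i}\,
            \frac{\partial z^{(l+1)}_i}{\partial a^{(l)}_j}\,
            \frac{\partial a^{(l)}_j}{\partial z^{(l)}_j}
   = \sum_i \delta^{(l+1)}_i \,\Theta^{(l)}_{ij}\, g'\big(z^{(l)}_j\big) \]
\[ \delta^{(l)} = \big(\Theta^{(l)}\big)^{\top} \delta^{(l+1)} \odot g'\big(z^{(l)}\big),
   \qquad g'\big(z^{(l)}\big) = a^{(l)} \odot \big(1 - a^{(l)}\big),
   \qquad l = 3, 2 \]
\[ \frac{\partial J}{\partial \Theta^{(l)}_{ij}} = a^{(l)}_j \,\delta^{(l+1)}_i \]
```

The output-layer formula $\delta^{(4)} = a^{(4)} - y$ is the same definition evaluated at the last layer, where the sigmoid and cross-entropy derivatives cancel to that simple difference.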
When comparing feature-based classification techniques what characteristics about the different processes should be considered? I'm comparing different classification techniques to try to figure out what should be considered when selecting a classification...
From: Stats Stack Exchange | By: HardcoreBro | Saturday, April 19, 2014
Let $\mathcal{H}\colon\mathbf{w}\cdot\mathbf{x}+b=0$ be a separating hyperplane, which some binary linear classifier results in. Let $\mathbf{x}_t$ be an unseen, new sample that appears and needs to be classified. We can predict the true label of $\mathbf{x}_t$...
From: Stats Stack Exchange | By: nullgeppetto | Saturday, April 19, 2014
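The excerpt stops before the actual question, but the usual decision rule for a linear classifier with hyperplane $\mathcal{H}$ is the sign of $\mathbf{w}\cdot\mathbf{x}_t + b$, and $(\mathbf{w}\cdot\mathbf{x}_t + b)/\lVert\mathbf{w}\rVert$ is the signed distance from $\mathbf{x}_t$ to $\mathcal{H}$, often read as a rough measure of confidence. A minimal Python sketch; the function names and the choice to label points lying exactly on $\mathcal{H}$ as $+1$ are assumptions for illustration.

```python
import math

def score(w, b, x):
    """Value of w . x + b for sample x."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def classify(w, b, x):
    """Predict +1 or -1 by which side of the hyperplane x falls on.

    Points exactly on the hyperplane are assigned +1 (an arbitrary convention).
    """
    return 1 if score(w, b, x) >= 0 else -1

def signed_distance(w, b, x):
    """Signed Euclidean distance from x to the hyperplane (positive on the +1 side)."""
    return score(w, b, x) / math.sqrt(sum(wi * wi for wi in w))

# Usage with an illustrative hyperplane 3*x1 + 4*x2 - 5 = 0.
w, b = [3.0, 4.0], -5.0
label = classify(w, b, [3.0, 4.0])           # +1
dist = signed_distance(w, b, [3.0, 4.0])     # 4.0
```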
I hope I am asking this in a way that makes sense. I am comparing 8 means and want to set up planned comparisons, rather than having my Bonferroni adjustment become overly conservative in a post-hoc test. For my groups I need to make a total of 16 comparisons,...
From: Stats Stack Exchange | By: Phillip | Saturday, April 19, 2014
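The arithmetic behind the concern is simple. Assuming the usual family-wise level $\alpha = 0.05$ (the excerpt does not state one), a Bonferroni correction over all $\binom{8}{2} = 28$ pairwise comparisons among 8 means tests each at $0.05/28$, whereas correcting only over the 16 planned comparisons tests each at $0.05/16$. A small Python sketch; `bonferroni_alpha` is a hypothetical helper.

```python
from math import comb

def bonferroni_alpha(family_alpha, m):
    """Per-comparison significance level under a Bonferroni correction for m tests."""
    return family_alpha / m

all_pairs = comb(8, 2)                              # every pairwise comparison of 8 means
alpha_all_pairs = bonferroni_alpha(0.05, all_pairs)  # about 0.00179 per comparison
alpha_planned = bonferroni_alpha(0.05, 16)           # 0.003125 per comparison
```

Restricting the family to the planned comparisons is what makes each individual test less stringent; the correction is still needed if the family-wise error rate is to be controlled.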
I estimated the mean and variance of two latent variables through two groups of data. I can't use the data to do hypothesis testing, because I am interested in the latent variable. Is there a way to test whether the two latent variables are significantly...
From: Stats Stack Exchange | By: user258682 | Saturday, April 19, 2014
For the following problem: $\text{min:}\ f(x)\\ s.t. \ g(x)\leq t$ Is the above problem equivalent to the following problem? $\text{min:}\ f(x) + \lambda g(x) \\ s.t. \ \lambda\geq0$ where $t$ and $\lambda$ are variables. It seems equivalent, because...
From: Stats Stack Exchange | By: user137273 | Saturday, April 19, 2014
Assume a model like this, basically explaining stock market returns with a bunch of stuff: stockReturn(t) ~ bondReturn(t) + moneyMarketReturn(t) + inflation(t) + somethingElse(t) Does using inflation as an independent variable bring any significant problems?...
From: Stats Stack Exchange | By: Roope | Saturday, April 19, 2014
In the questionnaire I asked respondents from two countries how many job offers they received from 5 sources in the last 6 months. There are 5 questions - one for each source. It is an open question, without a scale as the two countries strongly differ...
From: Stats Stack Exchange | By: Anna | Saturday, April 19, 2014
I ran the same SEM model in sem and lavaan. I got the same parameters and - generally - very close test values, with the exception of AIC and BIC which were immensely different between the two packages. The following is the resulting AIC and BIC from...
From: Stats Stack Exchange | By: Deuterium | Saturday, April 19, 2014
Suppose I have a big online company, and many of my customers churned (i.e. they were paying, and then stopped). My goal is to understand why each of them churned. First I identify the complete set of reasons for churning, $H_1,\ldots,H_n$. E.g. "the...
From: Stats Stack Exchange | By: Diego de Estrada | Saturday, April 19, 2014
I have a variable whose value I can only measure at the end of life of a product (which is not fixed). The variable's value, continuous and between 0 and 100, may be related to its age at that time. My data consists of the various ages of a set of products...
From: Stats Stack Exchange | By: Sjoerd C. de Vries | Saturday, April 19, 2014
I am talking about a situation in which I have several continuous predictor variables predicting a continuous outcome. One of the predictors has a very non-normal distribution and has some wild outliers. I intend to generalize the regression model to...
From: Stats Stack Exchange | By: Sasha | Saturday, April 19, 2014
Good evening all, I am doing a self-study exercise, but have been puzzled by part of the question on finding percentage points of a normal distribution. I fully understand the first part of the question and was able to find the answer, which corresponds...
From: Stats Stack Exchange | By: user1275515 | Saturday, April 19, 2014
How do you interpret the results of a multivariate probit regression? Is it interpreted the same way as OLS?
From: Stats Stack Exchange | By: user44067 | Saturday, April 19, 2014
I have a set of data with features of movies and features of users and a third matrix with ratings of user for each movie. I have to build a recommendation system for new users. Can you help me with the problem? I am not sure how to go about it. What...
From: Stats Stack Exchange | By: Sejal Shinde | Saturday, April 19, 2014
I am looking for a python library or module function that allows me to estimate probability densities p(x) using the Parzen-window approach with a Gaussian kernel (with variable sigma, or 'window width') I managed to implement the Parzen-technique using...
From: Stats Stack Exchange | By: Sebastian Raschka | Saturday, April 19, 2014
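With a Gaussian kernel, the Parzen estimate averages one Gaussian bump per sample: $p(\mathbf{x}) = \frac{1}{n}\sum_i (2\pi h^2)^{-d/2} \exp\!\big(-\lVert\mathbf{x}-\mathbf{x}_i\rVert^2 / (2h^2)\big)$, where the window width $h$ plays the role of sigma. A minimal stdlib-only Python sketch (`parzen_gaussian` is an illustrative name, not an existing library function). Existing options include `sklearn.neighbors.KernelDensity(kernel="gaussian", bandwidth=...)`, which uses a fixed bandwidth, and `scipy.stats.gaussian_kde`, whose kernel is scaled by the data covariance, so its `bw_method` is not directly a fixed sigma.

```python
import math

def parzen_gaussian(x, samples, h):
    """Parzen-window density estimate at point x with an isotropic Gaussian kernel.

    x: point as a sequence of d coordinates.
    samples: list of d-dimensional sample points.
    h: window width (standard deviation of each Gaussian bump).
    """
    d = len(x)
    norm = (2 * math.pi * h * h) ** (d / 2)  # normalising constant of a d-dim Gaussian
    total = 0.0
    for s in samples:
        sq_dist = sum((xi - si) ** 2 for xi, si in zip(x, s))
        total += math.exp(-sq_dist / (2 * h * h)) / norm
    return total / len(samples)

# Usage: a 2D estimate from two samples, with window width 1.
p = parzen_gaussian([0.0, 0.0], [[0.0, 0.0], [1.0, 0.0]], 1.0)
```

Changing `h` trades smoothness against detail: a small width gives a spiky estimate, a large one oversmooths.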
I'm a PhD student in New Zealand. I need to determine the impact of lameness on the milk yield of cows. I measured milk yield daily and recorded the cows that were observed lame on any given day. I recorded data daily for 325 consecutive days. It...
From: Stats Stack Exchange | By: carolina diaz | Saturday, April 19, 2014
Please forgive this silly question; I'm fairly new to statistics. Consider this R code: a = c(1,2,3,4,3,2,3,4,5,5,6,5,4,3,4,5,6,7,8,7,6,6,5,6,7,10,9) b = c(10,9,7,6,5,6,7,8,4,6,6,5,4,5,6,5,4,5,6,7,5,4,4,5,4,3,2) mean((a - mean(a))*(b-mean(b))) [1] -2.42524...
From: Stats Stack Exchange | By: kamula | Saturday, April 19, 2014
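The excerpt is cut off before the actual question, so this is only one likely reading: the R expression `mean((a - mean(a))*(b-mean(b)))` divides the sum of cross-products by $n$, i.e. it is the population covariance, whereas R's `cov()` uses the $n - 1$ denominator. A Python sketch reproducing both numbers on the same 27-element vectors (`covariances` is an illustrative helper):

```python
def covariances(a, b):
    """Return (population, sample) covariance of two equal-length sequences."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cross = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    return cross / n, cross / (n - 1)  # divide by n vs. by n - 1

a = [1, 2, 3, 4, 3, 2, 3, 4, 5, 5, 6, 5, 4, 3, 4, 5, 6, 7, 8, 7, 6, 6, 5, 6, 7, 10, 9]
b = [10, 9, 7, 6, 5, 6, 7, 8, 4, 6, 6, 5, 4, 5, 6, 5, 4, 5, 6, 7, 5, 4, 4, 5, 4, 3, 2]
pop_cov, sample_cov = covariances(a, b)
```

Here `pop_cov` is about -2.42524, matching the printed R output, and `sample_cov` is that value times 27/26, about -2.51852. For large $n$ the two barely differ; for small samples the $n - 1$ version is the unbiased estimator.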
Negative Binomial distribution can be parameterized using mean, $\mu$, and overdispersion $\psi$, so that the variance of NB is $\mu + \frac{\mu^2}{\psi}$. We know there is no analytical solution for estimating $\psi$. I understand the variance of NB...
From: Stats Stack Exchange | By: user258682 | Friday, April 18, 2014
My problem is that I want to implement a Parzen-window estimation for a Gaussian kernel, but I have trouble understanding how I can check whether a point (2D or 3D) lies within a Gaussian sphere. Given a set of sample points, I want to check how many...
From: Stats Stack Exchange | By: Sebastian Raschka | Friday, April 18, 2014
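With a Gaussian kernel there is no sphere to test membership against: every sample contributes, weighted by $\exp\!\big(-\lVert\mathbf{x}-\mathbf{x}_i\rVert^2/(2h^2)\big)$. The "how many points lie inside the window" count belongs to the hypercube (rectangular) Parzen window, where a sample counts if every coordinate of $(\mathbf{x}-\mathbf{x}_i)/h$ lies within $\pm 1/2$, and the estimate is $p(\mathbf{x}) = k/(n h^d)$. A minimal Python sketch of that counting version for comparison; the function names are illustrative, and points exactly on the cube's boundary are counted as inside by assumption.

```python
def parzen_hypercube_count(x, samples, h):
    """Count samples inside the axis-aligned hypercube of side h centred at x.

    A sample s counts if |x_j - s_j| / h <= 1/2 for every coordinate j.
    """
    return sum(
        all(abs(xi - si) / h <= 0.5 for xi, si in zip(x, s))
        for s in samples
    )

def parzen_hypercube_density(x, samples, h):
    """Hypercube Parzen-window estimate p(x) = k / (n * h**d)."""
    d = len(x)
    k = parzen_hypercube_count(x, samples, h)
    return k / (len(samples) * h ** d)

# Usage: two of the three 2D samples fall inside the unit square around the origin.
pts = [(0.0, 0.0), (0.2, 0.1), (3.0, 3.0)]
k = parzen_hypercube_count((0.0, 0.0), pts, 1.0)
density = parzen_hypercube_density((0.0, 0.0), pts, 1.0)
```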