I have one question regarding standardized coefficients (beta) in linear models. I have already asked one question here. From the answers I assume that I should use R's scale() function on the dependent variable as well as on all independent variables...
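
As a rough illustration of what standardizing both sides of a regression does, here is a minimal sketch in Python rather than R. The data and the helper names `zscore` and `ols_slope` are made up for the example, and only the one-predictor case is covered. It suggests that the slope fitted on z-scored variables matches the raw slope rescaled by sd(x)/sd(y).

```python
import statistics

def zscore(values):
    """Center to mean 0 and scale to sample SD 1 (R's scale() defaults)."""
    m = statistics.mean(values)
    s = statistics.stdev(values)  # sample SD, n - 1 denominator
    return [(v - m) / s for v in values]

def ols_slope(x, y):
    """Least-squares slope of y on x for a single predictor."""
    mx, my = statistics.mean(x), statistics.mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

# Made-up illustration data (not from the question)
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2]

raw_slope = ols_slope(x, y)
beta = ols_slope(zscore(x), zscore(y))  # standardized coefficient
# Same quantity obtained by rescaling the raw slope: b * sd(x) / sd(y)
beta_from_raw = raw_slope * statistics.stdev(x) / statistics.stdev(y)
print(raw_slope, beta, beta_from_raw)
```

With several predictors the same rescaling applies coefficient by coefficient, but that case is not shown here.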

I need to describe what the difference between two groups (patients and normal controls) consists of in terms of latent variables that I can describe within each group. For instance, given this PCA variable map: how do I compare this with the same...

I know that $Var(\theta)\geq 1/I(\theta)$ where $I(\theta)$ is the Fisher information. Let us take the example of a natural exponential family with density $f(x)=\lambda\exp(-\lambda x)$. In this case we have: $-E[\frac{\partial^2 \log(f)}{\partial\lambda^2}]=\frac{1}{\lambda^2}$...
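
The intermediate steps behind the Fisher information quoted above can be written out as follows. The last line is an added assumption, not part of the question: it states the bound for an unbiased estimator $\hat\lambda$ based on $n$ i.i.d. observations.

```latex
\begin{aligned}
\log f(x;\lambda) &= \log\lambda - \lambda x,\\
\frac{\partial \log f}{\partial\lambda} &= \frac{1}{\lambda} - x,
\qquad
\frac{\partial^2 \log f}{\partial\lambda^2} = -\frac{1}{\lambda^2},\\
I(\lambda) &= -E\!\left[\frac{\partial^2 \log f}{\partial\lambda^2}\right] = \frac{1}{\lambda^2},\\
\operatorname{Var}(\hat\lambda) &\ge \frac{1}{n\,I(\lambda)} = \frac{\lambda^2}{n}.
\end{aligned}
```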

I'm trying to understand the theory of estimators. As I understand it now, if you have an r.v. $X$ and take $n$ i.i.d. samples then an estimator for $E[X^{2}]$ would be $\overline{X^{2}}$ since $E[\overline{X^{2}}] = E[X^{2}]$ (probably only true for...

I am working on a research paper on the diagnosis of cancer. I have a list of known prognostic factors (age of patient, size of tumor, grade of tumor, lymph node involvement) and a list of unknown factors which are to be assessed with prognosis by correlating with known prognostic...

I want to compare two profile likelihood curves and determine if they are significantly different from one another. For example are the following curves significantly different from one another: I realize I can find a 95% confidence interval for a value...

I'm attempting to understand a statistical concept which I'm positive is basic stats, but that I currently don't understand. Say that there's a one in ten million likelihood of an outcome happening during an event, that happens a given count of times,...
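
Since the rest of the question is cut off, the following is only a sketch of one calculation this setup usually leads to: the chance that the outcome happens at least once in n events. It assumes the events are independent with the same probability each time. The event counts in the loop are hypothetical.

```python
import math

def prob_at_least_once(p, n):
    """P(outcome occurs at least once in n independent events, each with probability p)."""
    # 1 - (1 - p)**n, computed via log1p/expm1 to stay accurate for tiny p
    return -math.expm1(n * math.log1p(-p))

p = 1e-7  # "one in ten million", as in the question
for n in (1_000, 1_000_000, 10_000_000):  # hypothetical event counts
    print(n, prob_at_least_once(p, n))
```

For small p and moderate n the result is close to n * p, which is a convenient sanity check.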

In one use of k-fold cross-validation for evaluating classifiers, one trains k models, each on n(k-1)/k examples, and tests each on n/k examples. The average accuracy on those k test sets of size n/k is used as an estimate of the accuracy of a classifier...
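
The procedure described above can be sketched as below. This is a minimal illustration, not a reference implementation: the labels are made up, and a trivial majority-label rule (the hypothetical helper `majority_label`) stands in for a real classifier so that the example stays self-contained.

```python
import random

def k_fold_indices(n, k, seed=0):
    """Shuffle 0..n-1 and split into k folds whose sizes differ by at most one."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def majority_label(labels):
    """Toy stand-in for a classifier: always predict the most common training label."""
    return max(set(labels), key=labels.count)

def cross_val_accuracy(labels, k):
    """Return the mean accuracy over k folds and the per-fold accuracies."""
    folds = k_fold_indices(len(labels), k)
    accuracies = []
    for test_idx in folds:
        held_out = set(test_idx)
        # "Train" on the roughly n(k-1)/k examples outside the fold
        train = [labels[i] for i in range(len(labels)) if i not in held_out]
        pred = majority_label(train)
        # Test on the roughly n/k examples inside the fold
        correct = sum(labels[i] == pred for i in test_idx)
        accuracies.append(correct / len(test_idx))
    return sum(accuracies) / k, accuracies

labels = [0] * 70 + [1] * 30  # made-up labels
mean_acc, per_fold = cross_val_accuracy(labels, k=5)
print(mean_acc, per_fold)
```

Each example lands in exactly one test fold, so every label is used for testing once and for training k - 1 times.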

I want to find the dominant one among two theories which give predictions about the relationships between one dependent variable (Y) and eight independent variables (x1, x2 ... x8). The predictions of the two theories are mutually exclusive, i.e....

Suppose I have normally distributed data. For each element of the data I want to check how many SDs it is away from the mean. There might be an outlier in the data (likely only one, but might be also two or three) or not, but this outlier is basically...
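
A small sketch of the "how many SDs from the mean" calculation follows. The data are made up. The second function is an added assumption rather than something the question asks for: it swaps in the median and the MAD, which are less affected by the outlier itself than the mean and SD are.

```python
import statistics

def z_scores(data):
    """Distance of each value from the mean, in units of the sample SD."""
    m = statistics.mean(data)
    s = statistics.stdev(data)
    return [(x - m) / s for x in data]

def robust_z_scores(data):
    """Assumed alternative: median and MAD in place of mean and SD.

    The factor 1.4826 makes the MAD comparable to the SD for normal data.
    """
    med = statistics.median(data)
    mad = statistics.median([abs(x - med) for x in data])
    return [(x - med) / (1.4826 * mad) for x in data]

# Made-up data with one suspicious value at the end
data = [9.8, 10.1, 10.0, 9.9, 10.2, 10.0, 15.0]
z = z_scores(data)
rz = robust_z_scores(data)
print(z[-1], rz[-1])
```

In this toy sample the outlier inflates the SD, so its ordinary z-score stays modest while the robust version flags it clearly.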

Good morning everyone, I have a question regarding statistically significant sample sizes for a population. I am working with data in Excel and want to put the formula for this into the workbook as well, but have run into a roadblock of sorts. I...

Let's say I am trying to figure out whether two classes can be differentiated. My methods may not be perfect, but I would like to know whether my features "mean" anything that may possibly be added to reinforce another system (for instance). I know that...

Could you please answer some of my questions regarding SVM-RFE (SVM with recursive feature elimination)? I am using SVM-RFE with a linear kernel for a binary classification and feature selection problem. All features are rescaled with mean 0 and standard...

I know it might be trivial but does the density of daily values impact the forecast accuracy? For example, if a call center receives less than 50 calls for weekdays and less than 10 calls for weekend, is the forecast accuracy diminished compared with...

Suppose I have 1000 draws each of two random variables X and Y. If I wanted to sample the sum of these variables, I would simply calculate 1000 samples, i.e. $$ S_{i}=X_{i}+Y_{i}, i=1,2,…,1000 $$ And that would give me draws from the pdf of the sum...
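
The element-wise sum described above can be sketched as follows. The two distributions are hypothetical placeholders, since the question does not say what X and Y are.

```python
import random
import statistics

rng = random.Random(42)
n = 1000

# Hypothetical choices for illustration: X ~ N(0, 1), Y ~ Exp(1)
x = [rng.gauss(0.0, 1.0) for _ in range(n)]
y = [rng.expovariate(1.0) for _ in range(n)]

# S_i = X_i + Y_i, one draw of the sum per pair of draws
s = [xi + yi for xi, yi in zip(x, y)]

print(statistics.fmean(s), statistics.fmean(x) + statistics.fmean(y))
```

Pairing the i-th draw of X with the i-th draw of Y gives draws of the sum under whatever joint distribution the pairs came from. Here X and Y are generated independently.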

I have a dataset with approximately 4000 rows and 150 columns. I want to predict the values of a single column (= target). The data is on cities (demography, social, economic, ... indicators). A lot of these are highly correlated, so I want to do a PCA...

I'm not well versed in statistics so I'm not sure if my question is worded exactly correctly but basically here's the problem I'm trying to solve: imagine you have two equal sized arrays of size n. Each array is filled with random numbers from 0 to 1....

Suppose n people go to a fancy restaurant. Each person is wearing a hat and checks his/her hat at the door as he/she arrives. The hat-check attendant gets tipsy throughout the evening, forgetting which hat belongs to whom, and returns a random hat to...
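
The question text is cut off, so this sketch only covers two standard quantities for the setup: the number of ways nobody gets their own hat, and a simulated average number of correct returns. It assumes "a random hat" means the hats are handed back in a uniformly random permutation. The function names are made up for the example.

```python
import math
import random

def derangements(n):
    """Number of permutations of n hats in which nobody gets their own hat."""
    if n == 0:
        return 1
    if n == 1:
        return 0
    a, b = 1, 0  # D_0, D_1
    for m in range(2, n + 1):
        # Recurrence: D_m = (m - 1) * (D_{m-1} + D_{m-2})
        a, b = b, (m - 1) * (a + b)
    return b

def simulate_mean_matches(n, trials, seed=0):
    """Average number of people who get their own hat back over many random returns."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        hats = list(range(n))
        rng.shuffle(hats)
        total += sum(h == i for i, h in enumerate(hats))
    return total / trials

n = 10
print(derangements(n) / math.factorial(n))  # P(no one gets their own hat)
print(simulate_mean_matches(n, trials=20_000))
```

The no-match probability approaches 1/e as n grows, and the expected number of correct returns is 1 for any n.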

I am trying to see differences in the feeding-rate of one bird species between big forest patches and small ones. I have several forest patches of both sizes, and three years of study. Some individuals have been recorded (to assess the feeding-rate)...

I want to find the dominant among two theories. I have used a unique "t - test" as shown in the file below : https://docs.google.com/viewer?a=v&pid=sites&srcid=ZGVmYXVsdGRvbWFpbnxzYW5kaXBzaW5oYW5vd3xneDo1ZTU5ZWVkNWRmOTgzZDMx Kindly comment about...

I feel this is a simple problem, yet I cannot seem to solve it. Any help is greatly appreciated.

Let's say we have a dataset $\mathbf{X}$ of $m$ instances and $n$ features, and a target scalar variable $\mathbf{y}$ ($m$ instances). Now I want to do a regression, so I try to fit a hyperplane $y = \mathbf{x} \cdot \mathbf{w} + c$. Note: $\mathbf{w}$...

I am trying to do logistic regression in R. My data set contains more than 50 variables. Some of them are factors (qualitative variables) and others are quantitative independent variables. I would like to get the significance of the variables from their...

Imagine a hypothetical scenario in which a ball is thrown along a straight line. During flight, the position is continually sampled; however, at some distance, the sampling fails and only noise is detected. This distance is unknown and variable. One...

I am trying to understand how to correctly build a mixed-effects logistic regression model in R. I believe my model is pretty simple and straightforward, but I'm lacking in experience and uncertain I'm doing it correctly. Not being a statistician and struggling...

I need to estimate a panel model. I have run the "normal" fixed effects model using plm in R and also wfe. I also wanted to try pggls considering its tolerance of heteroskedasticity and autocorrelation. However, the results I am getting with pggls are...

In a mixed effects model the recommendation is to use a fixed effect to estimate a parameter if all possible levels are included (e.g., both males and females). It is further recommended to use a random effect to account for a variable if the levels...

I am aware of some nice examples of pairs of correlated random variables which are marginally normal but not jointly normal. See this answer by Dilip Sarwate, and this one by Cardinal. I am also aware of an example of two normal random variables whose...

Say I take 500 bootstraps of a population and calculate 95% confidence intervals (CIs) for each sample. I would expect 95% of the bootstrap sample CIs to contain the true population mean. However, I'm then asked what the probability is of 100% of...
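
Reading the truncated question as "what is the probability that all 500 intervals contain the true mean", a back-of-the-envelope sketch is given below. It rests on two loud assumptions that may not hold for real bootstrap intervals: the 500 intervals are independent, and each has exactly 95% coverage.

```python
import math

def prob_all_cover(coverage, n_intervals):
    """P(every interval covers the true value), assuming independent intervals."""
    return coverage ** n_intervals

coverage = 0.95
n_intervals = 500

p_all = prob_all_cover(coverage, n_intervals)
expected_covering = coverage * n_intervals  # mean of Binomial(500, 0.95)
sd_covering = math.sqrt(n_intervals * coverage * (1 - coverage))

print(p_all, expected_covering, sd_covering)
```

Under these assumptions the count of covering intervals is binomial, so all 500 covering is far out in the tail.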

Given an input vector $x$, let the maximum of $x$ occur at index $i$ in the input vector. I am trying to quantify the peakedness of this maximum, and to do that I have thought of determining the following quantities. I am interested in determining numbers $l$ and...

Above are three plots of the Linear model I am trying to analyze. The first one is a basic plot of the linear data: LinearModel = read.csv(file= "C:/Users/Nikhil/Documents/LinearModelCase2.csv", header=TRUE, sep=",") plot(LinearModel$X,LinearModel$LinearModel)...

I have two series of trading profit results. I use the geometric mean to calculate the average in percent (CAGR). I would like to divide it by the standard deviation by combining the two series, but I'm having trouble calculating the combined standard...
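
Assuming the cut-off phrase refers to the standard deviation of the two series pooled into one, the sketch below shows how it can be obtained from each series' size, mean and variance without concatenating them. The profit figures are made up, and the population (divide by n) convention is an assumption.

```python
import math
import statistics

def combined_pstdev(a, b):
    """Population SD of the two series taken together, from per-series summaries."""
    n1, n2 = len(a), len(b)
    m1, m2 = statistics.fmean(a), statistics.fmean(b)
    v1, v2 = statistics.pvariance(a), statistics.pvariance(b)
    m = (n1 * m1 + n2 * m2) / (n1 + n2)  # combined mean
    # Within-series variance plus each series' squared offset from the combined mean
    var = (n1 * (v1 + (m1 - m) ** 2) + n2 * (v2 + (m2 - m) ** 2)) / (n1 + n2)
    return math.sqrt(var)

# Made-up profit series in percent
series_a = [1.2, -0.5, 2.3, 0.8, -1.1]
series_b = [0.4, 3.1, -2.0, 1.5]

print(combined_pstdev(series_a, series_b))
print(statistics.pstdev(series_a + series_b))  # direct check on the pooled data
```

The two printed values should agree up to floating-point error, which is a quick way to check the formula.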

The 21,000 estimate for Oct. was certainly not via quad or power regression. I wonder how they got that number? http://www.telegraph.co.uk/news/worldnews/ebola/11121045/Graphic-how-Ebola-cases-have-grown-since-March.html...

I've seen a couple of seemingly unrelated notes about working with large volumes of data and it struck me that I couldn't find much content on problems specific to statistical analysis of Big Data. Is there a compiled list somewhere (or book, article)...

I have a database with several continuous variables measured at two time points. I searched for a change over time in my dependent variables in this way: difJS<-lmer(JS~Time+(Time|id)+(Time|occupation),dat,REML=T) If I detect a significant fixed effect of...

As far as I know, both Gaussian mixtures as well as Gaussian processes can be used for regression. My question is: what is better and why? The answers might contain theoretic insights, practical experience or reference to further resources....

I'm a psychology PhD student doing analysis on a relatively large set of data, obtained via online surveys. The purpose of the study is largely to determine normative data for a population of adults, on a number of psychology scales. However, I'm not...

(RStudio) Hi, I'm running LDA on a dataset with 250,000 observations, 2 classes and 30 variables. My goal is to create a classification model using the LDA function. After loading my variables I receive a warning that my X's are collinear. (should this...

I am interested in a model like: $y_{i} = \sum_{k\in K}{\beta_{k} z_{k}}$, with $z_{k} \sim N(\mu_{k}, \sigma_{k})$, where $\beta \equiv(\beta_{k})_{k\in K}$ is not known, but all else is. I assume that there is a prior $\pi (\beta)$ of an arbitrary...

I have fitted a nonlinear asymptotic equation to a set of data and my interest is in getting the standard deviations of the fitted parameters. Is this possible in nls?

I am trying to build a regressive, predictive model for a target time series that is heavily skewed. You could think of the target as being like earthquake magnitudes or heavy rainfall. Most of the time we sit in the relatively boring head of the distribution,...

I am interested in finding the median absolute distance to quantiles. So, for $Q_\alpha$ the $0 \le \alpha \le 1$ quantile, I would like to find $Q_\gamma^*$ such that $Q_\gamma^*$ satisfies \begin{equation} \underset{\gamma}{median}|Q_\alpha-Q_\gamma|=...

I have several questions concerning analysis of data, especially when there are replications and/or pseudoreplications. First, I read an example in « pseudoreplication is a pseudoproblem » where we wish to determine which of two urns contains the greater...

I have 10 items in my store and I am running 3 promotions. To maximize my sales and profit, I want to decide the price of the items on a daily or weekly basis. Number of items: 10. Promotions running: 3. Other factors that can influence this: competitor price in that area (...

I want to use some count data to train a classifier. The count data range from 0 to 400 something. There are a bunch of smaller counts (0's and 1's). I wonder what would be a good way to categorize it into 4 groups. Thanks in advance!...

A survey found that the average communication of males is 35 and the average communication of females is 65. The data were obtained from 100 students in 2 samples, and the standard deviations were _ and _ respectively, at a = 0.5. Can it be concluded that...

I want to analyze the correlation between overall attitude and purchase intention. Both are continuous variables. Overall attitude uses a 7-point Likert scale while purchase intention uses a 5-point Likert scale. So, can I compute a correlation between these even though they have different...

This question is based on Honglak Lee's paper "Convolutional Deep Belief Networks for Scalable Unsupervised Learning of Hierarchical Representations". In chapter "4.3 Handwritten digit classification", it is written: We trained 40 first layer bases from...

I am trying to conduct an A/B(/C) test to compare the performance of 3 different website pages but I'm facing issues regarding zero inflated data. I have data for each page regarding 1.) the number of people who clicked on the page 2.) performed the...

I attempted to build a deep network (e.g. a deep autoencoder) for some object classification; my results showed that the deep network is worse than a shallow network. However, from what I have read in lectures, deep networks perform well. This raises a...

