I work in bioinformatics and I'm completely new to machine learning, so I don't know if this idea is viable. There are several predictors that try to predict the molecular nature of a given set of genes. They use different ML methods (SVM, logistic...

The full question is: Assume we fit the following quadratic function: $f(x) = w_0+w_1x+w_2(x^2)$ to the dataset shown (blue circles). The fitted function is shown by the green curve in the picture below. Out of the 3 parameters of the fitted function...

Running the following code in R: library(datasets) boxplot(ChickWeight$weight, ChickWeight$Diet) produces a boxplot looking like this: Only two Diet groups are displayed on the x-axis. Using the formula structure and executing the call: boxplot(weight...

I am confused about using CV to evaluate model performance. The setting is like this: I have 1000 data points, and I split them into a training set and a testing set. I then use the training set to perform k-fold CV, and I will use the model that...
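One way to read the setup described above is sketched below: hold out a test set once, then rotate k validation folds inside the training portion only. The 80/20 split, k = 5, the seed, and the helper names `train_test_indices` and `k_fold_indices` are assumptions made for illustration; none of them come from the question.

```python
import random

def train_test_indices(n, test_fraction, seed):
    """Shuffle indices 0..n-1 and hold out a fraction as the test set."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    n_test = int(n * test_fraction)
    return indices[n_test:], indices[:n_test]

def k_fold_indices(train, k):
    """Yield (fit, validate) index lists; each training index is validated once."""
    folds = [train[i::k] for i in range(k)]
    for i in range(k):
        fit = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield fit, folds[i]

# Assumed sizes: 1000 observations, 20% held out, 5 folds.
train, test = train_test_indices(1000, test_fraction=0.2, seed=0)
splits = list(k_fold_indices(train, k=5))
```

In this sketch the test indices never enter the fold loop, so they stay untouched until a final model is evaluated.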

I generated artificial data using y=a+ax1+ax2+e. x1 is generated from a normal distribution, and e is generated from Cauchy and normal distributions. The models I want to compare are ANN and SVM. When using Cauchy as disturbance in artificial data. The model...

I've got this model: model <- lm (time~radius_mean+texture_mean+perimeter_mean+area_mean +smoothness_mean+compactness_mean+concavity_mean +concave_points_mean+symmetry_mean+fractal_dimension_mean+radius_se +texture_se+perimeter_se+area_se+smoothness_se+compactness_se...

I wish to plot error bars on a bar chart that represents the answers of 200 respondents, randomly selected from a population of 50,000,000, to a simple multiple-choice question. Here are the proportions of answers: Option A: 83% Option B: 4% Option C:...
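A minimal sketch of one possible way to get error bars for the proportions above: a Wilson score interval computed separately for each option. Treating each option as its own binomial proportion, the 95% level, and the function name `wilson_interval` are assumptions here. With 200 respondents out of 50,000,000, the finite population correction is negligible and is ignored.

```python
import math
from statistics import NormalDist

def wilson_interval(p_hat, n, confidence=0.95):
    """Wilson score interval for a single binomial proportion."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    denom = 1 + z**2 / n
    centre = (p_hat + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2)) / denom
    return centre - half, centre + half

n = 200  # number of respondents, from the question
# Only the two proportions visible in the question are used.
observed = {"Option A": 0.83, "Option B": 0.04}
intervals = {name: wilson_interval(p, n) for name, p in observed.items()}
```

The Wilson interval is used rather than the simple normal approximation because it stays inside [0, 1] for small proportions such as 4%.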

I want to know the interpretation of this coefficient table. Basically, I am examining the impact of firm size and audit experience on audit pressure, where small firm, medium firm and large firm are the dummy variables for audit firm size and the reference...

Call: pgls(formula = log10(numericlifespan) ~ numerictrophic * log10(numericsize), data = mycdat, lambda = "ML") Residuals: Min 1Q Median 3Q Max -0.43856 -0.18641 -0.00481 0.16407 0.89648 Branch length transformations: kappa [Fix] : 1.000 lambda [ ML]...

Suppose one doesn't know exactly which individual is moving from one state to the other but can only observe the aggregate number of individuals in each state. How can one estimate the Hidden Markov Model (let's make it as simple as possible)? Can someone propose...

I'm looking to estimate two spatial models: $y = \rho Wy + X\beta + WX \tau + \epsilon$ and $y = X\beta + WX \tau + \epsilon$, where $\epsilon = \lambda W\epsilon + u$. This is possible to estimate in R using maximum likelihood, but my data...

I am using the following algorithms:
1.) logistic regression
2.) Decision Trees
3.) SVM
Besides getting more data and messing with parameters, how can I aim to improve performance?

I am going to build a GLMM with R (lme4). I found that several papers summarised the results in this way (the independent variables are categorical variables): I have tried to use the anova function, but I could only get the chi-square statistic and p-value. And it is...

What would cause a regression model to always underpredict? For over a year now an associate of mine has been producing a linear model for a client which predicts trends with reasonable accuracy but always underpredicts the magnitude. This has bothered...

This seems like a simple thing but it dawned on me I did not really know how to answer this. What are the criteria for being able to estimate an effect? For example, with the data set: dat<-structure(list(Y = c(100L, 556L, 25L, 32L, 15L, 56L, 95L,...

I am using a GLM to model my data. The response variable is binary and I have three predictors of which two are continuous variables and one is binary. Would the distribution of predictors be important when I am fitting the model? That is, would it be...

I'm trying to create a very specific folder tree using Python that will create predefined folders and subfolders. I can create the top and second level folders, but the program messes up when it creates the third and fourth level folders. It either doesn't...
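A minimal sketch of one way to build a multi-level folder tree, assuming the layout can be written as a nested dictionary. The folder names in `layout` and the helper name `create_tree` are hypothetical, since the question does not show the actual structure. The demo writes into a temporary directory so nothing is left on disk.

```python
import tempfile
from pathlib import Path

def create_tree(root, tree):
    """Recursively create the folders described by a nested dict under root."""
    created = []
    for name, children in tree.items():
        path = Path(root) / name
        # parents=True and exist_ok=True make the call safe to repeat.
        path.mkdir(parents=True, exist_ok=True)
        created.append(path)
        created.extend(create_tree(path, children))
    return created

# Hypothetical four-level layout; an empty dict marks a leaf folder.
layout = {
    "project": {
        "data": {"raw": {"2020": {}, "2021": {}}, "processed": {}},
        "reports": {},
    }
}

with tempfile.TemporaryDirectory() as tmp:
    made = create_tree(tmp, layout)
    relative = sorted(p.relative_to(tmp).as_posix() for p in made)
    all_exist = all(p.is_dir() for p in made)
```

Keeping the layout as data means the same function handles the third and fourth levels exactly as it handles the first two.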

There's an old rule of thumb in multivariate statistics that recommends a minimum of 10 cases for each independent variable. But that's often where there is one parameter to fit for each variable. Why I'm asking: I'm working through a textbook example that...

Let's say I were to impute values for a variable (using multiple imputation). Then I wanted to use that variable in a regression. Can I use the same variables I used to impute in my new regression? So, for example, if I were to impute math test score...

I am using VIF to remove multicollinearity. Before computing the VIF, I made all my variables stationary. Once I have defined the set of variables from the VIF, is it okay if I revert to the original time series, which is not necessarily stationary, before random...

I'm a student and I have been given the above assumption to validate, along with checks to avoid violations. My question is: given this assumption, how do we validate it and check for violations?

I have the following problem scenario. Assume a system with two groups: P and R. Group P has 3 samples (p1, p2, p3) and group R has 4 samples (r1, r2, r3, r4). Each sample can belong to either of the two classes (c0 or c1), and each sample has 3 features:...

Let's say I have the following OLS model: Y = X1 + X2 + X3 + e. The variable of interest is X1, while X2 and X3 are control variables. Based on theory, I know X4 (e.g. firm size) should have an effect on Y. I also know I should standardize X1 since one unit...

My question grew out of a discussion with @whuber in the comments of a different question. Specifically, @whuber's comment was as follows: One reason it might surprise you is that the assumptions underlying a correlation test and a regression slope test...

I have a mixed linear model with a random effect, so an lmer model. Q1) I would like to know which assumptions need to be checked for this kind of model. Is it enough to look at the residuals (studentized residuals, normality) and the extreme values...

Suppose we have a hotel and we know all the reservations of the last five years. We would like to forecast/estimate the room demand day-by-day for the next year. I'm a mathematician but not a statistician, so I'm sorry if I'm saying something trivial or...

From what I have read, we generally use 3 sets: 1) a set for training classifiers, 2) a set for testing the classifiers during development, and 3) an untouched test set that is only used after development. So my question is why we really...

What will be the effect on the solution of least square analysis if we apply the following transformations on the training set: add a real number $k$ to the output value of each datapoint. And the model is $y = \phi w$ where $w$ is the weight attached...
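A small numerical sketch of the transformation above for the one-feature case. It assumes the model includes a constant (bias) term, which the truncated question does not confirm. Under that assumption, adding $k$ to every output shifts only the intercept by $k$ and leaves the slope unchanged; without a constant term, the shift would generally change the fitted weights. The data values and the helper name `fit_line` are made up for illustration.

```python
def fit_line(xs, ys):
    """Closed-form least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

# Made-up illustrative data and shift; not taken from the question.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.1, 2.9, 5.2, 6.8, 9.1]
k = 3.5

b0, b1 = fit_line(xs, ys)
c0, c1 = fit_line(xs, [y + k for y in ys])
# Compare (c0 - b0) with k, and c1 with b1.
```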

I am trying to eliminate seasonality from my data using Fourier analysis in MATLAB. Following this post http://stackoverflow.com/questions/19285684/fast-fourier-transform-for-deasonalizing-data-in-matlab?answertab=oldest#tab-top I came up with this code:...

Given a time series with events, I want to test whether events in two time series are occurring differently. See for example the attached image. There are 12 events (orange) between 2000 and 2007 with different lengths. Let's pretend these are drought...

I am conducting a CFA on a questionnaire with 4 factors. I know that the exploratory factor analysis to obtain these 4 factors was done using oblimin rotation. I am now wondering if this affects the model I have to build with the lavaan package in R. Following...

In my data I have $n_1$ people who had an "event" and $n_2$ people who did not. Cases (those with an event) were oversampled substantially (the true prevalence is probably more like 1 in 10000). If it matters, $n_1 = 70$ and $n_2 = 250$. For all of those...

Very short question: are there tools (preferably in R or Stata) to solve a simultaneous equation model, without needing instrumental variables? In my case, I would like to model irrigation and crop type. A farmer needs more or less irrigation depending...

I am trying to calculate modification indices in a Structural Equation Model (SEM) with an endogenous categorical variable. I am using the R package 'lavaan'. I am running what seems to be the correct code, but I cannot compute them. Reproducible example...

I have $200$ observations of a time series $X_t$ and they have been fitted using a $SARIMA(1, 0, 0)(0, 1, 1)_4$ model which is $y_t= y_{t-4} + \phi_1 (y_{t-1} - y_{t-5}) + Z_t + \Theta Z_{t-4 }$. This is the output of the program made in R: Coefficients:...

We are trying to predict a customer's total recharge value for the next week. We came up with 8 recharge bands, so it's now a classification problem. We have tried using the history of the last 8 weeks. We have tried all the famous techniques like feature selection,...

Other than inconsistent prediction performance, what are the other disadvantages and weaknesses of ANNs?

I have a very large longitudinal dataset consisting of a variable Y measured over time (10 million data points in 30,000 samples). I would like to assess a large number of predictors (e.g. gender/age) as fixed effects on the slope of Y over time or Y at a...

I have two samples: one has 1436 observations with sd=0.0405, mean=0.7776 and skewness=0.032, and the other has 4956 observations with sd=0.0416, mean=0.7716 and skewness=-0.0897. Now I am doing a Welch Two Sample t-test in R, but I am wondering...
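For reference, the Welch statistic and its Welch-Satterthwaite degrees of freedom can be computed directly from the summary figures quoted above. This is only a sketch: the function name `welch_from_summary` is hypothetical, and the p-value uses a normal approximation in place of the t distribution, which is an assumption that is reasonable only because the degrees of freedom are large here. It says nothing about the effect of skewness.

```python
import math
from statistics import NormalDist

def welch_from_summary(m1, s1, n1, m2, s2, n2):
    """Welch t statistic and Welch-Satterthwaite df from means, sds and sizes."""
    v1 = s1**2 / n1  # squared standard error of mean 1
    v2 = s2**2 / n2  # squared standard error of mean 2
    t = (m1 - m2) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))
    return t, df

# Summary statistics as quoted in the question.
t_stat, df = welch_from_summary(0.7776, 0.0405, 1436, 0.7716, 0.0416, 4956)

# Normal approximation to the two-sided p-value (assumes large df).
p_approx = 2 * (1 - NormalDist().cdf(abs(t_stat)))
```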

I have weekly data and would like to perform time series analysis on it. According to Rob J Hyndman, the period of weekly data can be approximated as 365.25/7 ≈ 52.18. How can I define the period in SPSS for weekly data? The following are data examples which...

