
Questions tagged [cross-validation]

Repeatedly withholding subsets of the data during model fitting in order to quantify the model performance on the withheld data subsets.

3 votes · 0 answers · 53 views

Do k-folds risk sampling bias and, if so, how do we avoid it?

In cross-validation, $k$-folds are a common way to train, compare and validate models. Often we want to find an optimal set of hyperparameters for our models. There are many ways to probe the ...
asked by Markus Klyver
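
One common guard against unlucky fold composition in classification is stratification, which keeps class proportions stable across splits. A minimal sketch, assuming scikit-learn and a synthetic imbalanced dataset in place of the question's:

    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import StratifiedKFold, cross_val_score

    # Synthetic 90/10 imbalanced data as a stand-in for a real dataset.
    X, y = make_classification(n_samples=500, weights=[0.9, 0.1], random_state=0)

    # Plain KFold can concentrate minority-class rows in one fold;
    # StratifiedKFold preserves the 90/10 ratio in every split.
    cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
    scores = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=cv)
    print(scores.mean(), scores.std())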
2 votes · 1 answer · 55 views

Should differential expression analysis be incorporated in cross validation for training machine learning models?

I'm conducting some experiments using TCGA-LUAD clinical and RNA-Seq count data. I'm building machine learning models for survival prediction (Random Survival Forests, Survival Support Vector Machines,...
asked by Yordany Paz
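
The usual answer is that any supervised filtering step, differential expression included, must be refit inside each training fold, or the validation folds leak into feature selection. A minimal sketch of that structure in scikit-learn, with a generic univariate filter standing in for the DE step and synthetic data in place of TCGA-LUAD:

    from sklearn.datasets import make_classification
    from sklearn.feature_selection import SelectKBest, f_classif
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import cross_val_score
    from sklearn.pipeline import Pipeline

    # High-dimensional synthetic stand-in for expression data.
    X, y = make_classification(n_samples=200, n_features=5000,
                               n_informative=20, random_state=0)

    # Because the filter sits inside the pipeline, the "differentially
    # expressed" features are re-selected on each training fold only.
    pipe = Pipeline([
        ("select", SelectKBest(f_classif, k=50)),
        ("clf", LogisticRegression(max_iter=1000)),
    ])
    print(cross_val_score(pipe, X, y, cv=5).mean())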
2 votes · 0 answers · 58 views

Cross-validating multi-output models: importance + SHAP

I am currently developing a project that deals with multiple targets, which can have different cardinalities. The idea is to use different ML models (e.g. Random Forest, SVM, AdaBoost) and ...
asked by Le Roi des Aulnes
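
A minimal sketch of per-target evaluation under cross-validation, assuming scikit-learn and hypothetical targets with cardinalities 3 and 5 (the question's data and the SHAP step are not reproduced here):

    import numpy as np
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import KFold
    from sklearn.multioutput import MultiOutputClassifier

    rng = np.random.default_rng(0)
    X = rng.normal(size=(300, 10))
    # Two hypothetical targets with different cardinalities (3 and 5 classes).
    Y = np.column_stack([rng.integers(0, 3, 300), rng.integers(0, 5, 300)])

    model = MultiOutputClassifier(RandomForestClassifier(random_state=0))
    for train, test in KFold(n_splits=5, shuffle=True, random_state=0).split(X):
        pred = model.fit(X[train], Y[train]).predict(X[test])
        # Report accuracy per target: a single pooled metric would hide
        # differences between targets of different cardinality.
        print([float(np.mean(pred[:, j] == Y[test, j])) for j in range(Y.shape[1])])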
0 votes · 0 answers · 24 views

What is the best way to determine if cross-validated R-squared scores are significantly different? [duplicate]

I'm comparing, pairwise, the results of Linear Regression models with transformations applied to one numerical feature and the target. I'm using K-fold cross-validation scoring with R-squared. The ...
asked by Morgan P
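
With identical folds the two models' scores are paired, so a paired t-test on the per-fold differences is the natural first attempt. A sketch with scikit-learn and SciPy, using a toy transformation in place of the question's (note the caveat in the comments):

    import numpy as np
    from scipy.stats import ttest_rel
    from sklearn.datasets import make_regression
    from sklearn.linear_model import LinearRegression
    from sklearn.model_selection import KFold, cross_val_score
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import FunctionTransformer

    X, y = make_regression(n_samples=200, n_features=5, noise=10, random_state=0)

    # The same folds for both models make the per-fold R^2 scores paired.
    cv = KFold(n_splits=10, shuffle=True, random_state=0)
    model_a = LinearRegression()
    model_b = make_pipeline(FunctionTransformer(np.abs), LinearRegression())  # toy transformation

    a = cross_val_score(model_a, X, y, cv=cv, scoring="r2")
    b = cross_val_score(model_b, X, y, cv=cv, scoring="r2")

    # Caveat: folds share training data, so this naive paired t-test is
    # anti-conservative; a corrected variant is sketched further down the page.
    print(ttest_rel(a, b))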
1 vote · 0 answers · 45 views

How to choose between ARIMA and ARFIMA?

I am in the position of having a time series data set that I can model well using either an Autoregressive Fractionally Integrated Moving Average (ARFIMA) model or an ARIMA model. I'm asking for ways to ...
asked by David White
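
Rolling-origin one-step forecasts are a standard way to compare competing time-series specifications out of sample. A sketch with statsmodels, using two ARIMA orders as stand-ins for the ARIMA-vs-ARFIMA pair since statsmodels has no direct ARFIMA fitter; an ARFIMA fit (e.g. from an R package such as fracdiff) would slot into the same loop:

    import numpy as np
    from statsmodels.tsa.arima.model import ARIMA

    rng = np.random.default_rng(0)
    y = np.cumsum(rng.normal(size=300))  # placeholder series

    def one_step_errors(order, y, start=250):
        """Rolling origin: refit on y[:t], forecast y[t], collect the errors."""
        errs = []
        for t in range(start, len(y)):
            fit = ARIMA(y[:t], order=order).fit()
            errs.append(y[t] - fit.forecast(1)[0])
        return np.array(errs)

    for order in [(1, 1, 1), (2, 1, 2)]:
        e = one_step_errors(order, y)
        print(order, "one-step RMSE:", np.sqrt(np.mean(e**2)))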
4 votes · 1 answer · 515 views

Should I normalize both train and validation sets or only the train set?

I have a question about normalization when merging training and validation sets for cross-validation. Normally, I normalize using re-scaling (Min-Max Normalization) calculated from the training set ...
asked by Suebpong Pruttipattanapong
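
The standard answer is to fit the scaler on the training portion only and let a pipeline repeat that inside every fold, so no separate decision about the validation set is needed. A minimal sketch, assuming scikit-learn and synthetic data:

    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import cross_val_score
    from sklearn.pipeline import Pipeline
    from sklearn.preprocessing import MinMaxScaler

    X, y = make_classification(n_samples=300, random_state=0)

    # The scaler is refit on each training fold; the validation fold is
    # rescaled with min/max values it never contributed to.
    pipe = Pipeline([("scale", MinMaxScaler()), ("clf", LogisticRegression())])
    print(cross_val_score(pipe, X, y, cv=5))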
1 vote · 2 answers · 243 views

A proper approach to K-fold cross validation on imbalanced data

What is the proper algorithm for k-fold CV in the case of class balancing (under-/over-sampling)? Variant 1: split data into train and test sets; balance classes in the train set; run k-fold CV. Variant 2: ...
asked by Jakub Małecki
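
For the within-fold variant, a sampler-aware pipeline resamples only the training portion of each split, leaving every validation fold at its natural imbalance. A sketch assuming the third-party imbalanced-learn package:

    from imblearn.over_sampling import RandomOverSampler
    from imblearn.pipeline import Pipeline
    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import StratifiedKFold, cross_val_score

    X, y = make_classification(n_samples=500, weights=[0.95, 0.05], random_state=0)

    # The sampler runs inside each fold, on the training portion only;
    # imblearn's Pipeline skips it at predict time.
    pipe = Pipeline([
        ("balance", RandomOverSampler(random_state=0)),
        ("clf", LogisticRegression(max_iter=1000)),
    ])
    cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
    print(cross_val_score(pipe, X, y, cv=cv, scoring="f1"))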
4 votes · 1 answer · 128 views

When and how can unsupervised preprocessing before splitting data lead to overoptimistic model performance?

Conceptually, I understand that models should be built totally blind to the test set in order to most faithfully estimate performance on future data. However, I'm struggling to understand the extent ...
asked by Evan
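
One way to measure that extent directly is to score the same model twice: once with the preprocessing fitted on all rows, once with it refit per fold. A sketch with PCA in scikit-learn; for unsupervised steps like this the gap is often small, which is part of what the question is probing:

    from sklearn.datasets import make_classification
    from sklearn.decomposition import PCA
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import cross_val_score
    from sklearn.pipeline import Pipeline

    X, y = make_classification(n_samples=200, n_features=50, random_state=0)

    # Leaky: PCA sees the whole dataset, so each fold's held-out rows have
    # already shaped the components they are projected onto.
    X_leaky = PCA(n_components=10).fit_transform(X)
    leaky = cross_val_score(LogisticRegression(max_iter=1000), X_leaky, y, cv=5)

    # Clean: PCA is refit on each training fold only.
    pipe = Pipeline([("pca", PCA(n_components=10)),
                     ("clf", LogisticRegression(max_iter=1000))])
    clean = cross_val_score(pipe, X, y, cv=5)
    print(leaky.mean(), clean.mean())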
0 votes · 0 answers · 52 views

LASSO and cross validation when dealing with missing data

I want to simulate data with missing values and use them to compare the predictive performance of several machine learning algorithms, including LASSO. All analyses will be performed in R, using the ...
asked by Benykō-Zamurai
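
The question works in R, but the fold structure is language-independent: the imputer must be refit on each training fold so held-out rows never inform their own imputations. A sketch of that structure in scikit-learn with simulated missingness:

    import numpy as np
    from sklearn.datasets import make_regression
    from sklearn.impute import SimpleImputer
    from sklearn.linear_model import Lasso
    from sklearn.model_selection import cross_val_score
    from sklearn.pipeline import Pipeline

    X, y = make_regression(n_samples=200, n_features=20, noise=5, random_state=0)
    rng = np.random.default_rng(0)
    X[rng.random(X.shape) < 0.1] = np.nan  # simulate 10% missingness

    # Imputation parameters come from the training fold only.
    pipe = Pipeline([("impute", SimpleImputer(strategy="mean")),
                     ("lasso", Lasso(alpha=0.1))])
    print(cross_val_score(pipe, X, y, cv=5, scoring="r2"))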
4 votes · 1 answer · 88 views

Confused about the utility of nested cross-validation vs k-fold cross-validation

I am using nested cross-validation in mlr3 to tune my model's hyperparameters and gauge its out-of-sample performance. Previously, when I was performing regular k-fold CV, my understanding was that ...
asked by Adverse Effect
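
The short version: the inner loop tunes, and the outer loop scores the whole tuning procedure on data the search never saw, which a single k-fold loop reused for both jobs cannot do. The question uses mlr3; the same structure in scikit-learn, as a sketch:

    from sklearn.datasets import make_classification
    from sklearn.model_selection import GridSearchCV, cross_val_score
    from sklearn.svm import SVC

    X, y = make_classification(n_samples=300, random_state=0)

    # Inner loop: picks C on each outer-training set.
    inner = GridSearchCV(SVC(), {"C": [0.1, 1, 10]}, cv=3)

    # Outer loop: evaluates the tuned model on held-out outer folds.
    outer_scores = cross_val_score(inner, X, y, cv=5)
    print(outer_scores.mean())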
1 vote · 1 answer · 101 views

How to choose and structure a GLM for species richness with non-normal distribution? [closed]

I know my next steps involve using a GLM and selecting the type of GLM based on my response variables (possibly gamma or Poisson regression?). I also need to standardise explanatory variables to be ...
asked by SMM
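
For a count response such as species richness, a Poisson GLM with a log link is the usual starting point, with overdispersion checked afterwards (switching to a negative binomial if the residual deviance far exceeds the degrees of freedom). A sketch with statsmodels on hypothetical standardized predictors:

    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(0)
    n = 100
    # Hypothetical standardized predictors plus an intercept column.
    X = sm.add_constant(rng.normal(size=(n, 2)))
    richness = rng.poisson(np.exp(0.5 + 0.3 * X[:, 1]))  # simulated counts

    # Poisson GLM with the default log link.
    model = sm.GLM(richness, X, family=sm.families.Poisson()).fit()
    print(model.summary())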
0 votes · 1 answer · 134 views

Comparing AUROCs of binary classifiers across cross-validation folds: alternatives to DeLong

I have two binary classifiers and would like to check whether there is a statistically significant difference between their areas under the ROC curve (AUROC). I have reason to opt for AUROC as my ...
asked by IsaacNuketon
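
One commonly used alternative when both classifiers are scored on identical splits is a paired test on the per-fold AUROCs. A sketch with scikit-learn and SciPy on synthetic data (the usual caveat applies: overlapping training sets make such tests somewhat anti-conservative):

    from scipy.stats import wilcoxon
    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import StratifiedKFold, cross_val_score

    X, y = make_classification(n_samples=400, random_state=0)
    cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)

    # Identical folds for both classifiers, so per-fold AUROCs are paired.
    auc_a = cross_val_score(LogisticRegression(max_iter=1000), X, y,
                            cv=cv, scoring="roc_auc")
    auc_b = cross_val_score(RandomForestClassifier(random_state=0), X, y,
                            cv=cv, scoring="roc_auc")

    # Paired Wilcoxon signed-rank test on the fold-wise AUROC differences.
    print(wilcoxon(auc_a, auc_b))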
2 votes · 0 answers · 30 views

How can one statistically compare machine learning models based on the results of cross-validation? [duplicate]

It is often recommended that one use k-fold cross-validation to estimate the generalisation ability of a machine learning model. Most resources I've found, however, do not address what one should do after ...
asked by Digitallis
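
A widely cited option is the Nadeau–Bengio corrected resampled t-test, which inflates the variance term to account for the overlap between training sets. A sketch with hypothetical per-fold score differences:

    import numpy as np
    from scipy.stats import t as t_dist

    def corrected_paired_ttest(diff, n_train, n_test):
        """Nadeau-Bengio correction: the naive paired t-test understates the
        variance because CV training sets overlap; the (1/J + n_test/n_train)
        factor compensates."""
        J = len(diff)
        mean, var = np.mean(diff), np.var(diff, ddof=1)
        t_stat = mean / np.sqrt((1 / J + n_test / n_train) * var)
        p = 2 * t_dist.sf(abs(t_stat), df=J - 1)
        return t_stat, p

    # Hypothetical per-fold accuracy differences from 10-fold CV on 500 rows.
    diff = np.array([0.02, 0.01, 0.03, -0.01, 0.02, 0.00, 0.01, 0.02, 0.01, 0.03])
    print(corrected_paired_ttest(diff, n_train=450, n_test=50))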
0 votes · 0 answers · 63 views

Time series LASSO K-fold cross validation

This topic has been discussed before, but I couldn't find a specific answer. Here's my approach to forecasting QoQ values: run the usual LASSO K-fold CV on time-series data and generate a one-step-ahead ...
asked by bebgejo
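
Shuffled K-fold leaks future observations into the training folds of a time series; a forward-chaining splitter avoids that. A sketch with scikit-learn's TimeSeriesSplit and placeholder data (the same splitter can also be passed to LassoCV's cv argument when the goal is choosing the penalty):

    import numpy as np
    from sklearn.linear_model import Lasso
    from sklearn.model_selection import TimeSeriesSplit, cross_val_score

    rng = np.random.default_rng(0)
    X = rng.normal(size=(120, 5))              # placeholder lagged predictors
    y = 0.5 * X[:, 0] + rng.normal(size=120)   # placeholder QoQ target

    # Each split trains on an initial window and validates strictly after it.
    cv = TimeSeriesSplit(n_splits=5)
    print(cross_val_score(Lasso(alpha=0.1), X, y, cv=cv,
                          scoring="neg_mean_squared_error"))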
0 votes · 1 answer · 59 views

Data cross validation to predict label from cluster analysis [closed]

My project has the following steps: use the elbow method to determine the features and number of clusters for k-means; run k-means on the data (with the determined features and n clusters), which gives the ...
asked by Xin Niu
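
One caution worth flagging for this setup: if k-means saw the full dataset, the subsequent CV measures how learnable the cluster structure is, not generalization to truly unseen data. A sketch of that (leaky) version in scikit-learn; clustering within the training folds only would be the stricter alternative:

    from sklearn.cluster import KMeans
    from sklearn.datasets import make_blobs
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import cross_val_score

    X, _ = make_blobs(n_samples=300, centers=4, random_state=0)

    # Labels derived from clustering the *full* dataset.
    labels = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(X)

    # CV on those labels is optimistic: every held-out row already
    # influenced the cluster centers that defined its label.
    print(cross_val_score(RandomForestClassifier(random_state=0), X, labels, cv=5))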
