
Questions tagged [cross-validation]

Repeatedly withholding subsets of the data during model fitting in order to quantify the model performance on the withheld data subsets.

3 votes · 0 answers · 53 views

Do k-folds risk sampling bias and, if so, how do we avoid it?

In cross-validation, $k$-folds are a common way to train, compare and validate models. Often we want to find an optimal set of hyperparameters for our models. There are many ways to probe the ...
asked by Markus Klyver
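
One common guard against unlucky fold composition in classification is stratification, which keeps class proportions stable across splits. A minimal sketch, assuming scikit-learn and a synthetic imbalanced dataset in place of the question's:

    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import StratifiedKFold, cross_val_score

    # Synthetic 90/10 imbalanced data as a stand-in for a real dataset.
    X, y = make_classification(n_samples=500, weights=[0.9, 0.1], random_state=0)

    # Plain KFold can concentrate minority-class rows in one fold;
    # StratifiedKFold preserves the 90/10 ratio in every split.
    cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
    scores = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=cv)
    print(scores.mean(), scores.std())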
2 votes · 1 answer · 55 views

Should differential expression analysis be incorporated in cross validation for training machine learning models?

I'm conducting some experiments using TCGA-LUAD clinical and RNA-Seq count data. I'm building machine learning models for survival prediction (Random Survival Forests, Survival Support Vector Machines,...
asked by Yordany Paz
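
The usual answer is that any supervised filtering step, differential expression included, must be refit inside each training fold, or the validation folds leak into feature selection. A minimal sketch of that structure in scikit-learn, with a generic univariate filter standing in for the DE step and synthetic data in place of TCGA-LUAD:

    from sklearn.datasets import make_classification
    from sklearn.feature_selection import SelectKBest, f_classif
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import cross_val_score
    from sklearn.pipeline import Pipeline

    # High-dimensional synthetic stand-in for expression data.
    X, y = make_classification(n_samples=200, n_features=5000,
                               n_informative=20, random_state=0)

    # Because the filter sits inside the pipeline, the "differentially
    # expressed" features are re-selected on each training fold only.
    pipe = Pipeline([
        ("select", SelectKBest(f_classif, k=50)),
        ("clf", LogisticRegression(max_iter=1000)),
    ])
    print(cross_val_score(pipe, X, y, cv=5).mean())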
2 votes · 0 answers · 58 views

Cross-validating multi-output models: importance + SHAP

I am currently developing a project that deals with multiple targets, which can have different cardinalities. The idea is to use different ML models (e.g. Random Forest, SVM, AdaBoost) and ...
asked by Le Roi des Aulnes
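
A minimal sketch of per-target evaluation under cross-validation, assuming scikit-learn and hypothetical targets with cardinalities 3 and 5 (the question's data and the SHAP step are not reproduced here):

    import numpy as np
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import KFold
    from sklearn.multioutput import MultiOutputClassifier

    rng = np.random.default_rng(0)
    X = rng.normal(size=(300, 10))
    # Two hypothetical targets with different cardinalities (3 and 5 classes).
    Y = np.column_stack([rng.integers(0, 3, 300), rng.integers(0, 5, 300)])

    model = MultiOutputClassifier(RandomForestClassifier(random_state=0))
    for train, test in KFold(n_splits=5, shuffle=True, random_state=0).split(X):
        pred = model.fit(X[train], Y[train]).predict(X[test])
        # Report accuracy per target: a single pooled metric would hide
        # differences between targets of different cardinality.
        print([float(np.mean(pred[:, j] == Y[test, j])) for j in range(Y.shape[1])])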
0 votes · 0 answers · 24 views

What is the best way to determine if cross-validated R-squared scores are significantly different? [duplicate]

I'm comparing, pairwise, the results of Linear Regression models with transformations applied to one numerical feature and the target. I'm using K-fold cross-validation scoring with R-squared. The ...
asked by Morgan P
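
With identical folds the two models' scores are paired, so a paired t-test on the per-fold differences is the natural first attempt. A sketch with scikit-learn and SciPy, using a toy transformation in place of the question's (note the caveat in the comments):

    import numpy as np
    from scipy.stats import ttest_rel
    from sklearn.datasets import make_regression
    from sklearn.linear_model import LinearRegression
    from sklearn.model_selection import KFold, cross_val_score
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import FunctionTransformer

    X, y = make_regression(n_samples=200, n_features=5, noise=10, random_state=0)

    # The same folds for both models make the per-fold R^2 scores paired.
    cv = KFold(n_splits=10, shuffle=True, random_state=0)
    model_a = LinearRegression()
    model_b = make_pipeline(FunctionTransformer(np.abs), LinearRegression())  # toy transformation

    a = cross_val_score(model_a, X, y, cv=cv, scoring="r2")
    b = cross_val_score(model_b, X, y, cv=cv, scoring="r2")

    # Caveat: folds share training data, so this naive paired t-test is
    # anti-conservative; a corrected variant is sketched further down the page.
    print(ttest_rel(a, b))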
1 vote · 0 answers · 45 views

How to choose between ARIMA and ARFIMA?

I am in the position of having a time series data set that I can model well using either an Autoregressive Fractionally Integrated Moving Average (ARFIMA) model or an ARIMA model. I'm asking for ways to ...
asked by David White
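
Rolling-origin one-step forecasts are a standard way to compare competing time-series specifications out of sample. A sketch with statsmodels, using two ARIMA orders as stand-ins for the ARIMA-vs-ARFIMA pair since statsmodels has no direct ARFIMA fitter; an ARFIMA fit (e.g. from an R package such as fracdiff) would slot into the same loop:

    import numpy as np
    from statsmodels.tsa.arima.model import ARIMA

    rng = np.random.default_rng(0)
    y = np.cumsum(rng.normal(size=300))  # placeholder series

    def one_step_errors(order, y, start=250):
        """Rolling origin: refit on y[:t], forecast y[t], collect the errors."""
        errs = []
        for t in range(start, len(y)):
            fit = ARIMA(y[:t], order=order).fit()
            errs.append(y[t] - fit.forecast(1)[0])
        return np.array(errs)

    for order in [(1, 1, 1), (2, 1, 2)]:
        e = one_step_errors(order, y)
        print(order, "one-step RMSE:", np.sqrt(np.mean(e**2)))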
4 votes · 1 answer · 515 views

Should I normalize both train and validation sets or only the train set?

I have a question about normalization when merging training and validation sets for cross-validation. Normally, I normalize using re-scaling (Min-Max Normalization) calculated from the training set ...
asked by Suebpong Pruttipattanapong
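
The standard answer is to fit the scaler on the training portion only and let a pipeline repeat that inside every fold, so no separate decision about the validation set is needed. A minimal sketch, assuming scikit-learn and synthetic data:

    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import cross_val_score
    from sklearn.pipeline import Pipeline
    from sklearn.preprocessing import MinMaxScaler

    X, y = make_classification(n_samples=300, random_state=0)

    # The scaler is refit on each training fold; the validation fold is
    # rescaled with min/max values it never contributed to.
    pipe = Pipeline([("scale", MinMaxScaler()), ("clf", LogisticRegression())])
    print(cross_val_score(pipe, X, y, cv=5))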
1 vote · 2 answers · 243 views

A proper approach to K-fold cross validation on imbalanced data

What is the proper algorithm for k-fold CV in the case of class balancing (under-/over-sampling)? Variant 1: split data into train and test sets; balance classes in the train set; run k-fold CV. Variant 2: ...
asked by Jakub Małecki
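
For the within-fold variant, a sampler-aware pipeline resamples only the training portion of each split, leaving every validation fold at its natural imbalance. A sketch assuming the third-party imbalanced-learn package:

    from imblearn.over_sampling import RandomOverSampler
    from imblearn.pipeline import Pipeline
    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import StratifiedKFold, cross_val_score

    X, y = make_classification(n_samples=500, weights=[0.95, 0.05], random_state=0)

    # The sampler runs inside each fold, on the training portion only;
    # imblearn's Pipeline skips it at predict time.
    pipe = Pipeline([
        ("balance", RandomOverSampler(random_state=0)),
        ("clf", LogisticRegression(max_iter=1000)),
    ])
    cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
    print(cross_val_score(pipe, X, y, cv=cv, scoring="f1"))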
4 votes · 1 answer · 128 views

When and how can unsupervised preprocessing before splitting data lead to overoptimistic model performance?

Conceptually, I understand that models should be built totally blind to the test set in order to most faithfully estimate performance on future data. However, I'm struggling to understand the extent ...
asked by Evan
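
One way to measure that extent directly is to score the same model twice: once with the preprocessing fitted on all rows, once with it refit per fold. A sketch with PCA in scikit-learn; for unsupervised steps like this the gap is often small, which is part of what the question is probing:

    from sklearn.datasets import make_classification
    from sklearn.decomposition import PCA
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import cross_val_score
    from sklearn.pipeline import Pipeline

    X, y = make_classification(n_samples=200, n_features=50, random_state=0)

    # Leaky: PCA sees the whole dataset, so each fold's held-out rows have
    # already shaped the components they are projected onto.
    X_leaky = PCA(n_components=10).fit_transform(X)
    leaky = cross_val_score(LogisticRegression(max_iter=1000), X_leaky, y, cv=5)

    # Clean: PCA is refit on each training fold only.
    pipe = Pipeline([("pca", PCA(n_components=10)),
                     ("clf", LogisticRegression(max_iter=1000))])
    clean = cross_val_score(pipe, X, y, cv=5)
    print(leaky.mean(), clean.mean())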
0 votes · 0 answers · 52 views

LASSO and cross validation when dealing with missing data

I want to simulate data with missing values and use them to compare the predictive performance of several machine learning algorithms, including LASSO. All analyses will be performed in R, using the ...
asked by Benykō-Zamurai
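
The question works in R, but the fold structure is language-independent: the imputer must be refit on each training fold so held-out rows never inform their own imputations. A sketch of that structure in scikit-learn with simulated missingness:

    import numpy as np
    from sklearn.datasets import make_regression
    from sklearn.impute import SimpleImputer
    from sklearn.linear_model import Lasso
    from sklearn.model_selection import cross_val_score
    from sklearn.pipeline import Pipeline

    X, y = make_regression(n_samples=200, n_features=20, noise=5, random_state=0)
    rng = np.random.default_rng(0)
    X[rng.random(X.shape) < 0.1] = np.nan  # simulate 10% missingness

    # Imputation parameters come from the training fold only.
    pipe = Pipeline([("impute", SimpleImputer(strategy="mean")),
                     ("lasso", Lasso(alpha=0.1))])
    print(cross_val_score(pipe, X, y, cv=5, scoring="r2"))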
4 votes · 1 answer · 88 views

Confused about the utility of nested cross-validation vs k-fold cross-validation

I am using nested cross-validation in mlr3 to tune my model's hyperparameters and gauge its out-of-sample performance. Previously, when I was performing regular k-fold CV, my understanding was that ...
asked by Adverse Effect
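
The short version: the inner loop tunes, and the outer loop scores the whole tuning procedure on data the search never saw, which a single k-fold loop reused for both jobs cannot do. The question uses mlr3; the same structure in scikit-learn, as a sketch:

    from sklearn.datasets import make_classification
    from sklearn.model_selection import GridSearchCV, cross_val_score
    from sklearn.svm import SVC

    X, y = make_classification(n_samples=300, random_state=0)

    # Inner loop: picks C on each outer-training set.
    inner = GridSearchCV(SVC(), {"C": [0.1, 1, 10]}, cv=3)

    # Outer loop: evaluates the tuned model on held-out outer folds.
    outer_scores = cross_val_score(inner, X, y, cv=5)
    print(outer_scores.mean())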
1 vote · 1 answer · 101 views

How to choose and structure a GLM for species richness with non-normal distribution? [closed]

I know my next steps involve using a GLM and selecting the type of GLM based on my response variables (possibly gamma or Poisson regression?). I also need to standardise explanatory variables to be ...
asked by SMM
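
For a count response such as species richness, a Poisson GLM with a log link is the usual starting point, with overdispersion checked afterwards (switching to a negative binomial if the residual deviance far exceeds the degrees of freedom). A sketch with statsmodels on hypothetical standardized predictors:

    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(0)
    n = 100
    # Hypothetical standardized predictors plus an intercept column.
    X = sm.add_constant(rng.normal(size=(n, 2)))
    richness = rng.poisson(np.exp(0.5 + 0.3 * X[:, 1]))  # simulated counts

    # Poisson GLM with the default log link.
    model = sm.GLM(richness, X, family=sm.families.Poisson()).fit()
    print(model.summary())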
0 votes · 1 answer · 134 views

Comparing AUROCs of binary classifiers across cross-validation folds: alternatives to DeLong

I have two binary classifiers and would like to check whether there is a statistically significant difference between their areas under the ROC curve (AUROC). I have reason to opt for AUROC as my ...
asked by IsaacNuketon
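
One commonly used alternative when both classifiers are scored on identical splits is a paired test on the per-fold AUROCs. A sketch with scikit-learn and SciPy on synthetic data (the usual caveat applies: overlapping training sets make such tests somewhat anti-conservative):

    from scipy.stats import wilcoxon
    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import StratifiedKFold, cross_val_score

    X, y = make_classification(n_samples=400, random_state=0)
    cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)

    # Identical folds for both classifiers, so per-fold AUROCs are paired.
    auc_a = cross_val_score(LogisticRegression(max_iter=1000), X, y,
                            cv=cv, scoring="roc_auc")
    auc_b = cross_val_score(RandomForestClassifier(random_state=0), X, y,
                            cv=cv, scoring="roc_auc")

    # Paired Wilcoxon signed-rank test on the fold-wise AUROC differences.
    print(wilcoxon(auc_a, auc_b))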
2 votes · 0 answers · 30 views

How can one statistically compare machine learning models based on the results of cross-validation? [duplicate]

It is often recommended that one use k-fold cross-validation to estimate the generalisation ability of a machine learning model. Most resources I've found, however, do not address what one should do after ...
asked by Digitallis
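
A widely cited option is the Nadeau–Bengio corrected resampled t-test, which inflates the variance term to account for the overlap between training sets. A sketch with hypothetical per-fold score differences:

    import numpy as np
    from scipy.stats import t as t_dist

    def corrected_paired_ttest(diff, n_train, n_test):
        """Nadeau-Bengio correction: the naive paired t-test understates the
        variance because CV training sets overlap; the (1/J + n_test/n_train)
        factor compensates."""
        J = len(diff)
        mean, var = np.mean(diff), np.var(diff, ddof=1)
        t_stat = mean / np.sqrt((1 / J + n_test / n_train) * var)
        p = 2 * t_dist.sf(abs(t_stat), df=J - 1)
        return t_stat, p

    # Hypothetical per-fold accuracy differences from 10-fold CV on 500 rows.
    diff = np.array([0.02, 0.01, 0.03, -0.01, 0.02, 0.00, 0.01, 0.02, 0.01, 0.03])
    print(corrected_paired_ttest(diff, n_train=450, n_test=50))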
0 votes · 0 answers · 63 views

Time series LASSO K-fold cross validation

This topic has been discussed before, but I couldn't find a specific answer. Here's my approach to forecasting QoQ values: run the usual LASSO K-fold CV on time-series data and generate a one-step-ahead ...
asked by bebgejo
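
Shuffled K-fold leaks future observations into the training folds of a time series; a forward-chaining splitter avoids that. A sketch with scikit-learn's TimeSeriesSplit and placeholder data (the same splitter can also be passed to LassoCV's cv argument when the goal is choosing the penalty):

    import numpy as np
    from sklearn.linear_model import Lasso
    from sklearn.model_selection import TimeSeriesSplit, cross_val_score

    rng = np.random.default_rng(0)
    X = rng.normal(size=(120, 5))              # placeholder lagged predictors
    y = 0.5 * X[:, 0] + rng.normal(size=120)   # placeholder QoQ target

    # Each split trains on an initial window and validates strictly after it.
    cv = TimeSeriesSplit(n_splits=5)
    print(cross_val_score(Lasso(alpha=0.1), X, y, cv=cv,
                          scoring="neg_mean_squared_error"))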
0 votes · 1 answer · 59 views

Data cross validation to predict label from cluster analysis [closed]

My project has the following steps: use the elbow method to determine the features and number of clusters for k-means; run k-means on the data (with the determined features and n clusters), which gives the ...
asked by Xin Niu
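
One caution worth flagging for this setup: if k-means saw the full dataset, the subsequent CV measures how learnable the cluster structure is, not generalization to truly unseen data. A sketch of that (leaky) version in scikit-learn; clustering within the training folds only would be the stricter alternative:

    from sklearn.cluster import KMeans
    from sklearn.datasets import make_blobs
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import cross_val_score

    X, _ = make_blobs(n_samples=300, centers=4, random_state=0)

    # Labels derived from clustering the *full* dataset.
    labels = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(X)

    # CV on those labels is optimistic: every held-out row already
    # influenced the cluster centers that defined its label.
    print(cross_val_score(RandomForestClassifier(random_state=0), X, labels, cv=5))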
