Five-fold cross-validation (5-fold CV) is a resampling method for estimating how well a machine learning model performs on data it was not trained on. Because every sample is held out for testing exactly once, the estimate depends less on any one particular train/test split.
How it works:
* The dataset is randomly split into 5 folds of approximately equal size.
* The model is trained on 4 folds and tested on the remaining fold.
* This process repeats 5 times, each time with a different fold as the test set, so every sample is tested exactly once.
* The final performance estimate is the average of the 5 test scores.
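The steps above can be sketched in plain Python to make the mechanics visible. This is only an illustration: the helper `k_fold_indices`, the toy labels, and the majority-class "model" are all assumptions made for the example, not part of any library.

```python
import random

def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train_idx, test_idx) pairs for k-fold cross-validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # random split, reproducible via seed
    # Fold sizes differ by at most one when n_samples is not divisible by k.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        yield train_idx, test_idx
        start += size

# Hypothetical toy labels: 23 samples, so the folds cannot all be the same size.
y = [0] * 14 + [1] * 9

fold_scores = []
for train_idx, test_idx in k_fold_indices(len(y), k=5):
    # Stand-in "model": always predict the majority class of the training folds.
    train_labels = [y[i] for i in train_idx]
    majority = max(set(train_labels), key=train_labels.count)
    accuracy = sum(y[i] == majority for i in test_idx) / len(test_idx)
    fold_scores.append(accuracy)

mean_accuracy = sum(fold_scores) / len(fold_scores)
print("Per-fold accuracy:", fold_scores)
print("Mean accuracy:", mean_accuracy)
```

In practice a library splitter such as scikit-learn's `KFold` does the index bookkeeping; the loop here just shows that each sample lands in exactly one test fold and that the reported score is the mean over folds.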
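Why use it:
* Reduces the risk of being misled by a single lucky or unlucky split, which makes overfitting easier to detect (it does not by itself prevent overfitting)
* Provides a more reliable estimate of model performance than a single train/test split
* Especially useful when data is limited, since every sample is used for both training and testing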
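Example in Python, using scikit-learn with XGBoost. Here X and y are loaded from scikit-learn's built-in breast-cancer dataset purely as placeholder data:

    from sklearn.datasets import load_breast_cancer
    from sklearn.model_selection import cross_val_score
    from xgboost import XGBClassifier

    # Placeholder data (an assumption); replace with your own features X and labels y.
    X, y = load_breast_cancer(return_X_y=True)

    model = XGBClassifier()
    # cv=5 trains and scores the model five times, once per held-out fold.
    scores = cross_val_score(model, X, y, cv=5, scoring='accuracy')
    print("5-fold CV Accuracy:", scores.mean())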
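The snippet above reports only the mean score. A possible variation, sketched below, also reports the spread across folds and makes the shuffling explicit: when `cv` is an integer and the estimator is a classifier, scikit-learn uses stratified folds without shuffling, so passing a `StratifiedKFold` with `shuffle=True` is one way to get a random split. The synthetic dataset and the `LogisticRegression` stand-in model are assumptions chosen to keep the example dependent on scikit-learn only; any scikit-learn-compatible classifier, such as `XGBClassifier`, could take its place.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score

# Synthetic placeholder data (an assumption, not from the original example).
X, y = make_classification(n_samples=200, n_features=10, random_state=0)

# Shuffled, stratified 5-fold split; random_state makes the folds reproducible.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

# Stand-in classifier; swap in XGBClassifier() or another estimator as needed.
model = LogisticRegression(max_iter=1000)

scores = cross_val_score(model, X, y, cv=cv, scoring="accuracy")
print(f"5-fold CV accuracy: {scores.mean():.3f} +/- {scores.std():.3f}")
```

Reporting the standard deviation alongside the mean gives a rough sense of how much the score varies from fold to fold, which a single averaged number hides.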