====== 🔁 Five-Fold Cross-Validation ======

**Five-fold cross-validation (5-fold CV)** is a resampling method used to **evaluate the performance of a machine learning model**. It helps ensure that the model generalizes well to unseen data.

===== 🧠 What Is It? =====

  * The dataset is randomly split into **5 equal parts (folds)**.
  * The model is trained on **4 folds** and tested on the **remaining fold**.
  * This process is repeated **5 times**, each time using a different fold as the test set.
  * The final performance estimate is the **average** of the 5 results.

===== 📊 Why Use It? =====

  * Reduces the risk of an **overly optimistic evaluation** from one lucky split
  * Provides a **more reliable estimate** of model performance than a single train/test split
  * Especially useful when data is **limited**

===== 🔢 Example Workflow =====

  - Total samples = 100 patients with MRI radiomics features
  - Split into 5 folds of 20 patients each:
    * Iteration 1: train on folds 1–4, test on fold 5
    * Iteration 2: train on folds 1, 2, 3, 5, test on fold 4
    * …
    * Iteration 5: train on folds 2–5, test on fold 1
  - Compute metrics (e.g., accuracy, AUC) on each held-out fold
  - Final result: **mean of the 5 metrics** (see the manual loop below)

===== ✅ Advantages =====

  * Uses **all data** for both training and testing
  * Provides a **variance estimate** of model performance (the spread of the fold scores)
  * Good balance between **bias and variance**

===== ⚠️ Notes =====

  * Use **stratified** 5-fold CV when class labels are imbalanced
  * Avoid data leakage: fit normalization and feature selection **on the training folds only**, never on the full dataset (see the stratified example below)

===== 🛠 Python Snippet =====

<code python>
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from xgboost import XGBClassifier

# Synthetic stand-in for a real feature matrix X and label vector y
X, y = make_classification(n_samples=100, n_features=20, random_state=0)

model = XGBClassifier()
scores = cross_val_score(model, X, y, cv=5, scoring='accuracy')
print("5-fold CV Accuracy:", scores.mean())
</code>
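The loop below is a minimal sketch of what ''cross_val_score'' automates, matching the iteration steps in the Example Workflow. The synthetic data is a stand-in so the snippet runs as-is; with real data, replace ''X'' and ''y'' with your own feature matrix and labels.

<code python>
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score
from sklearn.model_selection import KFold
from xgboost import XGBClassifier

X, y = make_classification(n_samples=100, n_features=20, random_state=0)

kf = KFold(n_splits=5, shuffle=True, random_state=0)
fold_scores = []
for fold, (train_idx, test_idx) in enumerate(kf.split(X), start=1):
    # Train on the 4 training folds, evaluate on the held-out fold
    model = XGBClassifier()
    model.fit(X[train_idx], y[train_idx])
    acc = accuracy_score(y[test_idx], model.predict(X[test_idx]))
    fold_scores.append(acc)
    print(f"Fold {fold}: accuracy = {acc:.3f}")

# Final estimate: mean of the 5 per-fold metrics
print(f"Mean accuracy: {sum(fold_scores) / len(fold_scores):.3f}")
</code>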
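For the points in the Notes above, here is a sketch of a stratified, leakage-safe setup: ''StratifiedKFold'' preserves the class proportions in every fold, and wrapping preprocessing in a ''Pipeline'' re-fits the scaler on the training folds only. The ''StandardScaler'' is illustrative (tree-based models such as XGBoost do not need scaling); the same pattern applies to any normalization or feature-selection step.

<code python>
from sklearn.datasets import make_classification
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from xgboost import XGBClassifier

# Imbalanced synthetic data (80/20 class split) as a stand-in
X, y = make_classification(n_samples=100, n_features=20,
                           weights=[0.8, 0.2], random_state=0)

# The pipeline re-fits StandardScaler inside each training split,
# so no test-fold statistics leak into training
pipe = make_pipeline(StandardScaler(), XGBClassifier())

skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(pipe, X, y, cv=skf, scoring='roc_auc')
print(f"Stratified 5-fold AUC: {scores.mean():.3f} ± {scores.std():.3f}")
</code>

The standard deviation of the fold scores is the variance estimate mentioned under Advantages: a large spread signals that the performance estimate is unstable.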