Validation Metrics
Validation metrics evaluate a model's performance on a held-out validation dataset, indicating how well it generalizes to unseen data. Below are common validation metrics grouped by problem type.
1. Regression Problems
- Mean Absolute Error (MAE):
Measures the average absolute difference between predicted and actual values.
Formula: **MAE = (1/n) Σ |y_i - ŷ_i|**
- Mean Squared Error (MSE):
Averages the squared differences between predicted and actual values.
Formula: **MSE = (1/n) Σ (y_i - ŷ_i)²**
- Root Mean Squared Error (RMSE):
The square root of MSE, expressed in the same units as the target variable.
Formula: **RMSE = √MSE**
- R² Score (Coefficient of Determination):
Measures the proportion of variance explained by the model.
Formula: **R² = 1 - (Σ(y_i - ŷ_i)² / Σ(y_i - ȳ)²)**
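A minimal sketch of these regression metrics in Python, assuming scikit-learn and NumPy are available; the `y_true`/`y_pred` arrays are toy values for illustration:

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

# Toy actual vs. predicted target values (illustrative only)
y_true = np.array([3.0, -0.5, 2.0, 7.0])
y_pred = np.array([2.5, 0.0, 2.0, 8.0])

mae = mean_absolute_error(y_true, y_pred)   # (1/n) Σ |y_i - ŷ_i|
mse = mean_squared_error(y_true, y_pred)    # (1/n) Σ (y_i - ŷ_i)²
rmse = np.sqrt(mse)                         # same units as the target
r2 = r2_score(y_true, y_pred)               # 1 - SS_res / SS_tot

print(f"MAE={mae:.3f}  MSE={mse:.3f}  RMSE={rmse:.3f}  R²={r2:.3f}")
```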
2. Classification Problems
- Accuracy:
Proportion of correct predictions.
Formula: **Accuracy = (Correct Predictions / Total Predictions)**
- Precision:
Measures the proportion of true positives among predicted positives.
Formula: **Precision = TP / (TP + FP)**
- Recall (Sensitivity):
Measures the proportion of true positives identified.
Formula: **Recall = TP / (TP + FN)**
- F1 Score:
Harmonic mean of precision and recall.
Formula: **F1 = 2 × (Precision × Recall) / (Precision + Recall)**
- ROC-AUC:
Area under the ROC curve, which traces the true positive rate against the false positive rate across classification thresholds, summarizing the trade-off between them.
- Log Loss (Cross-Entropy Loss):
Measures how well predicted probabilities match the true labels, heavily penalizing confident but incorrect predictions.
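A minimal sketch of these classification metrics, assuming scikit-learn; the labels and probabilities are toy values, and hard labels are derived from an assumed 0.5 threshold:

```python
import numpy as np
from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             f1_score, roc_auc_score, log_loss)

# Toy binary labels and predicted positive-class probabilities (illustrative only)
y_true = np.array([0, 1, 1, 0, 1, 0, 1, 1])
y_prob = np.array([0.2, 0.8, 0.6, 0.3, 0.9, 0.4, 0.7, 0.4])
y_pred = (y_prob >= 0.5).astype(int)  # hard labels from a 0.5 threshold

print("Accuracy :", accuracy_score(y_true, y_pred))
print("Precision:", precision_score(y_true, y_pred))  # TP / (TP + FP)
print("Recall   :", recall_score(y_true, y_pred))     # TP / (TP + FN)
print("F1       :", f1_score(y_true, y_pred))
print("ROC-AUC  :", roc_auc_score(y_true, y_prob))    # uses probabilities, not hard labels
print("Log loss :", log_loss(y_true, y_prob))         # penalizes confident mistakes
```

Note that ROC-AUC and log loss are computed from the predicted probabilities, while the other metrics use the thresholded labels.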
3. Clustering Problems
- Silhouette Score:
Measures how similar each point is to its own cluster compared with other clusters; scores range from -1 to 1, with higher values indicating better-defined clusters.
- Adjusted Rand Index (ARI):
Evaluates similarity between true labels and clustering results.
- Davies-Bouldin Index:
Assesses compactness and separation of clusters.
- Inertia:
The within-cluster sum of squared distances from each point to its cluster centroid; lower values indicate tighter clusters.
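A minimal sketch of these clustering metrics, assuming scikit-learn; the data comes from `make_blobs`, which also provides ground-truth labels so ARI can be demonstrated:

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score, adjusted_rand_score, davies_bouldin_score

# Synthetic data with known cluster labels (illustrative only)
X, y_true = make_blobs(n_samples=300, centers=3, random_state=42)

km = KMeans(n_clusters=3, n_init=10, random_state=42).fit(X)
labels = km.labels_

print("Silhouette    :", silhouette_score(X, labels))         # internal: cohesion vs. separation
print("ARI           :", adjusted_rand_score(y_true, labels)) # external: requires true labels
print("Davies-Bouldin:", davies_bouldin_score(X, labels))     # internal: lower is better
print("Inertia       :", km.inertia_)                         # within-cluster sum of squares
```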
4. Time Series Problems
- Mean Absolute Percentage Error (MAPE):
Expresses prediction error as a percentage.
Formula: **MAPE = (100/n) Σ |(y_i - ŷ_i) / y_i|**
- Symmetric Mean Absolute Percentage Error (sMAPE):
A variant of MAPE that divides the error by the average magnitude of the actual and predicted values, mitigating MAPE's instability when actual values are near zero.
- Mean Squared Logarithmic Error (MSLE):
Computes MSE on log-transformed values, so errors are penalized in relative terms; under-predictions are penalized more heavily than over-predictions.
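A minimal sketch of these forecasting metrics, assuming scikit-learn; `smape` is a hypothetical helper implementing one common sMAPE variant (several definitions exist), and the values are a toy forecast:

```python
import numpy as np
from sklearn.metrics import mean_absolute_percentage_error, mean_squared_log_error

def smape(y_true, y_pred):
    # Hypothetical helper: sMAPE using the |e| / ((|y| + |ŷ|) / 2) form
    denom = (np.abs(y_true) + np.abs(y_pred)) / 2.0
    return np.mean(np.abs(y_true - y_pred) / denom) * 100

# Toy forecast; values must be non-negative for MSLE (illustrative only)
y_true = np.array([100.0, 120.0, 130.0, 90.0])
y_pred = np.array([110.0, 115.0, 140.0, 85.0])

print("MAPE (%):", mean_absolute_percentage_error(y_true, y_pred) * 100)  # sklearn returns a fraction
print("sMAPE(%):", smape(y_true, y_pred))
print("MSLE    :", mean_squared_log_error(y_true, y_pred))
```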
5. Ranking Problems
- Mean Reciprocal Rank (MRR):
Averages, over queries, the reciprocal of the rank of the first relevant result.
- Normalized Discounted Cumulative Gain (NDCG):
Considers the position of relevant results in a ranked list.
- Precision at k (P@k):
Measures precision for the top-k predictions.
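A minimal sketch of these ranking metrics for a single query, assuming scikit-learn for NDCG; MRR and P@k are computed by hand, and the relevance vector is a toy ranking (best-scored item first):

```python
import numpy as np
from sklearn.metrics import ndcg_score

# Relevance (1 = relevant) of items in model-ranked order, best first (illustrative only)
ranked_relevance = np.array([0, 0, 1, 1, 0, 1])

# MRR for a single query: reciprocal of the 1-based rank of the first relevant item
first_hit_rank = np.argmax(ranked_relevance == 1) + 1
mrr = 1.0 / first_hit_rank

# P@k: fraction of the top-k ranked items that are relevant
k = 3
p_at_k = ranked_relevance[:k].mean()

# NDCG@k: sklearn expects per-item true relevances and predicted scores
true_rel = np.array([[0, 0, 1, 1, 0, 1]])
scores = np.array([[6, 5, 4, 3, 2, 1]])  # descending scores mirror the ranking above
ndcg = ndcg_score(true_rel, scores, k=k)

print(f"MRR={mrr:.3f}  P@{k}={p_at_k:.3f}  NDCG@{k}={ndcg:.3f}")
```

In practice MRR and P@k are averaged over many queries; this sketch shows the per-query computation.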
6. Multi-Label Problems
- Hamming Loss:
Proportion of misclassified labels.
Formula: **Hamming Loss = (1/(n·L)) Σᵢ Σⱼ I(y_ij ≠ ŷ_ij)**
- Subset Accuracy:
Measures the percentage of samples where all labels are correctly predicted.
- Macro/Micro Averaged Metrics:
Per-label metrics averaged equally across labels (macro) or computed from counts pooled across all labels (micro).
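A minimal sketch of these multi-label metrics, assuming scikit-learn; the label matrices are toy binary indicators (rows = samples, columns = labels):

```python
import numpy as np
from sklearn.metrics import hamming_loss, accuracy_score, f1_score

# Toy multi-label indicator matrices: 4 samples × 3 labels (illustrative only)
Y_true = np.array([[1, 0, 1],
                   [0, 1, 0],
                   [1, 1, 0],
                   [0, 0, 1]])
Y_pred = np.array([[1, 0, 0],
                   [0, 1, 0],
                   [1, 0, 0],
                   [0, 0, 1]])

print("Hamming loss   :", hamming_loss(Y_true, Y_pred))    # fraction of wrong label assignments
print("Subset accuracy:", accuracy_score(Y_true, Y_pred))  # exact-match ratio over samples
print("Macro F1       :", f1_score(Y_true, Y_pred, average="macro"))  # unweighted mean over labels
print("Micro F1       :", f1_score(Y_true, Y_pred, average="micro"))  # pools counts across labels
```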
Summary
The choice of validation metric depends on the problem type, dataset characteristics, and business goals.