📈 Model Performance

Evaluating model performance is crucial for determining how well a machine learning model generalizes to new data. In medical applications (e.g., lesion classification), reliable evaluation underpins clinical usefulness and patient safety.

| Metric | Description | Interpretation |
|--------|-------------|----------------|
| Accuracy | Correct predictions / total cases | Good for balanced datasets |
| Precision | TP / (TP + FP) | How many predicted positives are truly positive |
| Recall (Sensitivity) | TP / (TP + FN) | Ability to detect true positives |
| Specificity | TN / (TN + FP) | Ability to detect true negatives |
| F1-Score | Harmonic mean of precision and recall | Balances precision and recall |
| AUC (Area Under ROC Curve) | Ability to distinguish between classes | Closer to 1 = better |
| Balanced Accuracy | Mean of sensitivity and specificity | Useful for imbalanced datasets |
| Confusion Matrix | Table of TP, FP, TN, FN counts | Full picture of model errors |
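
To make the formulas concrete, here is a minimal sketch deriving each metric from raw confusion-matrix counts in the binary case (the y_true/y_pred arrays are made-up toy labels):

import numpy as np
from sklearn.metrics import confusion_matrix

y_true = np.array([1, 0, 1, 1, 0, 0, 1, 0])  # toy ground-truth labels
y_pred = np.array([1, 0, 0, 1, 0, 1, 1, 0])  # toy model predictions

# For binary labels, .ravel() yields the four counts in this order
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()

accuracy     = (tp + tn) / (tp + tn + fp + fn)
precision    = tp / (tp + fp)
recall       = tp / (tp + fn)           # sensitivity
specificity  = tn / (tn + fp)
f1           = 2 * precision * recall / (precision + recall)
balanced_acc = (recall + specificity) / 2

print(f"accuracy={accuracy:.2f} precision={precision:.2f} "
      f"recall={recall:.2f} specificity={specificity:.2f} "
      f"f1={f1:.2f} balanced_acc={balanced_acc:.2f}")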

* Metrics are computed per class, then averaged (the averaging schemes are illustrated in the sketch below):

  • Macro average: unweighted mean over classes; every class counts equally
  • Micro average: pools TP, FP, and FN across all classes, so frequent classes dominate
  • Weighted average: per-class metrics weighted by class support (frequency)
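
A quick sketch of the three averaging schemes using scikit-learn's f1_score (the toy labels and their imbalance are illustrative assumptions):

from sklearn.metrics import f1_score

# Hypothetical 3-class labels with a strong imbalance toward class 0
y_true = [0, 0, 0, 0, 0, 0, 1, 1, 2, 2]
y_pred = [0, 0, 0, 0, 0, 1, 1, 2, 2, 2]

print(f1_score(y_true, y_pred, average="macro"))     # equal weight per class
print(f1_score(y_true, y_pred, average="micro"))     # global TP/FP/FN counts
print(f1_score(y_true, y_pred, average="weighted"))  # weighted by class support

Note that for single-label multi-class data, micro-averaged precision, recall, and F1 all equal plain accuracy, which is why classification_report prints an accuracy row rather than a micro-average row.
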
# Example: confusion matrix and full per-class report
# (assumes y_test = true labels, y_pred = predicted labels)
from sklearn.metrics import classification_report, confusion_matrix, roc_auc_score

print(confusion_matrix(y_test, y_pred))
print(classification_report(y_test, y_pred))

# Multi-class AUC needs class probabilities,
# e.g. y_proba = model.predict_proba(X_test)
print(roc_auc_score(y_test, y_proba, multi_class="ovr"))

* High sensitivity: essential for detecting critical conditions (e.g., tumors); see the threshold sketch below
* High specificity: important for avoiding false positives
* Balanced accuracy: prevents overestimating performance on imbalanced data (e.g., rare tumors)
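
Sensitivity and specificity trade off against each other through the decision threshold. A sketch, assuming a binary model whose positive-class probabilities are the made-up y_proba_pos array below (the 0.3 cutoff is likewise illustrative), shows how lowering the threshold raises sensitivity at the cost of specificity:

import numpy as np
from sklearn.metrics import confusion_matrix

# Hypothetical positive-class probabilities from a fitted binary model,
# e.g. y_proba_pos = model.predict_proba(X_test)[:, 1]
y_true      = np.array([1, 1, 1, 0, 0, 0, 0, 0])
y_proba_pos = np.array([0.9, 0.6, 0.4, 0.5, 0.35, 0.2, 0.1, 0.05])

for threshold in (0.5, 0.3):   # a lower cutoff catches more true positives
    y_pred = (y_proba_pos >= threshold).astype(int)
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
    print(f"threshold={threshold}: sensitivity={tp / (tp + fn):.2f}, "
          f"specificity={tn / (tn + fp):.2f}")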

* Always report multiple metrics, not just accuracy
* Use cross-validation for a more robust performance estimate that does not overfit a single split
* Report confidence intervals for key metrics (a combined sketch follows this list)
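
A combined sketch of the last two recommendations (the synthetic dataset and logistic-regression model are placeholders): cross-validated balanced accuracy, followed by a simple bootstrap confidence interval on a held-out split.

import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import balanced_accuracy_score
from sklearn.model_selection import cross_val_score, train_test_split

# Placeholder imbalanced dataset and model
X, y = make_classification(n_samples=500, weights=[0.9, 0.1], random_state=0)
clf = LogisticRegression(max_iter=1000)

# Cross-validated balanced accuracy (5 folds)
scores = cross_val_score(clf, X, y, cv=5, scoring="balanced_accuracy")
print(f"balanced accuracy: {scores.mean():.2f} +/- {scores.std():.2f}")

# Bootstrap 95% confidence interval on a held-out test set
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)
y_pred = clf.fit(X_tr, y_tr).predict(X_te)
rng = np.random.default_rng(0)
boot = []
for _ in range(1000):
    i = rng.integers(0, len(y_te), len(y_te))  # resample indices with replacement
    boot.append(balanced_accuracy_score(y_te[i], y_pred[i]))
print("95% CI:", np.percentile(boot, [2.5, 97.5]))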
