Magnetic Resonance Imaging for Intracranial Metastases Multicenter Studies
In a multicenter study published in Radiology, Topff et al., with coauthors from:
- Netherlands Cancer Institute, Amsterdam (Netherlands)
- Maastricht University, Maastricht (Netherlands)
- Sanjay Gandhi Postgraduate Institute of Medical Sciences, Lucknow (India)
- Robovision, Ghent (Belgium)
- Hospital Universitario Rey Juan Carlos & Universidad CEU San Pablo, Madrid (Spain)
- Stanford University, Stanford, CA (USA)
- Erasmus MC, Rotterdam & Delft (Netherlands)
- Elisabeth‑TweeSteden Hospital, Tilburg (Netherlands)
- Hospital Universitario Marqués de Valdecilla, Santander (Spain)
- Clínica Universidad de Navarra, Pamplona (Spain)
- Hospital Universitario La Moraleja, Madrid (Spain)
- Complejo Asistencial Universitario de León, León (Spain)
- St. Nikolaus Hospital, Eupen & Ghent University, Ghent (Belgium)
developed a generalizable deep-learning system for detection, segmentation, and longitudinal tracking of brain metastases (BMs) of any size on MRI. The model achieved high sensitivity (98.0% internal, 97.4% external; 93.3% for lesions smaller than 3 mm), a Dice coefficient of ~0.9, minimal false positives (~0.6 per patient), and robust generalizability on pre- and post-treatment MRI.
This ambitious study purports to deliver a “generalizable” BM detection tool, but cracks emerge on closer inspection:
- Data heterogeneity: Although the study incorporates 30 scanners and multiple centers, crucial metadata (e.g., magnet field strength, sequence protocol, slice thickness) are omitted. Without a stratified analysis, the claimed generalizability is unverified (a sketch of such a subgroup check follows this list).
- Annotation bias: Iterative annotation with radiologists is reasonable, but the absence of inter-rater reliability metrics (Cohen's κ, ICC) leaves the consistency claims unsupported (see the agreement sketch after this list).
- Model architecture is stale: Using a modified nnU‑Net is safe but not innovative; no comparisons are made with more recent architectures (e.g., transformer‑based models), so why should readers trust this model over next‑gen versions?
- External validation limited: The “external” test set is still retrospective and likely drawn from similar geographic areas; truly external validation (other continents, broader vendor diversity) is not demonstrated.
- Post‑treatment lesion misses: Post‑treatment false negatives can mislead; while sensitivity remains high, the clinical consequences of a missed small post‑radiotherapy lesion are downplayed.
- Lack of clinical outcome linkage: Performance metrics like Dice and sensitivity are algorithmic (see the metrics sketch after this list); no data show improved clinical workflow or patient outcomes.
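To make the first point concrete, the stratified check being asked for is simple to report. The sketch below is not from the paper; it assumes a hypothetical per-lesion results table with columns `field_strength`, `slice_thickness_mm`, and a binary `detected` flag, and breaks sensitivity out by scanner subgroup.

```python
# Hypothetical subgroup analysis: per-stratum sensitivity by scanner metadata.
# The table below is invented for illustration; column names are assumptions.
import pandas as pd

results = pd.DataFrame({
    "field_strength":     [1.5, 1.5, 3.0, 3.0, 3.0, 1.5, 3.0, 1.5],  # tesla
    "slice_thickness_mm": [1.0, 3.0, 1.0, 1.0, 3.0, 1.0, 3.0, 3.0],
    "detected":           [1,   0,   1,   1,   1,   1,   0,   1],    # 1 = lesion found by the model
})

# Sensitivity per stratum is the mean of the binary "detected" flag within each metadata group.
strata = (results
          .groupby(["field_strength", "slice_thickness_mm"])["detected"]
          .agg(["mean", "count"])
          .rename(columns={"mean": "sensitivity", "count": "n_lesions"}))
print(strata)
```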
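Similarly, the missing agreement analysis requires little more than a categorical comparison of independent reads. A minimal sketch, assuming two radiologists each labeled the same hypothetical list of candidate findings as lesion (1) or not (0); an ICC would be the analogue for continuous measurements such as lesion volumes.

```python
# Hypothetical inter-rater agreement check with Cohen's kappa.
from sklearn.metrics import cohen_kappa_score

rater_a = [1, 1, 0, 1, 0, 1, 1, 0, 1, 1]  # hypothetical reads, radiologist A
rater_b = [1, 0, 0, 1, 0, 1, 1, 1, 1, 1]  # hypothetical reads, radiologist B

kappa = cohen_kappa_score(rater_a, rater_b)
print(f"Cohen's kappa: {kappa:.2f}")  # values above ~0.8 are conventionally read as strong agreement
```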
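For readers less familiar with the reported numbers, the sketch below shows how Dice overlap, lesion-level sensitivity, and false positives per patient are conventionally computed; the masks and counts are invented for illustration, not taken from the study.

```python
import numpy as np

def dice(pred: np.ndarray, truth: np.ndarray) -> float:
    """Dice similarity coefficient between two binary masks."""
    pred, truth = pred.astype(bool), truth.astype(bool)
    denom = pred.sum() + truth.sum()
    return 1.0 if denom == 0 else 2.0 * np.logical_and(pred, truth).sum() / denom

# Hypothetical 2D masks standing in for one lesion's model output and reference segmentation.
truth = np.zeros((64, 64), dtype=np.uint8)
truth[20:30, 20:30] = 1
pred = np.zeros_like(truth)
pred[22:32, 21:31] = 1
print(f"Dice: {dice(pred, truth):.2f}")

# Lesion-level sensitivity and false positives per patient, with hypothetical counts.
detected, total_lesions = 147, 150     # true lesions found vs. all annotated lesions
false_positives, patients = 60, 100    # spurious detections across the cohort
print(f"Sensitivity: {detected / total_lesions:.1%}, FP per patient: {false_positives / patients:.2f}")
```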
Verdict: Promising engineering, but overhyped claims. The model lacks rigorous external validation, a consistency assessment, and demonstrated clinical relevance.
Takeaway for Neurosurgeons
High algorithmic performance (~98% sensitivity) promises support for volumetric monitoring and the detection of tiny lesions. But before the tool can be trusted clinically, we need proof of reliability across diverse scanners and evidence of integration into actual patient workflows.
Bottom Line
A data‑rich multi‑center study showing technical prowess, yet falling short on validation breadth, clinical translation, and methodological transparency—far from paradigm‑shifting.
Quality Rating
4 / 10
Citation & Metadata
- Corresponding Author Email: l.topff@nki.nl
- Full Citation:
1).