In a pilot comparative study published in Neurosurgical Focus, Hopkins et al. from the Keck School of Medicine, USC, Los Angeles (Neurosurgery; Endocrinology) evaluated whether a modified OpenAI generative artificial intelligence model (Whisper) could match or improve upon the accuracy of a commercial dictation tool (Nuance Dragon Medical One) in neurosurgical operative report generation. The Whisper‑based model demonstrated a non‑inferior overall word error rate (WER) versus Dragon (1.75% vs 1.54%, p=0.08). Excluding linguistic errors, Whisper outperformed Dragon (0.50% vs 1.34%, p<0.001; total errors 6.1 vs 9.7, p=0.002). Whisper's performance was robust to faster speech and longer recordings, unlike Dragon's 1).
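The study's primary metric, word error rate (WER), is the standard accuracy measure in speech recognition: the word-level edit distance (substitutions + deletions + insertions) between the transcript and the reference, divided by the number of reference words. A minimal sketch of the computation (not the authors' implementation, which is not described here):

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level Levenshtein distance divided by
    the number of words in the reference transcript."""
    ref = reference.split()
    hyp = hypothesis.split()
    # dist[i][j] = minimum edits to turn ref[:i] into hyp[:j]
    dist = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dist[i][0] = i  # i deletions
    for j in range(len(hyp) + 1):
        dist[0][j] = j  # j insertions
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            substitution = dist[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            deletion = dist[i - 1][j] + 1
            insertion = dist[i][j - 1] + 1
            dist[i][j] = min(substitution, deletion, insertion)
    return dist[len(ref)][len(hyp)] / len(ref)
```

For example, a single substituted word in a five-word dictation ("the patient was position prone" vs "the patient was positioned prone") yields a WER of 1/5 = 20%; the sub-2% rates reported above correspond to roughly one or two errors per hundred dictated words.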
Critical Review
* Strengths:
- Direct comparison of a cutting-edge generative AI model (Whisper) with an established clinical tool in a real-world neurosurgical workflow.
- Objective metrics (WER) with appropriate statistical analysis.
- Mixed case selection covering cranial and spinal procedures, enhancing generalizability.
* Weaknesses & Limitations:
- Small sample size (n=10 reports, 3 physicians) limits statistical power.
- No assessment of real-time clinical integration; only offline comparisons.
- No analysis of downstream impact on report quality, clinician satisfaction, or patient safety.