The use of generative artificial intelligence–based dictation in a neurosurgical practice: a pilot study

In a pilot comparative study published in Neurosurgical Focus, Hopkins et al. from the Keck School of Medicine of USC, Los Angeles (Neurosurgery; Endocrinology) evaluated whether a modified OpenAI generative artificial intelligence model (Whisper) can match or improve upon the accuracy of a commercial dictation tool (Nuance Dragon Medical One) for neurosurgical operative report generation. The Whisper-based model demonstrated a noninferior overall word error rate (WER) versus Dragon (1.75% vs 1.54%, p = 0.08). When linguistic errors were excluded, Whisper outperformed Dragon (0.50% vs 1.34%, p < 0.001; total errors 6.1 vs 9.7, p = 0.002). Whisper's performance was also robust to faster speech and longer recordings, unlike Dragon's 1).

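To make the primary metric concrete, the sketch below transcribes a dictated recording with an off-the-shelf Whisper model and scores it against a verified reference transcript using WER (substitutions + deletions + insertions divided by the reference word count). This is not the authors' pipeline: the model size, file names, and the generic word-level Levenshtein implementation are illustrative assumptions.

```python
import whisper  # pip install openai-whisper


def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference word count,
    computed as Levenshtein distance over word tokens."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,          # deletion
                           dp[i][j - 1] + 1,          # insertion
                           dp[i - 1][j - 1] + cost)   # substitution / match
    return dp[len(ref)][len(hyp)] / max(len(ref), 1)


if __name__ == "__main__":
    model = whisper.load_model("base")                      # model size is an assumption
    result = model.transcribe("operative_note.wav")         # hypothetical audio file
    reference = open("operative_note_reference.txt").read() # hypothetical verified transcript
    print(f"WER: {word_error_rate(reference, result['text']):.2%}")
```

In practice, per-report WER values like these would then be compared between the two systems with a paired statistical test, which is the kind of comparison the study reports.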

Strengths:

  • Direct comparison of a cutting-edge generative AI (Whisper) to an established clinical tool in a real-world neurosurgical workflow.
  • Objective metrics (WER) with appropriate statistical analysis.
  • Mixed-case operative reports cover cranial and spinal procedures, enhancing generalizability.

Weaknesses & Limitations:

  • Small sample size (n=10 reports, 3 physicians) limits statistical power.
  • Lack of real-time clinical integration assessments—only offline comparisons.
  • No analysis of downstream impact on report quality, clinician satisfaction, or patient safety.