ChatGPT and similar language models can potentially transform decision-making in neurosurgery by providing real-time insights, aiding in research, and improving patient care workflows. Here’s how ChatGPT can support neurosurgical decision-making:
By harnessing ChatGPT’s capabilities, neurosurgeons can streamline workflows, improve patient care, and stay at the forefront of medical innovation. However, the integration of AI in neurosurgery demands continuous validation, ethical oversight, and collaboration between clinicians and technologists.
To assess the reliability of ChatGPT, we compared its responses against the 2023 Congress of Neurological Surgeons (CNS) guidelines for patients with Chiari I Malformation (CIM).
Methods: ChatGPT-3.5 and ChatGPT-4 were prompted with revised questions from the 2023 CNS guidelines for patients with CIM. ChatGPT responses were compared to CNS guideline recommendations using cosine similarity scores and reviewer assessments of 1) contradiction with guidelines, 2) recommendations not contained in guidelines, and 3) failure to include guideline recommendations. A scoping review was conducted to investigate reviewer-identified discrepancies between CNS recommendations and GPT-4 responses.
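To make the cosine similarity metric concrete, the following is a minimal sketch of how a similarity score between two texts can be computed. It assumes a simple bag-of-words (term count) representation; the source does not state which text representation or embedding the study used, so this is an illustration of the metric rather than the authors' method. The function names and the two example strings are hypothetical placeholders, not guideline or model text.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine_similarity(text_a: str, text_b: str) -> float:
    """Cosine similarity of two texts using raw term counts as vectors.

    Assumption: bag-of-words counts stand in for whatever vector
    representation the study actually used.
    """
    counts_a = Counter(tokenize(text_a))
    counts_b = Counter(tokenize(text_b))

    # Dot product over the vocabulary shared by both texts.
    shared = counts_a.keys() & counts_b.keys()
    dot = sum(counts_a[term] * counts_b[term] for term in shared)

    # Euclidean norms of each count vector.
    norm_a = math.sqrt(sum(c * c for c in counts_a.values()))
    norm_b = math.sqrt(sum(c * c for c in counts_b.values()))

    # An empty text has no direction, so define its similarity as 0.
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical placeholder strings, not real guideline or model output.
guideline_text = "placeholder guideline recommendation text"
response_text = "placeholder model response text"
score = cosine_similarity(guideline_text, response_text)
```

With non-negative count vectors the score ranges from 0 (no shared terms) to 1 (identical term distributions). A score measures lexical or semantic overlap only; two texts can share most of their vocabulary and still make opposing recommendations, which is why the reviewer assessments are reported alongside it.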
Results: A majority of ChatGPT responses were coherent with CNS recommendations. However, moderate contradiction was observed between responses and guidelines (15.3% ChatGPT-3.5 responses, 38.5% ChatGPT-4 responses). Additionally, a tendency toward over-recommendation (30.8% ChatGPT-3.5 responses, 46.1% ChatGPT-4 responses) rather than under-recommendation (15.4% ChatGPT-3.5 responses, 7.7% ChatGPT-4 responses) was observed. Cosine similarity scores revealed moderate similarity between CNS and ChatGPT recommendations (0.553 ChatGPT-3.5, 0.549 ChatGPT-4). The scoping review revealed 19 studies relevant to CNS-ChatGPT substantive contradictions, with mixed support for recommendations contradicting official guidelines.
Moderate incoherence was observed between ChatGPT responses and CNS guidelines on the diagnosis and management of CIM. The recency of the CNS guidelines and mixed support for contradictory ChatGPT responses highlight a need for further refinement of large language models prior to their application as clinical decision support tools 1).