Utility of ChatGPT for Neurosurgery

Do LLMs Have 'the Eye' for MRI? Evaluating GPT-4o, Grok, and Gemini on Brain MRI Performance: First Evaluation of Grok in Medical Imaging and a Comparative Analysis
Transforming neurosurgical practice with large language models: comparative performance of ChatGPT-omni and Gemini in complex case management
Super-resolution sodium MRI of human gliomas at 3T using physics-based generative artificial intelligence
Comparison of quality, empathy and readability of physician responses versus chatbot responses to common cerebrovascular neurosurgical questions on a social media platform
A Generative Artificial Intelligence Copilot for Biomedical Nanoengineering
The associations between cerebral microhemorrhages and cognitive decline across Alzheimer's continuum
Utilizing Large language models to select literature for meta-analysis shows workload reduction while maintaining a similar recall level as manual curation

ChatGPT can be a valuable tool in the field of neurosurgery in a variety of ways:

1. Neurosurgical Literature Review: ChatGPT can assist in quickly reviewing and summarizing medical literature, providing access to studies, clinical trials, and case reports related to neurosurgery. It can support clinicians in keeping up with the latest research and developments.

2. Patient Education: It can be used to generate patient-friendly explanations about neurosurgical conditions, procedures, and recovery processes. This helps in improving patient understanding and informed consent.

3. Clinical Decision Support: While not a replacement for clinical judgment, ChatGPT can assist neurosurgeons by providing general guidance on surgical techniques, post-operative care, and possible complications based on evidence-based practices.

4. Case Discussions: ChatGPT can help facilitate case discussions by generating possible differential diagnoses and treatment options, or by suggesting questions to consider during multidisciplinary team meetings.

5. Training and Education: ChatGPT can assist in educating medical students, residents, or fellows by offering explanations of complex neurosurgical concepts, surgical anatomy, and procedural steps. It can also provide quizzes, flashcards, or mock scenarios for educational purposes.

6. Documentation Assistance: ChatGPT can help neurosurgeons by suggesting templates or helping draft clinical notes, discharge summaries, or preoperative evaluations, saving time on administrative tasks.

7. Problem-solving: It can be a valuable tool for brainstorming or offering possible solutions to complex neurosurgical problems, whether related to specific patient cases, equipment usage, or procedural techniques.

ChatGPT encounters multiple opportunities and challenges in neurosurgery.

Although ChatGPT is a powerful language model, it cannot substitute for the expertise and experience of trained medical professionals. It cannot perform physical examinations, make diagnoses, administer treatments, establish trust, provide emotional support, or assist in the recovery process. Moreover, the implementation of artificial intelligence in healthcare necessitates careful consideration of legal and ethical concerns. While the potential of ChatGPT is recognized, additional training with comprehensive data is necessary to make full use of its capabilities ¹⁾.

Roman et al. explored the use of ChatGPT (Chat-Generative Pre-Trained Transformer) in neurosurgery and its potential impact on the field, weighing its potential benefits and limitations through a systematic review of the current literature. The search covered ChatGPT and its applications in healthcare and in different neurosurgery topics, using databases such as PubMed, Google Scholar, and Embase. A total of 128 articles were retrieved, of which 22 were included for thorough analysis. ChatGPT has demonstrated promising results in applications such as natural language processing, language translation, and text summarization. In neurosurgery, it can assist in areas such as neurosurgical planning, image recognition, medical diagnosis, patient care, and scientific production. The studies reviewed demonstrate the potential of AI and deep learning (DL), through language models such as ChatGPT, to improve the accuracy and efficiency of neurosurgical procedures, as well as diagnosis, treatment, and patient outcomes across various medical specialties, including neurosurgery. Limitations include the need for large datasets and the potential for errors in the output, which most authors concur will need human verification before final application. Based on the findings of the included studies and on expert opinion, the review concluded that ChatGPT holds potential for the present and future.
Further research and development are required to fully understand its capabilities and limitations. AI technology can serve as a useful tool to augment human intelligence; however, it is essential to use it in a responsible and ethical manner ²⁾.

Ward et al. assessed ChatGPT's capability to determine the emergent nature of neurosurgical scenarios and make diagnoses based on information one would find in a neurosurgical consult.

Thirty clinical scenarios were given to 3 attendings, 4 residents, 2 physician assistants, and 2 subinterns. Participants were asked to determine whether each scenario constituted an urgent neurosurgical consultation and what the most likely diagnosis was. The attendings' consensus responses served as the answer key. Generative Pre-trained Transformer (GPT) 3.5 and GPT 4 were given the same questions, and their responses were compared with those of the other participants.

GPT 4 was 100% accurate in both diagnosis and triage of the scenarios. In triaging each situation, GPT 3.5 had an accuracy of 92.59%, slightly below that of a PGY1 (96.3%), with 88.24% sensitivity, 100% specificity, 100% positive predictive value, and 83.3% negative predictive value. When making a diagnosis, GPT 3.5 had an accuracy of 92.59%, which was higher than that of the subinterns and similar to that of the resident responders.
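The triage figures reported for GPT 3.5 are the standard metrics derived from a 2×2 confusion matrix. The following is a minimal sketch of that arithmetic. The counts used (15 true positives, 2 false negatives, 10 true negatives, 0 false positives) are an assumption: they are one set of counts that reproduces the reported percentages, not figures taken from the study, and they sum to 27 scored scenarios rather than 30. The function name `triage_metrics` is hypothetical.

```python
def triage_metrics(tp, fn, tn, fp):
    """Return standard diagnostic metrics (as percentages) from 2x2 counts.

    tp/fn: urgent scenarios triaged correctly / missed
    tn/fp: non-urgent scenarios triaged correctly / flagged as urgent
    """
    total = tp + fn + tn + fp
    return {
        "accuracy": 100 * (tp + tn) / total,
        "sensitivity": 100 * tp / (tp + fn),
        "specificity": 100 * tn / (tn + fp),
        "ppv": 100 * tp / (tp + fp),
        "npv": 100 * tn / (tn + fn),
    }


# ASSUMED counts, chosen only because they reproduce the reported percentages.
metrics = triage_metrics(tp=15, fn=2, tn=10, fp=0)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}%")
```

Under these assumed counts, the two missed urgent scenarios are what lower sensitivity and negative predictive value, while the absence of false positives keeps specificity and positive predictive value at 100%.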

GPT 4 can diagnose and triage neurosurgical scenarios at the level of a senior neurosurgical resident. There has been a clear improvement between GPT 3.5 and GPT 4. The recent updates to ChatGPT's internet access and functionality will likely further improve its utility in neurosurgical triage ³⁾.

ChatGPT for Neurosurgical Literature Review.

ChatGPT for Neurosurgical Patient Education.

ChatGPT for Neurosurgical Decision Support.

ChatGPT for Neurosurgical Case Discussion.

ChatGPT for Neurosurgical education.

ChatGPT for Neurosurgical Documentation Assistance.

ChatGPT provided accurate and reliable answers to common questions from people with epilepsy and was a valuable source of information. It also provided partial emotional support, potentially assisting those experiencing emotional distress. However, ChatGPT may give incorrect responses, leading users to inadvertently accept incorrect and potentially dangerous advice. Therefore, the direct use of ChatGPT for medical guidance is not recommended, and its primary use at present is in patient education ⁴⁾.

ChatGPT 4 and Scholar AI Premium excelled in classifying single-curve scoliosis, with perfect sensitivity and specificity. When real and AI-generated scoliosis classifications were compared, both models were correct on all posturographic images (accuracy 1.0, MAE = 0.0) and showed inter-rater agreement with a perfect Fleiss' Kappa score. This was consistent across scoliosis cases with Cobb angles ranging from 11 to 92 degrees. Despite high classification accuracy, each model used an incorrect angular range for the mild stage of scoliosis. The findings highlight the potential of AI in analyzing medical data sets. However, the diversity in competencies of AI models indicates the need for their further development to more effectively meet specific needs in clinical practice ⁵⁾.
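For readers unfamiliar with the agreement statistic mentioned above, the following is a minimal sketch of how Fleiss' Kappa is computed from a table of rating counts. The rating tables are hypothetical illustrations (3 raters, 4 images, 2 categories), not data from the scoliosis study, and the function name `fleiss_kappa` is an assumption.

```python
def fleiss_kappa(counts):
    """Fleiss' Kappa for counts[i][j] = raters assigning item i to category j.

    Every item must be rated by the same number of raters (at least 2).
    """
    n_items = len(counts)
    n_raters = sum(counts[0])
    n_categories = len(counts[0])

    # Observed agreement: share of agreeing rater pairs, averaged over items.
    p_items = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]
    p_observed = sum(p_items) / n_items

    # Chance agreement from the overall share of ratings in each category.
    p_categories = [
        sum(row[j] for row in counts) / (n_items * n_raters)
        for j in range(n_categories)
    ]
    p_expected = sum(p * p for p in p_categories)

    # Undefined (division by zero) if every rating falls in a single category.
    return (p_observed - p_expected) / (1 - p_expected)


# HYPOTHETICAL tables: rows are images, columns are [scoliosis, no scoliosis].
perfect = [[3, 0], [0, 3], [3, 0], [0, 3]]
partial = [[3, 0], [2, 1], [0, 3], [1, 2]]
kappa_perfect = fleiss_kappa(perfect)
kappa_partial = fleiss_kappa(partial)
print(kappa_perfect, kappa_partial)
```

A value of 1 means all raters agreed on every item, which is what a "perfect" score refers to. Values near 0 indicate agreement no better than chance.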

The use of artificial intelligence in neurosurgical education has been growing in recent times. ChatGPT has been gaining popularity as an alternative education method. It is necessary to explore the potential of this program in neurosurgery education and to evaluate its reliability.

Murphy Lonergan et al. evaluated the performance of LLMs in answering surgical questions relevant to clinical practice and assessed how this performance varies across surgical specialties. They used the MedMCQA dataset, a large-scale multi-choice question-answer (MCQA) dataset consisting of clinical questions across all areas of medicine. They extracted the 23,035 relevant surgical questions and submitted them to the popular LLMs Generative Pre-trained Transformer (GPT)-3.5 and GPT-4 (OpenAI OpCo, LLC, San Francisco, CA). A Generative Pre-trained Transformer is a large language model that can generate human-like text by predicting subsequent words in a sentence based on the context of the words that come before it. It is pre-trained on a diverse range of texts and can perform a variety of tasks, such as answering questions, without needing task-specific training. Question-answering accuracy was calculated and compared between the two models and across surgical specialties. GPT-3.5 and GPT-4 achieved accuracies of 53.3% and 64.4%, respectively, on surgical questions, a statistically significant difference in performance. Relative to their performance on the full MedMCQA dataset, the two models behaved differently: GPT-4 performed worse on surgical questions than on the dataset as a whole, while GPT-3.5 showed the opposite pattern. Significant variations in accuracy were also observed across surgical specialties, with strong performance in anatomy, vascular, and pediatric surgery and weaker performance in orthopedics, ENT, and neurosurgery. Large language models exhibit promising capabilities in addressing surgical questions, although the variability in their performance between specialties cannot be ignored.
The lower performance of the latest GPT-4 model on surgical questions relative to questions across all medicine highlights the need for targeted improvements and continuous updates to ensure relevance and accuracy in surgical applications. Further research and continuous monitoring of LLM performance in surgical domains are crucial to fully harnessing their potential and mitigating the risks of misinformation ⁶⁾.
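To give a sense of scale for the difference between 53.3% and 64.4% on 23,035 questions, the following is a rough sketch using a two-proportion z-test. This is an assumption for illustration: the text above does not say which test the authors used, and because both models answered the same questions, a paired test such as McNemar's would be more appropriate if per-question results were available. The function name `two_proportion_z` is hypothetical.

```python
import math


def two_proportion_z(p1, p2, n1, n2):
    """Two-sided two-proportion z-test, treating the samples as independent."""
    pooled = (p1 * n1 + p2 * n2) / (n1 + n2)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p2 - p1) / std_err
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Accuracies and question count as reported above; independence is ASSUMED.
z, p_value = two_proportion_z(p1=0.533, p2=0.644, n1=23035, n2=23035)
print(f"z = {z:.1f}, p = {p_value:.3g}")
```

With this many questions, a gap of about 11 percentage points corresponds to a very large z statistic, so the difference is far outside what sampling noise alone would explain under these assumptions.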

Use of artificial intelligence: In the preparation of a manuscript, artificial intelligence (AI) tools, specifically OpenAI’s ChatGPT, were used for grammar refinement and typographical error correction ⁷⁾.

ChatGPT prompts for neurosurgery.

The integration of artificial intelligence (AI), particularly large language models (LLMs) like ChatGPT-4, is transforming healthcare. ChatGPT's potential to assist in decision-making for complex cases, such as spinal metastases treatment, is promising but largely untested. Precise and personalized treatment is especially important in cancer patients who develop spinal metastases. A study examined ChatGPT-4's performance in treatment planning for spinal metastases cases compared to experienced spine surgeons.

Five spine metastasis cases were randomly selected from recent literature. Five spine surgeons and ChatGPT-4 were then tasked with providing treatment recommendations for each case in a standardized manner. Responses were analyzed for frequency distribution, agreement, and subjective rater opinions. ChatGPT's treatment recommendations aligned with the majority of human raters in 73% of treatment choices, with moderate to substantial agreement on systemic therapy, pain management, and supportive care. However, raters noted that ChatGPT's recommendations tended towards generalized statements. Agreement among raters improved in sensitivity analyses excluding ChatGPT, particularly for controversial areas like surgical intervention and palliative care. ChatGPT shows potential in aligning with experienced surgeons on certain treatment aspects of spinal metastasis. However, its generalized approach highlights limitations, suggesting that training with specific clinical guidelines could enhance its utility in complex case management. Further studies are necessary to refine AI applications in personalized healthcare decision-making ⁸⁾.

1: Cheng K, Li Z, Guo Q, Sun Z, Wu H, Li C. Emergency surgery in the era of artificial intelligence: ChatGPT could be the doctor's right-hand man. Int J Surg. 2023 Apr 20. doi: 10.1097/JS9.0000000000000410. Epub ahead of print. PMID: 37074733.

2: Sevgi UT, Erol G, Doğruel Y, Sönmez OF, Tubbs RS, Güngor A. The role of an open artificial intelligence platform in modern neurosurgical education: a preliminary study. Neurosurg Rev. 2023 Apr 14;46(1):86. doi: 10.1007/s10143-023-01998-2. PMID: 37059815.

3: Cheng K, Sun Z, He Y, Gu S, Wu H. The potential impact of ChatGPT/GPT-4 on surgery: will it topple the profession of surgeons? Int J Surg. 2023 Apr 12. doi: 10.1097/JS9.0000000000000388. Epub ahead of print. PMID: 37037587.

4: Hegde A, Srinivasan S, Menon G. Extraventricular Neurocytoma of the Posterior Fossa: A Case Report Written by ChatGPT. Cureus. 2023 Mar 6;15(3):e35850. doi: 10.7759/cureus.35850. PMID: 37033498; PMCID: PMC10076908.

5: D'Amico RS, White TG, Shah HA, Langer DJ. I Asked a ChatGPT to Write an Editorial About How We Can Incorporate Chatbots Into Neurosurgical Research and Patient Care…. Neurosurgery. 2023 Apr 1;92(4):663-664. doi: 10.1227/neu.0000000000002414. Epub 2023 Feb 9. PMID: 36757199.

¹⁾

Qian C, Fang Y. Re: ChatGPT encounters multiple opportunities and challenges in neurosurgery. Int J Surg. 2024 Jan 18. doi: 10.1097/JS9.0000000000001086. Epub ahead of print. PMID: 38241318.

²⁾

Roman A, Al-Sharif L, Al Gharyani M. The Expanding Role of ChatGPT (Chat-Generative Pre-Trained Transformer) in Neurosurgery: A Systematic Review of Literature and Conceptual Framework. Cureus. 2023 Aug 15;15(8):e43502. doi: 10.7759/cureus.43502. PMID: 37719492; PMCID: PMC10500385.

³⁾

Ward M, Unadkat P, Toscano D, Kashanian A, Lynch DG, Horn AC, D'Amico RS, Mittler M, Baum GR. A Quantitative Assessment of ChatGPT as a Neurosurgical Triaging Tool. Neurosurgery. 2024 Aug 1;95(2):487-495. doi: 10.1227/neu.0000000000002867. Epub 2024 Feb 14. PMID: 38353523.

⁴⁾

Wu Y, Zhang Z, Dong X, Hong S, Hu Y, Liang P, Li L, Zou B, Wu X, Wang D, Chen H, Qiu H, Tang H, Kang K, Li Q, Zhai X. Evaluating the performance of the language model ChatGPT in responding to common questions of people with epilepsy. Epilepsy Behav. 2024 Jan 19;151:109645. doi: 10.1016/j.yebeh.2024.109645. Epub ahead of print. PMID: 38244419.

⁵⁾

ChatGPT 4 and Scholar AI Premium excelled in classifying single-curve scoliosis with perfect sensitivity and specificity. These models demonstrated unmatched rater concordance and excellent performance metrics. In comparing real and AI-generated scoliosis classifications, they showed impeccable precision in all posturographic images, indicating total accuracy (1.0, MAE = 0.0) and remarkable inter-rater agreement, with a perfect Fleiss' Kappa score. This was consistent across scoliosis cases with a Cobb's angle range of 11-92 degrees. Despite high accuracy in classification, each model used an incorrect angular range for the mild stage of scoliosis. Our findings highlight the immense potential of AI in analyzing medical data sets. However, the diversity in competencies of AI models indicates the need for their further development to more effectively meet specific needs in clinical practice.

⁶⁾

Murphy Lonergan R, Curry J, Dhas K, Simmons BI. Stratified Evaluation of GPT's Question Answering in Surgery Reveals Artificial Intelligence (AI) Knowledge Gaps. Cureus. 2023 Nov 14;15(11):e48788. doi: 10.7759/cureus.48788. PMID: 38098921; PMCID: PMC10720372.

⁷⁾

Neyazi M, Khajuria RK, Muhammad S. How I do it - focused Sylvian approach for clipping of middle cerebral artery aneurysms. Acta Neurochir (Wien). 2025 Jan 10;167(1):9. doi: 10.1007/s00701-025-06423-9. PMID: 39789366; PMCID: PMC11717822.

⁸⁾

Heisinger S, Salzmann SN, Senker W, Aspalter S, Oberndorfer J, Matzner MP, Stienen MN, Motov S, Huber D, Grohs JG. ChatGPT's Performance in Spinal Metastasis Cases-Can We Discuss Our Complex Cases with ChatGPT? J Clin Med. 2024 Dec 23;13(24):7864. doi: 10.3390/jcm13247864. PMID: 39768787; PMCID: PMC11727723.
