AtlasGPT

AtlasGPT is a generative pretrained transformer trained on neurosurgical literature. Its ability to tailor a response to the user's level of training is unique; however, whether its responses can be comprehended at each user's training level remains unknown. This study aimed to analyze the readability of the responses provided by AtlasGPT.

Ten queries were presented to AtlasGPT under each of its 4 user profiles (i.e., surgeon, resident, medical student, patient). A readability analysis was performed using multiple instruments in Readability Studio. Readability scores of the profile-specific responses were compared using one-way analysis of variance and post hoc pairwise t-tests with Bonferroni correction. A P value < 0.05 was considered significant.

Across the readability instruments used, reading ease differed significantly between the patient profile and each of the other user profiles (P < 0.005). Readability scores for the medical student profile tended to show greater reading ease than those for the surgeon and resident profiles; these differences, however, were not significant. The mean grade levels for patient responses across multiple instruments ranged from 8.8 to 11.51. Only one output, as assessed by the New Dale-Chall instrument, was written at a fifth- to sixth-grade level.

AtlasGPT-generated content demonstrates readability variations according to the user profile selected; however, the readability of patient content still exceeds recommendations set by United States departmental agencies, necessitating a call to action 1).
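The post hoc step described in the methods above (pairwise comparisons with Bonferroni correction) can be sketched as follows. This is a minimal illustration, not the authors' analysis: the raw P values are invented placeholders, and `bonferroni_adjust` is a hypothetical helper name. It only shows how 4 profiles yield 6 pairwise comparisons and how each raw P value is scaled by that count before being compared with 0.05.

```python
from itertools import combinations

PROFILES = ["surgeon", "resident", "medical student", "patient"]
ALPHA = 0.05  # significance threshold stated in the abstract


def bonferroni_adjust(p_values):
    """Multiply each raw P value by the number of comparisons, capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


# Every unordered pair of profiles: 4 profiles give 6 pairwise comparisons.
pairs = list(combinations(PROFILES, 2))

# Hypothetical raw P values, one per pair, for illustration only.
# They are NOT taken from the study.
raw_p = [0.40, 0.20, 0.001, 0.30, 0.002, 0.004]

adjusted = bonferroni_adjust(raw_p)
for (a, b), p_adj in zip(pairs, adjusted):
    verdict = "significant" if p_adj < ALPHA else "not significant"
    print(f"{a} vs {b}: adjusted P = {p_adj:.3f} ({verdict})")
```

Scaling the P values is equivalent to comparing each raw P value with 0.05 / 6; the adjusted form is used here because it keeps a single threshold for every comparison.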


The study highlights both the potential and current limitations of AtlasGPT in tailoring responses for various user profiles. While it effectively differentiates content to some extent, particularly between patient and professional profiles, the findings indicate that patient-level readability remains a critical challenge. Addressing these limitations could enhance AtlasGPT's role in medical communication, advancing both patient care and education within the neurosurgical domain.
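To make the notion of a "grade level" concrete, the sketch below computes two widely used readability formulas from raw text counts. The abstract names only the New Dale-Chall instrument, so the use of Flesch-Kincaid Grade Level and Flesch Reading Ease here is an assumption made for illustration; the function names and the example counts are hypothetical, and Readability Studio's own implementation is not reproduced.

```python
def flesch_kincaid_grade(words, sentences, syllables):
    """Flesch-Kincaid Grade Level: approximate US school grade of a text."""
    if words <= 0 or sentences <= 0:
        raise ValueError("words and sentences must be positive")
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59


def flesch_reading_ease(words, sentences, syllables):
    """Flesch Reading Ease: higher scores mean easier text."""
    if words <= 0 or sentences <= 0:
        raise ValueError("words and sentences must be positive")
    return 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)


# Hypothetical counts for one response (not data from the study):
# 100 words in 5 sentences with 150 syllables.
grade = flesch_kincaid_grade(words=100, sentences=5, syllables=150)
ease = flesch_reading_ease(words=100, sentences=5, syllables=150)
print(f"grade level = {grade:.2f}, reading ease = {ease:.2f}")
```

Both formulas depend only on average sentence length and average syllables per word, which is why shortening sentences and replacing polysyllabic terms are the usual ways to lower the grade level of patient material.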


This study assessed the predictive accuracy of advanced AI language models and established clinical scales in prognosticating outcomes for patients with aneurysmal subarachnoid hemorrhage (aSAH).

This retrospective cohort study included 82 patients with aSAH. The predictive efficacy of AtlasGPT and ChatGPT 4.0 was evaluated by examining the area under the curve (AUC), sensitivity, specificity, and Youden's Index, in comparison with established clinical grading scales such as the World Federation of Neurological Surgeons (WFNS) scale, the Subarachnoid Hemorrhage Early Brain Edema Score (SEBES), and the Fisher scale. The assessment focused on four endpoints: in-hospital mortality, the need for decompressive hemicraniectomy, and functional outcomes at discharge and at 6-month follow-up.

In-hospital mortality occurred in 22% of the cohort, and 34.1% required decompressive hemicraniectomy during treatment. At hospital discharge, 28% of patients exhibited a favorable outcome (mRS ≤ 2), which improved to 46.9% at the 6-month follow-up. Prognostication of 30-day in-hospital survival using the WFNS grading scale yielded an AUC of 0.72 with 59.4% sensitivity and 83.3% specificity. AtlasGPT provided the highest diagnostic accuracy (AUC 0.80, 95% CI: 0.70-0.91) for predicting the need for decompressive hemicraniectomy, with 82.1% sensitivity and 77.8% specificity. Similarly, for discharge outcomes, the WFNS score and AtlasGPT demonstrated high prognostic value, with AUCs of 0.74 and 0.75, respectively. Long-term functional outcome was best predicted by the WFNS scale, with an AUC of 0.76.

The study demonstrates the potential of integrating AI models such as AtlasGPT with clinical scales to enhance outcome prediction in aSAH patients. While established scales like WFNS remain reliable, AI language models show promise, particularly in predicting the necessity for surgical intervention and short-term functional outcomes.

The study explored the use of advanced AI language models, AtlasGPT and ChatGPT 4.0, to predict outcomes for patients with aneurysmal subarachnoid hemorrhage (aSAH). It found that AtlasGPT provided the highest diagnostic accuracy for predicting the need for decompressive hemicraniectomy, outperforming traditional clinical scales, while both AI models showed promise in enhancing outcome predictions when integrated with established clinical assessment tools 2).
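The performance measures named in the study (sensitivity, specificity, and Youden's Index) are related as in the sketch below. The function names are hypothetical, and the J values it prints are derived here from the sensitivity and specificity quoted in the abstract; they are not figures reported by the authors.

```python
def sensitivity(true_pos, false_neg):
    """Proportion of patients with the outcome who were correctly flagged."""
    return true_pos / (true_pos + false_neg)


def specificity(true_neg, false_pos):
    """Proportion of patients without the outcome who were correctly cleared."""
    return true_neg / (true_neg + false_pos)


def youden_index(sens, spec):
    """Youden's J = sensitivity + specificity - 1 (0 = chance, 1 = perfect)."""
    return sens + spec - 1.0


# Sensitivity and specificity as quoted in the abstract (as fractions).
reported = {
    "WFNS, in-hospital survival": (0.594, 0.833),
    "AtlasGPT, decompressive hemicraniectomy": (0.821, 0.778),
}

for label, (sens, spec) in reported.items():
    print(f"{label}: J = {youden_index(sens, spec):.3f}")
```

Youden's Index is commonly used to pick the cutoff on a ROC curve that maximizes the combined sensitivity and specificity, which is why it is reported alongside the AUC.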


This study highlights the potential of AI models like AtlasGPT to augment traditional clinical scales in predicting aneurysmal subarachnoid hemorrhage prognosis, particularly in identifying the need for surgical interventions and short-term functional outcomes. While the results are promising, significant challenges remain, including the need for larger sample sizes, external validation, and improved explainability. Integrating AI models with clinical workflows holds great promise for enhancing prognostication and personalizing care in neurosurgery.


Hopkins BS, Carter B, Lord J, Rutka JT, Cohen-Gadol AA. Editorial. AtlasGPT: dawn of a new era in neurosurgery for intelligent care augmentation, operative planning, and performance. J Neurosurg. 2024 Feb 27;140(5):1211-1214. doi: 10.3171/2024.2.JNS232997. PMID: 38412477.


Mohamed AA, Johansen PM, Lucke-Wold B. Letter to the Editor. AtlasGPT and beyond: optimizing neurosurgical chatbots through data sources and transfer learning. J Neurosurg. 2024 Jun 28;141(3):880-881. doi: 10.3171/2024.4.JNS24895. PMID: 38941629.


A large language model (LLM), in the context of natural language processing and artificial intelligence, refers to a sophisticated neural network that has been trained on a massive amount of text data to understand and generate human-like language. These models are typically built on architectures like transformers. The term “large” indicates that the neural network has many parameters, making it more powerful and capable of capturing complex patterns in language.

One notable example of a large language model is ChatGPT. ChatGPT is a large language model developed by OpenAI that uses deep learning techniques to generate human-like text. It can be trained on various tasks, such as language translation, question answering, and text completion. One of the key features of ChatGPT is its ability to understand and respond to natural language inputs. This makes it a powerful tool for generating a wide range of text, including medical reports, surgical notes, and even poetry. Additionally, the model has been trained on a large corpus of text, which allows it to generate text that is both grammatically correct and semantically meaningful.

In terms of applications in neurosurgery, ChatGPT can be used to generate detailed and accurate surgical reports, which can be very useful for sharing information about a patient's case with other members of the medical team. Additionally, the model can be used to generate detailed surgical notes, which can be very useful for training and educating residents and medical students. Overall, LLMs have the potential to be a valuable tool in the field of neurosurgery. Indeed, this abstract was generated by ChatGPT within a few seconds. Potential applications and pitfalls of the applications of LLMs are discussed in this paper 3).

1)
Lavadi RS, Carnovale B, Tirmizi Z, Gajjar AA, Kumar RP, Shah MJ, Hamilton DK, Agarwal N. Examining the Readability of AtlasGPT, the Premiere Resource for Neurosurgical Education. World Neurosurg. 2024 Dec 6;194:123469. doi: 10.1016/j.wneu.2024.11.052. Epub ahead of print. PMID: 39577655.
2)
Basaran AE, Güresir A, Knoch H, Vychopen M, Güresir E, Wach J. Beyond traditional prognostics: integrating RAG-enhanced AtlasGPT and ChatGPT 4.0 into aneurysmal subarachnoid hemorrhage outcome prediction. Neurosurg Rev. 2025 Jan 11;48(1):40. doi: 10.1007/s10143-025-03194-w. PMID: 39794551; PMCID: PMC11723888.
3)
Di Ieva A, Stewart C, Suero Molina E. Large Language Models in Neurosurgery. Adv Exp Med Biol. 2024;1462:177-198. doi: 10.1007/978-3-031-64892-2_11. PMID: 39523266.