Neurosurgical literature
The findings demonstrate promising potential for the application of ChatGPT in patient education. GPT-4 is an accessible tool that offers an immediate means of enhancing the readability of current neurosurgical literature. Layperson summaries generated by GPT-4 would be a valuable addition to neurosurgical journals and would likely improve comprehension for patients using internet resources such as PubMed 1).
Reporting quality within the neurosurgical literature is low, limiting the ability of journals to act as gatekeepers for evidence-based neurosurgical care. Journal submission policies aim to improve reporting quality. We conducted a metascience study characterizing the reporting policies of neurosurgical journals and other related peer-reviewed publications.
Journals were retrieved in 7 searches using Journal Citation Reports and Google Scholar. Characteristics, impact metrics, and submission policies were extracted.
Of 486 results, 54 journals were included: 27 neurosurgical and 27 related topical journals. Thirty-eight (70.4%) adopted the authorship guidelines and 20 (37.0%) the disclosure standards of the International Committee of Medical Journal Editors (ICMJE). Twenty-six (48.1%) required a data availability statement and 33 (61.1%) clinical trial registration. Twenty-one (38.9%) required and 11 (20.4%) recommended adherence to reporting guidelines. Twenty (37.0%) endorsed EQUATOR Network guidelines. PRISMA was mentioned by 30 (55.6%) journals, CONSORT by 28 (51.9%), and STROBE by 18 (33.3%). Among neurosurgical journals, factors associated with a requirement or recommendation to follow reporting guidelines included impact factor (P = 0.0013), Article Influence Score (P = 0.0236), SCImago h-index (P = 0.0152), SCImago Journal Rank (P = 0.002), and CiteScore (P = 0.0023), as well as recommendations pertaining to ICMJE authorship guidelines (P = 0.0085), ORCID (P = 0.014), clinical trial registration (P = 0.0369), or a data availability statement (P = 0.0047). Mention of CONSORT, PRISMA, or STROBE was significantly associated with mention of another guideline (P < 0.01).
Neurosurgical journal submission policies are inconsistent, and frameworks to improve reporting quality are uncommonly used. Increasing the rigor and standardization of reporting policies across journals and publishers may improve quality 2).
Quantitative experimental research studies
The escalating complexity of medical literature necessitates tools to enhance readability for patients. This study aimed to evaluate the efficacy of ChatGPT-4 in simplifying neurology and neurosurgical abstracts and patient education materials (PEMs) while assessing content preservation using Latent Semantic Analysis (LSA).
Picton et al. transformed a total of 100 abstracts (25 each from Neurosurgery, Journal of Neurosurgery, Lancet Neurology, and JAMA Neurology) and 340 PEMs (66 from the American Association of Neurological Surgeons, 274 from the American Academy of Neurology) using a GPT-4.0 prompt requesting a 5th grade reading level. Flesch-Kincaid Grade Level (FKGL) and Flesch Reading Ease (FKRE) scores were measured before and after transformation. Content fidelity was assessed via LSA (cosine similarity ranging from 0 to 1, with 1 indicating identical topics) and by expert assessment (scored 0-1) for a subset (n = 40). The Pearson correlation coefficient was used to compare the two assessments.
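For illustration, a minimal sketch of this kind of readability-and-similarity pipeline is shown below; it is not the authors' code. It assumes paired lists of original and GPT-simplified texts (called originals and simplified here) and uses the standard textstat and scikit-learn libraries; the choice of 100 LSA components and the variable names are illustrative assumptions.

```python
# Sketch only: readability change (FKGL/FKRE) and LSA cosine similarity
# for paired original vs. GPT-simplified texts. Not the authors' code.
import textstat
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD
from sklearn.metrics.pairwise import cosine_similarity

def readability_change(originals, simplified):
    """FKGL and FKRE before/after transformation for each text pair."""
    return [
        {
            "fkgl_before": textstat.flesch_kincaid_grade(o),
            "fkgl_after": textstat.flesch_kincaid_grade(s),
            "fkre_before": textstat.flesch_reading_ease(o),
            "fkre_after": textstat.flesch_reading_ease(s),
        }
        for o, s in zip(originals, simplified)
    ]

def lsa_similarity(originals, simplified, n_components=100):
    """Cosine similarity of each original/simplified pair in LSA space."""
    corpus = list(originals) + list(simplified)          # fit LSA on all texts
    tfidf = TfidfVectorizer(stop_words="english").fit_transform(corpus)
    k = min(n_components, tfidf.shape[1] - 1, len(corpus) - 1)
    lsa = TruncatedSVD(n_components=k, random_state=0).fit_transform(tfidf)
    n = len(originals)
    return [float(cosine_similarity(lsa[i:i + 1], lsa[n + i:n + i + 1])[0, 0])
            for i in range(n)]

# Example usage (expert_scores is a hypothetical list of 0-1 expert ratings):
# from scipy.stats import pearsonr
# sims = lsa_similarity(originals, simplified)
# r, p = pearsonr(sims, expert_scores)
```

As in the study, a cosine similarity closer to 1 indicates that the simplified text preserves the topical content of the original.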
FKGL decreased from the 12th- to the 5th-grade level for abstracts and from the 13th to the 5th for PEMs (p < 0.001). FKRE scores showed similar improvement (p < 0.001). LSA confirmed high content similarity for abstracts (mean cosine similarity 0.746) and PEMs (mean 0.953). Expert assessment indicated a mean topic similarity of 0.775 for abstracts and 0.715 for PEMs. The Pearson coefficient between LSA and expert assessment of textual similarity was 0.598 for abstracts and -0.167 for PEMs. Segmented analysis revealed a similarity correlation of 0.48 (p = 0.02) for texts below 450 words and -0.20 (p = 0.43) for texts above 450 words.
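The segmented analysis could be reproduced along the lines of the following sketch (again an assumption, not the authors' code): texts are split at a 450-word cutoff and the Pearson correlation between LSA similarity and expert ratings is computed within each segment. The names texts, lsa_sims, and expert_scores are hypothetical placeholders.

```python
# Sketch only: correlation between automated and expert similarity scores,
# computed separately for shorter and longer texts (cutoff = 450 words).
from scipy.stats import pearsonr

def segmented_correlation(texts, lsa_sims, expert_scores, cutoff_words=450):
    short = [i for i, t in enumerate(texts) if len(t.split()) < cutoff_words]
    long_ = [i for i, t in enumerate(texts) if len(t.split()) >= cutoff_words]
    results = {}
    for label, idx in (("below_cutoff", short), ("above_cutoff", long_)):
        if len(idx) > 1:  # Pearson correlation needs at least two pairs
            r, p = pearsonr([lsa_sims[i] for i in idx],
                            [expert_scores[i] for i in idx])
            results[label] = {"r": r, "p": p, "n": len(idx)}
    return results
```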
GPT-4.0 markedly improved the readability of medical texts while largely maintaining content integrity, as substantiated by LSA and expert evaluations. LSA emerged as a reliable tool for assessing content fidelity in moderate-length texts, but its utility diminished for longer documents, where it overestimated similarity. These findings support the potential of AI in combating low health literacy; however, the similarity scores indicate that expert validation remains crucial. Future research must strive to improve transformation precision and to develop validation methodologies 3).
The study provides compelling evidence that GPT-4.0 can enhance the readability of medical literature while largely preserving content. However, it also highlights the limitations of automated tools in fully capturing content fidelity, particularly for longer texts. Future research should:
- Explore methods to refine AI transformations for nuanced accuracy.
- Investigate the impact of simplified materials on patient comprehension and health outcomes.
- Develop hybrid workflows that integrate AI simplification with efficient expert review.

This research sets a strong foundation for leveraging AI in patient education but underscores the need for continued development and expert oversight to ensure accuracy and trustworthiness.