Machine learning (ML) in neurosurgery is an evolving field with significant potential to improve diagnosis, treatment planning, intraoperative decision-making, and patient outcomes.
### Challenges & Future Directions
Machine learning is revolutionizing neurosurgery, from preoperative planning to intraoperative assistance and postoperative care. As technology advances, AI-driven tools will continue to enhance surgical precision, improve patient outcomes, and transform the field of neurosurgery.
A systematic review summarized the current applications of ML in the analysis and assessment of neurosurgical skills. The authors conducted the review in accordance with the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines. They searched the PubMed and Google Scholar databases for eligible studies published until November 15, 2022, and used the Medical Education Research Study Quality Instrument (MERSQI) to assess the quality of the included articles. Of the 261 studies identified, 17 were included in the final analysis. Studies were most commonly related to oncological, spinal, and vascular neurosurgery using microsurgical and endoscopic techniques. Machine learning-evaluated tasks included subpial brain tumor resection, anterior cervical discectomy and fusion, hemostasis of the lacerated internal carotid artery, brain vessel dissection and suturing, glove microsuturing, lumbar hemilaminectomy, and bone drilling. The data sources included files extracted from VR simulators and microscopic and endoscopic videos. The ML applications were aimed at classifying participants into several expertise levels, analyzing differences between experts and novices, recognizing surgical instruments, dividing operations into phases, and predicting blood loss. In two articles, ML models were compared with human experts, and the machines outperformed humans in all tasks. The most popular algorithms used to classify surgeons by skill level were the support vector machine and k-nearest neighbors, and their accuracy exceeded 90%. Surgical instrument detection was usually handled by the “you only look once” (YOLO) detector and RetinaNet, with an accuracy of approximately 70%. Experts differed from novices by more confident contact with tissues, higher bimanuality, a smaller distance between the instrument tips, and a relaxed and focused state of mind. The average MERSQI score was 13.9 (out of 18).
There is growing interest in the use of ML in neurosurgical training. Most studies have focused on the evaluation of microsurgical skills in oncological neurosurgery and on the use of virtual simulators; however, other subspecialties, skills, and simulators are being investigated. Machine learning models effectively solve different neurosurgical tasks related to skill classification, object detection, and outcome prediction. Properly trained ML models outperform human efficacy. Further research on ML applications in neurosurgery is needed.
Conclusion: The study found that ML is becoming increasingly important in neurosurgical training. While most studies focused on microsurgery for brain tumors, researchers are also looking into other types of surgery and simulators. Machine learning models are proving to be very effective in tasks related to neurosurgical skill assessment, instrument recognition, and outcome prediction. In fact, properly trained ML models performed better than humans 1).
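The skill-classification idea summarized above, assigning a performance to an expertise level from measured features, can be illustrated with a minimal k-nearest neighbors sketch. This is not the method of any reviewed study: the two features (instrument tip distance and applied force), their values, and the function name `knn_predict` are hypothetical and chosen only to show the mechanics.

```python
import math
from collections import Counter


def knn_predict(train_x, train_y, query, k=3):
    """Return the majority label among the k training points closest to query."""
    # Euclidean distance from the query to every training example.
    distances = sorted(
        (math.dist(features, query), label)
        for features, label in zip(train_x, train_y)
    )
    # Majority vote over the k nearest neighbors.
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]


# Hypothetical features: (mean instrument tip distance in mm, mean force in N).
train_x = [
    (4.0, 0.20), (5.0, 0.25), (4.5, 0.22),   # invented "expert" recordings
    (9.0, 0.60), (10.0, 0.55), (8.5, 0.70),  # invented "novice" recordings
]
train_y = ["expert", "expert", "expert", "novice", "novice", "novice"]

print(knn_predict(train_x, train_y, (4.8, 0.24)))
print(knn_predict(train_x, train_y, (9.5, 0.60)))
```

In practice, features on different scales would normally be standardized before computing distances, and accuracy would be estimated on held-out participants rather than on the training data.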
Machine learning applications in neurosurgery have been reviewed 2).
see Machine learning for degenerative cervical myelopathy.
Machine learning (ML) involves algorithms learning patterns in large, complex datasets to predict and classify. Algorithms include neural networks (NN), logistic regression (LR), and support-vector machines (SVM). ML may generate substantial improvements in neurosurgery. This systematic review assessed the current state of neurosurgical ML applications and the performance of algorithms applied. Our systematic search strategy yielded 6866 results, 70 of which met inclusion criteria. Performance statistics analyzed included area under the receiver operating characteristics curve (AUC), accuracy, sensitivity, and specificity. Natural language processing (NLP) was used to model topics across the corpus and to identify keywords within surgical subspecialties. ML applications were heterogeneous. The densest cluster of studies focused on preoperative evaluation, planning, and outcome prediction in spine surgery. The main algorithms applied were NN, LR, and SVM. Input and output features varied widely and were listed to facilitate future research. The accuracy (F(2,19) = 6.56, p < 0.01) and specificity (F(2,16) = 5.57, p < 0.01) of NN, LR, and SVM differed significantly. NN algorithms demonstrated significantly higher accuracy than LR. SVM demonstrated significantly higher specificity than LR. We found no significant difference between NN, LR, and SVM AUC and sensitivity. NLP topic modeling reached maximum coherence at seven topics, which were defined by modeling approach, surgery type, and pathology themes. Keywords captured research foci within surgical domains. ML technology accurately predicts outcomes and facilitates clinical decision-making in neurosurgery. NNs frequently outperformed other algorithms on supervised learning tasks. This study identified gaps in the literature and opportunities for future neurosurgical ML research 3).
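The performance statistics named in this review (accuracy, sensitivity, and specificity) all derive from the counts of a binary confusion matrix. The sketch below shows those definitions; the class and function names and the toy labels are illustrative assumptions, not data from the review.

```python
from dataclasses import dataclass


@dataclass
class ConfusionMatrix:
    tp: int  # true positives
    fp: int  # false positives
    tn: int  # true negatives
    fn: int  # false negatives

    def accuracy(self):
        """Fraction of all cases classified correctly."""
        return (self.tp + self.tn) / (self.tp + self.fp + self.tn + self.fn)

    def sensitivity(self):
        """Fraction of actual positives that were detected."""
        return self.tp / (self.tp + self.fn)

    def specificity(self):
        """Fraction of actual negatives that were correctly rejected."""
        return self.tn / (self.tn + self.fp)


def confusion_from_labels(y_true, y_pred):
    """Count the four outcomes for binary labels coded as 0 and 1."""
    pairs = list(zip(y_true, y_pred))
    return ConfusionMatrix(
        tp=sum(1 for t, p in pairs if t == 1 and p == 1),
        fp=sum(1 for t, p in pairs if t == 0 and p == 1),
        tn=sum(1 for t, p in pairs if t == 0 and p == 0),
        fn=sum(1 for t, p in pairs if t == 1 and p == 0),
    )


# Invented example: 4 positive and 6 negative cases.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 1]
cm = confusion_from_labels(y_true, y_pred)
print(cm.accuracy(), cm.sensitivity(), cm.specificity())
```

Because the three metrics weight errors differently, two algorithms can differ in accuracy or specificity without differing in sensitivity, which is the kind of pattern the review reports.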
A study implemented a supervised machine learning-based approach to model estimated symptom resolution time in high school athletes who sustained a concussion during sport activity.
They examined the efficacy of 10 machine learning classification algorithms for predicting symptom resolution time (within 7, 14, or 28 days), using a dataset representing three years of concussions sustained by high school student-athletes in football (which accounted for most concussion incidents) and other contact sports.
The most prevalent reported symptom of sport-related concussion was headache (94.9%), followed by dizziness (74.3%) and difficulty concentrating (61.1%). For all three thresholds of predicted symptom resolution time, single-factor ANOVAs revealed statistically significant performance differences across the ten classification models at a 95% confidence level (P < 0.001, reported as P=0.000). Naïve Bayes and Random Forest with either 100 or 500 trees were the top-performing learners, with an area under the ROC curve ranging between 0.666 and 0.742 (on a 0.0-1.0 scale).
Considering the limitations of these data specific to symptom presentation and resolution, supervised machine learning demonstrated efficacy in developing symptom-based prediction models for practical estimation of sport-related concussion recovery to enhance clinical decision support, while warranting further exploration 4).
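The area under the ROC curve reported above can be read as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case. The sketch below computes it from that pairwise definition; the function name `roc_auc` and the toy labels and scores are invented for illustration and are not the study's data.

```python
def roc_auc(labels, scores):
    """AUC as the share of positive/negative pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    positives = [s for label, s in zip(labels, scores) if label == 1]
    negatives = [s for label, s in zip(labels, scores) if label == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Invented example: 1 = symptoms resolved within the threshold, 0 = not resolved.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.6, 0.4, 0.7, 0.3, 0.2]
print(round(roc_auc(labels, scores), 3))
```

On this scale 0.5 corresponds to chance-level ranking and 1.0 to perfect separation, which puts the reported range of 0.666 to 0.742 in context.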
The current practice of neurosurgery depends on clinical practice guidelines and evidence-based research publications that derive results using statistical methods. However, statistical analysis methods have some limitations, such as the inability to analyze nonlinear variables, the need to set a level of significance, impracticality for analyzing large amounts of data, and the possibility of human bias. Machine learning is an emerging method for analyzing massive amounts of complex data; it relies on algorithms that allow computers to learn and make accurate predictions.
Machine learning has been increasingly implemented in medical research, including neurosurgical publications. A systematic review aimed to assemble the current neurosurgical literature in which machine learning has been used, and to inform neurosurgeons about this novel method of data analysis 5).
ML is increasingly tested in neurosurgical applications and even demonstrated to emulate the performance of clinical experts 6) 7) 8) 9) 10) 11) 12) 13) 14) 15) 16) 17) 18) 19) 20) 21) 22) 23) 24) 25) 26) 27) 28).
Automated analysis of radiological data for diagnosis, segmentation, or outcome prediction could be one of the first ML applications to find its way into actual clinical practice 29).
Current outcome prediction is largely based on, and limited by, regression methods. Utilization of machine learning (ML) methods that can handle multiple diverse inputs could strengthen predictive abilities and improve patient outcomes. Inpatient length of stay (LOS) is one such outcome that serves as a surrogate for patient disease severity and resource utilization.
One study set out to develop a novel method to systematically rank, select, and combine ML algorithms to build a model that predicts LOS following craniotomy for brain tumor.
A training dataset of 41 222 patients who underwent craniotomy for brain tumor was created from the National Inpatient Sample. Twenty-nine ML algorithms were trained on 26 preoperative variables to predict LOS. Trained algorithms were ranked by their root mean square logarithmic error (RMSLE), and the top-performing algorithms were combined to form an ensemble. The ensemble was externally validated using a dataset of 4592 patients from the National Surgical Quality Improvement Program. Additional analyses identified the variables that most strongly influence the ensemble model predictions.
The ensemble model predicted LOS with RMSLE of .555 (95% confidence interval, .553-.557) on internal validation and .631 on external validation. Nonelective surgery, preoperative pneumonia, sodium abnormality, or weight loss, and non-White race were the strongest predictors of increased LOS.
An ML ensemble model predicts LOS with good performance on internal and external validation, and yields clinical insights that may potentially improve patient outcomes. This systematic ML method can be applied to a broad range of clinical problems to improve patient care 30).
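The rank-then-combine procedure described for the LOS model can be sketched in a few lines: compute the RMSLE of each candidate, sort by it, and combine the best candidates. The source does not state how the top algorithms were combined, so the simple averaging used here is an assumption, and the model names and LOS values are invented.

```python
import math


def rmsle(y_true, y_pred):
    """Root mean square logarithmic error for non-negative targets."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [
        (math.log1p(pred) - math.log1p(true)) ** 2
        for true, pred in zip(y_true, y_pred)
    ]
    return math.sqrt(sum(squared) / len(squared))


def rank_models(y_true, predictions):
    """Return model names ordered from lowest (best) to highest RMSLE."""
    return sorted(predictions, key=lambda name: rmsle(y_true, predictions[name]))


def average_ensemble(predictions, names):
    """Assumed combination rule: unweighted mean of the selected models."""
    n_cases = len(predictions[names[0]])
    return [
        sum(predictions[name][i] for name in names) / len(names)
        for i in range(n_cases)
    ]


# Invented length-of-stay values in days for four patients.
los_true = [3, 5, 2, 10]
predictions = {
    "model_a": [3, 6, 2, 9],
    "model_b": [4, 4, 3, 8],
    "model_c": [10, 10, 10, 10],
}

ranking = rank_models(los_true, predictions)
ensemble = average_ensemble(predictions, ranking[:2])
print(ranking, ensemble, round(rmsle(los_true, ensemble), 3))
```

RMSLE penalizes relative rather than absolute error, which suits a right-skewed outcome such as LOS where a one-day miss matters more for a short stay than for a long one.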
A systematic search was performed in the PubMed and Embase databases as of August 2016 to review all studies comparing the performance of various ML approaches with that of clinical experts in neurosurgical literature.
Twenty-three studies were identified that used ML algorithms for diagnosis, presurgical planning, or outcome prediction in neurosurgical patients. Compared to clinical experts, ML models demonstrated a median absolute improvement in accuracy and area under the receiver operating curve of 13% (interquartile range 4-21%) and 0.14 (interquartile range 0.07-0.21), respectively. In 29 (58%) of the 50 outcome measures for which a P-value was provided or calculated, ML models outperformed clinical experts (P < .05). In 18 of 50 (36%), no difference was seen between ML and expert performance (P > .05), while in 3 of 50 (6%) clinical experts outperformed ML models (P < .05). All 4 studies that compared clinicians assisted by ML models vs clinicians alone demonstrated a better performance in the first group.
Senders et al. conclude that ML models have the potential to augment the decision-making capacity of clinicians in neurosurgical applications; however, significant hurdles remain associated with creating, validating, and deploying ML models in the clinical setting. Shifting from the preconceptions of a human-vs-machine to a human-and-machine paradigm could be essential to overcome these hurdles 31).
Lazaridis et al. and others have developed predictive models based on machine learning from continuous time series of intracranial pressure and partial pressure of brain tissue oxygen. These models provide accurate and timely predictions of physiologic crisis events, offering the opportunity for earlier application of targeted interventions. They review the rationale for prediction, discuss available predictive models with examples, and offer suggestions for their future prospective testing in conjunction with preventive clinical algorithms 32).
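The summary statistics in this review (a median absolute improvement with its interquartile range, and the share of outcome measures favoring each side) can be reproduced with the Python standard library. In the sketch below the paired accuracies are invented, and the inclusive quartile method is an assumption, since the review does not state which method was used; only the counts 29, 18, and 3 out of 50 come from the text.

```python
import statistics


def median_and_iqr(values):
    """Return the median and the (Q1, Q3) bounds of the interquartile range."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return statistics.median(values), (q1, q3)


def percentage(count, total):
    """Share of a total, rounded to the nearest whole percent."""
    return round(100 * count / total)


# Invented paired accuracies: ML model vs clinical expert on the same task.
ml_accuracy = [0.90, 0.85, 0.80, 0.95, 0.70]
expert_accuracy = [0.80, 0.81, 0.60, 0.82, 0.72]

# Absolute improvement = ML accuracy minus expert accuracy for each pair.
improvements = [ml - ex for ml, ex in zip(ml_accuracy, expert_accuracy)]
median_gain, (q1, q3) = median_and_iqr(improvements)
print(round(median_gain, 2), round(q1, 2), round(q3, 2))

# Counts reported in the review: ML better, no difference, experts better.
counts = {"ml_better": 29, "no_difference": 18, "expert_better": 3}
total = sum(counts.values())
print({name: percentage(c, total) for name, c in counts.items()})
```

An absolute improvement is a difference in percentage points, so a median of 13% means the ML model's accuracy was 13 points higher than the experts', not 13% higher in relative terms.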
Machine learning (ML) is a domain of artificial intelligence that allows computer algorithms to learn from experience without being explicitly programmed.
The review by Senders et al. aimed to summarize neurosurgical applications of ML in which it has been compared with clinical expertise, referred to there as “natural intelligence.”
Two important and rapidly developing scientific movements, data reproducibility and machine learning, are central to a recent Neuron paper by Chung et al. 33).
Yepes-Calderon et al. presented a segmentation strategy based on an algorithm that uses four features extracted from the medical images to create a statistical estimator capable of determining ventricular volume. When compared with manual segmentations, the correlation was 94%, and the authors expect accuracy to improve further as more of the available data are incorporated. According to the authors, the volume of any segmentable structure can be accurately determined with the machine learning strategy presented, which runs fully automatically within the PACS 35).
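Agreement between automatic and manual volume estimates is often summarized with a correlation coefficient. The source does not say which coefficient underlies the 94% figure, so the Pearson correlation used below is an assumption, and the ventricular volumes are invented for illustration.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Invented ventricular volumes in mL for six scans.
manual_ml = [22.0, 35.5, 48.0, 61.2, 80.4, 120.0]
automatic_ml = [24.1, 33.0, 50.5, 58.9, 85.0, 112.3]

print(round(pearson_r(manual_ml, automatic_ml), 3))
```

A high correlation shows that the two methods rank and scale volumes similarly, but it does not rule out a systematic offset, so agreement studies usually also report the mean difference between methods.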