Natural language processing
Natural Language Processing (NLP) is a field of artificial intelligence (AI) and computational linguistics that focuses on the interaction between computers and human language. Its goal is to enable machines to understand, interpret, generate, and respond to natural language in a meaningful way.
Key Components
1. Morphological and Syntactic Analysis: Processes the structure of words and sentences (tokenization, lemmatization, part-of-speech tagging); see the code sketch after this list.
2. Semantic Analysis: Determines the meaning of text (word sense disambiguation, named entity recognition).
3. Pragmatic Analysis: Considers context and intent in communication.
4. Speech Processing: Converts speech to text and vice versa.
5. Natural Language Generation (NLG): Creates human-readable text from structured data.
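As an illustration of the first component above, the following Python sketch uses NLTK to tokenize, part-of-speech tag, and lemmatize a sentence. The sentence is invented, and the exact nltk.download resource names can vary slightly across NLTK versions.

```python
# A minimal sketch of morphological and syntactic analysis with NLTK
# (pip install nltk). The example sentence is illustrative only.
import nltk
from nltk.stem import WordNetLemmatizer

nltk.download("punkt")                       # tokenizer models
nltk.download("averaged_perceptron_tagger")  # POS tagger model
nltk.download("wordnet")                     # lemmatizer dictionary

sentence = "The patients were referred for lumbar discectomy."

tokens = nltk.word_tokenize(sentence)        # tokenization
tagged = nltk.pos_tag(tokens)                # part-of-speech tagging

lemmatizer = WordNetLemmatizer()
lemmas = [lemmatizer.lemmatize(t.lower()) for t in tokens]  # noun lemmas by default

print(tagged)  # [('The', 'DT'), ('patients', 'NNS'), ('were', 'VBD'), ...]
print(lemmas)  # ['the', 'patient', 'were', 'referred', ...]
```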
Applications of NLP
- Chatbots and Virtual Assistants (Siri, Alexa, ChatGPT).
- Machine Translation (Google Translate).
- Sentiment Analysis in social media and surveys.
- Text Correction and Autocompletion (Grammarly, spell checkers).
- Automatic Summarization of documents.
- Information Extraction from large text volumes.
NLP combines techniques from computational linguistics, machine learning, and deep neural networks to enhance accuracy and efficiency in processing human language.
Decision support systems (DSSs) for suggesting the optimal low back pain (LBP) treatment are currently insufficiently accurate for clinical application. Most of the input used to train these systems is based on patient-reported outcome measures. However, with the advent of electronic health records (EHRs), additional qualitative data on reasons for referral and patients' goals have become available for DSSs. Currently, no decision support tools cover a wide range of biopsychosocial factors, including referral letter information, to help clinicians triage patients to the optimal LBP treatment.
The objective of the study was to investigate the added value, in terms of accuracy, of including qualitative data from EHRs and referral letters in a quantitative DSS for patients with LBP.
A retrospective study was conducted in a clinical cohort of Dutch patients with LBP. Patients filled out a baseline questionnaire about demographics, pain, disability, work status, quality of life, medication, psychosocial functioning, comorbidity, and the history and duration of pain. Reasons for referral and patient requests for help (patient goals) were extracted from the referral letters via natural language processing (NLP) and used to enrich the data set. For decision support, these data were considered independent factors for triage to neurosurgery, anesthesiology, rehabilitation, or minimal intervention. Support vector machine, k-nearest neighbor, and multilayer perceptron models were trained under 2 conditions: with and without consideration of the referral letter content. The models' accuracies were evaluated via F1-scores, and confusion matrices were used to assess prediction of the treatment path (out of 4 paths) with and without the additional referral parameters.
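The study's code and data are not public; the following scikit-learn sketch only illustrates the two-condition comparison described above, with synthetic stand-ins for the questionnaire features, the two NLP-extracted referral reasons, and the four treatment paths. Feature counts are assumptions for illustration.

```python
# Synthetic sketch of triage with and without referral letter features.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import f1_score, confusion_matrix

rng = np.random.default_rng(0)
n = 1608                               # cohort size reported in the study
X_quant = rng.normal(size=(n, 20))     # questionnaire-derived features (assumed count)
X_referral = rng.normal(size=(n, 2))   # 2 NLP-extracted referral reasons
y = rng.integers(0, 4, size=n)         # 4 treatment paths

conditions = {
    "without referral letters": X_quant,
    "with referral letters": np.hstack([X_quant, X_referral]),
}
for label, X in conditions.items():
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
    for model in (SVC(), KNeighborsClassifier(), MLPClassifier(max_iter=500)):
        pred = model.fit(X_tr, y_tr).predict(X_te)
        f1 = f1_score(y_te, pred, average="weighted")
        print(f"{label}, {type(model).__name__}: F1 = {f1:.3f}")
        # confusion_matrix(y_te, pred) gives the 4x4 triage matrix
```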
Data from 1608 patients were evaluated. The evaluation indicated that 2 referral reasons from the referral letters (for anesthesiology and for rehabilitation intervention) increased the triage F1-score by up to 19.5%. The confusion matrices confirmed these results.
This study indicates that enriching the data with NLP-extracted referral letter content increases the accuracy of DSSs in suggesting optimal treatments for individual patients with LBP. Overall model accuracies were nevertheless considered low and insufficient for clinical application 1).
There is an increasing interest in developing artificial intelligence (AI) systems to process and interpret electronic health records (EHRs). Natural language processing (NLP) powered by pre-trained language models is the key technology for medical AI systems utilizing clinical narratives. However, few clinical language models exist, and the largest model trained in the clinical domain is comparatively small at 110 million parameters (compared with billions of parameters in the general domain). It is not clear how large clinical language models with billions of parameters can help medical AI systems utilize unstructured EHRs 2).
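As a hedged illustration of how such pre-trained clinical language models are applied, the sketch below encodes a clinical note with Hugging Face transformers. Bio_ClinicalBERT is used only as an example of a publicly available clinical model of roughly this (110 million parameter) scale, not as the model discussed in the citation; the note is invented.

```python
# Encoding a clinical note with a pre-trained clinical language model
# (pip install transformers torch).
from transformers import AutoModel, AutoTokenizer

name = "emilyalsentzer/Bio_ClinicalBERT"
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModel.from_pretrained(name)

note = "Patient admitted with low back pain radiating to the left leg."
inputs = tokenizer(note, return_tensors="pt")
outputs = model(**inputs)

# Contextual token embeddings, usable for downstream clinical NLP tasks.
print(outputs.last_hidden_state.shape)  # (1, sequence_length, 768)
```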
A significant portion of data in Electronic Health Records is only available as unstructured text, such as surgical or finding reports, clinical notes, and discharge summaries. To use these data for secondary purposes, natural language processing (NLP) tools are required to extract structured information. Furthermore, interoperable use requires data harmonization. HL7 Fast Healthcare Interoperability Resources (FHIR), an emerging standard for exchanging healthcare data, defines such a structured format. For German-language medical NLP, the tool Averbis Health Discovery (AHD) represents a comprehensive solution. AHD offers a proprietary REST interface for text analysis pipelines. To build a bridge between FHIR and this interface, we created a service that translates the communication around AHD from and to FHIR. The application is available under an open-source license 3).
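AHD's REST interface is proprietary and is not reproduced here; the sketch below only illustrates the general bridging pattern, with hypothetical endpoint URLs and an assumed response shape: free text goes to an NLP service, and each extracted annotation is wrapped as a FHIR Observation and posted to a FHIR server.

```python
# Hypothetical NLP-to-FHIR bridge; URLs and response shape are assumptions.
import requests

NLP_URL = "http://localhost:8080/analyse"   # hypothetical NLP endpoint
FHIR_URL = "http://localhost:8081/fhir"     # hypothetical FHIR server

note = "Status post lumbar discectomy; wound infection ruled out."

# 1. Send the free text to the NLP service (request/response shape assumed).
annotations = requests.post(NLP_URL, json={"text": note}).json()

# 2. Translate each annotation into a FHIR Observation resource.
for ann in annotations.get("entities", []):
    observation = {
        "resourceType": "Observation",
        "status": "final",
        "code": {"coding": [{
            "system": ann.get("system", ""),
            "code": ann.get("code", ""),
            "display": ann.get("text", ""),
        }]},
    }
    # 3. Store the structured result on the FHIR server.
    requests.post(f"{FHIR_URL}/Observation", json=observation)
```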
Noncommercial tools for natural language processing are currently provided by a number of platforms.
The TextBlob module for sentiment analysis is based on the Natural Language Toolkit (NLTK) 4).
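A minimal TextBlob sentiment example (pip install textblob); the sample sentences are invented.

```python
# Polarity and subjectivity scoring with TextBlob.
from textblob import TextBlob

for text in ("The staff were friendly and the care was excellent.",
             "The waiting time was terrible."):
    sentiment = TextBlob(text).sentiment
    # polarity is in [-1, 1] (negative to positive);
    # subjectivity is in [0, 1] (objective to subjective)
    print(f"{text} -> polarity={sentiment.polarity:.2f}, "
          f"subjectivity={sentiment.subjectivity:.2f}")
```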
Surgical site infections are a major driver of morbidity and increased costs in the postoperative period after spine surgery. Current tools for surveillance of these adverse events rely on prospective clinical tracking, manual retrospective chart review, or administrative procedural and diagnosis codes.
The purpose of a study was to develop natural language processing (NLP) algorithms for automated reporting of postoperative wound infection requiring reoperation after lumbar discectomy.
The study included adult patients undergoing discectomy for lumbar disc herniation at two academic and three community medical centers between January 1st, 2000 and July 31st, 2019.
The outcome measure was reoperation for wound infection within 90 days after surgery. Free-text notes of patients who underwent surgery from January 1st, 2000 to December 31st, 2015 were used for algorithm training. Free-text notes of patients who underwent surgery after January 1st, 2016 were used for algorithm testing. Manual chart review was used to label which patients had a reoperation for wound infection. An extreme gradient-boosting NLP algorithm was developed to detect reoperation for postoperative wound infection.
Overall, 5860 patients were included in this study, and 62 (1.1%) had a reoperation for wound infection. In patients who underwent surgery after January 1st, 2016 (n = 1377), the NLP algorithm detected 15 of the 16 patients (sensitivity = 0.94) who had a reoperation for infection. In comparison, Current Procedural Terminology (CPT) and International Classification of Diseases (ICD) codes detected 12 of these 16 patients (sensitivity = 0.75). At a threshold of 0.05, the NLP algorithm had a positive predictive value of 0.83 and an F1-score of 0.88.
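Neither the notes nor the trained model are public; the sketch below only illustrates the general pattern named in the study, TF-IDF features over free-text notes feeding an extreme gradient-boosting classifier, including the low 0.05 decision threshold that favors sensitivity for a rare (~1%) outcome. The notes and labels are invented placeholders.

```python
# Synthetic sketch: TF-IDF text features + XGBoost for infection detection
# (pip install scikit-learn xgboost).
from sklearn.feature_extraction.text import TfidfVectorizer
from xgboost import XGBClassifier

train_notes = [
    "wound erythema and purulent drainage, returned to OR for washout",
    "routine follow-up, incision well healed, no drainage",
]
train_labels = [1, 0]   # 1 = reoperation for wound infection
test_notes = ["irrigation and debridement of surgical site infection"]

vectorizer = TfidfVectorizer()
X_train = vectorizer.fit_transform(train_notes)
X_test = vectorizer.transform(test_notes)

model = XGBClassifier(n_estimators=200, eval_metric="logloss")
model.fit(X_train, train_labels)

# Classify at the 0.05 probability threshold reported in the study.
probs = model.predict_proba(X_test)[:, 1]
flagged = (probs >= 0.05).astype(int)
print(probs, flagged)
```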
Temporal validation of the algorithm developed in this study demonstrates a proof-of-concept application of NLP for automated reporting of adverse events after spine surgery. Adapting this methodology to other procedures and outcomes in spine surgery and orthopaedics has the potential to dramatically improve and automate quality and safety reporting 5).
Accurate prediction of outcomes among patients in intensive care units (ICUs) is important for clinical research and monitoring care quality. Most existing prediction models do not take full advantage of the electronic health record, using only the single worst value of laboratory tests and vital signs and largely ignoring information present in free-text notes. Whether capturing more of the available data and applying machine learning and natural language processing (NLP) can improve and automate the prediction of outcomes among patients in the ICU remains unknown.
The objective was to evaluate the gain in predictive power for an ICU mortality prediction model achieved by incorporating measures of clinical trajectory together with NLP of clinical text, and to assess the generalizability of this approach.
This retrospective cohort study included 101 196 patients with a first-time admission to the ICU and a length of stay of at least 4 hours. Twenty ICUs at 2 academic medical centers (University of California, San Francisco [UCSF], and Beth Israel Deaconess Medical Center [BIDMC], Boston, Massachusetts) and 1 community hospital (Mills-Peninsula Medical Center [MPMC], Burlingame, California) contributed data from January 1, 2001, through June 1, 2017. Data were analyzed from July 1, 2017, through August 1, 2018. The main outcomes were in-hospital mortality, model discrimination as assessed by the area under the receiver operating characteristic curve (AUC), and model calibration as assessed by the modified Hosmer-Lemeshow statistic.
Among 101 196 patients included in the analysis, 51.3% (n = 51 899) were male, with a mean (SD) age of 61.3 (17.1) years; their in-hospital mortality rate was 10.4% (n = 10 505). A baseline model using only the highest and lowest observed values for each laboratory test result or vital sign achieved a cross-validated AUC of 0.831 (95% CI, 0.830-0.832). In contrast, that model augmented with measures of clinical trajectory achieved an AUC of 0.899 (95% CI, 0.896-0.902; P < .001 for AUC difference). Further augmenting this model with NLP-derived terms associated with mortality increased the AUC to 0.922 (95% CI, 0.916-0.924; P < .001). These NLP-derived terms were associated with improved model performance even when applied across sites (AUC difference when augmenting with NLP at each site: UCSF, 0.077 to 0.021; MPMC, 0.071 to 0.051; BIDMC, 0.035 to 0.043; P < .001).
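The study's models and data are not public; the sketch below only mirrors the comparison pattern (baseline min/max features, then trajectory measures, then NLP-derived terms, compared by AUC) on synthetic data, with logistic regression standing in for the study's actual models. Random synthetic features will of course give AUCs near 0.5; the point is the feature-augmentation workflow.

```python
# Synthetic sketch of incremental feature augmentation compared by AUC.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 5000
X_minmax = rng.normal(size=(n, 10))  # highest/lowest observed values
X_traj = rng.normal(size=(n, 10))    # clinical-trajectory measures
X_nlp = rng.normal(size=(n, 10))     # NLP-derived term indicators
y = rng.integers(0, 2, size=n)       # in-hospital mortality

feature_sets = {
    "baseline": X_minmax,
    "+ trajectory": np.hstack([X_minmax, X_traj]),
    "+ trajectory + NLP": np.hstack([X_minmax, X_traj, X_nlp]),
}
for label, X in feature_sets.items():
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
    clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
    auc = roc_auc_score(y_te, clf.predict_proba(X_te)[:, 1])
    print(f"{label}: AUC = {auc:.3f}")
```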
Intensive care unit mortality prediction models incorporating measures of clinical trajectory and NLP-derived terms yielded excellent predictive performance and generalized well in this sample of hospitals. The role of these automated algorithms, particularly those using unstructured data from notes and other sources, in clinical research and quality improvement seems to merit additional investigation 6).