====== Sentiment analysis ======

Sentiment [[analysis]] (also known as opinion mining or emotion AI) refers to the use of [[natural language processing]], text analysis, computational linguistics, and biometrics to systematically identify, extract, quantify, and study affective states and subjective information.

----

Sentiment analysis is the principal tool for classifying [[customer feedback]] as positive or negative. Generic algorithms are available for sentiment analysis that are independent of the type of text. Sentiment analysis tries to extract from the text how people feel about products and services ((Hu M, Liu B. Mining opinion features in customer reviews. In: Proceedings of the national conference on artificial intelligence; 2004. p. 755-760.)) ((Grinvald B. What Is Sentiment Analysis and how to perform one well. https://www.revuze.it/blog/sentimentanalysis/. Accessed 16 May 2020.)).

----

This [[feedback]] is usually gathered through the [[internet]], whether from a specific [[blog]], a [[Twitter]] thread, or a [[Facebook]] post. Companies have a strong interest in intercepting these online “conversations,” so that they can learn more about their customers and users, as well as the customers of their competitors in the market ((TextBlob, Simplified Text Processing. https://textblob.readthedocs.io/en/dev/. Accessed 4 April 2020.)).

----

Automated sentiment analysis is helpful for analyzing large amounts of [[feedback]] that can no longer be managed manually. Most often, sentiment analysis works by identifying certain keywords in the text that allow the statement to be classified as “positive,” “negative,” or “neutral.” For example, in a restaurant review the phrase “The food was delicious!” can easily be classified as strongly positive, while “The service sucked” will be identified as a strongly negative comment.
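The keyword-based classification described above can be sketched in a few lines of Python. This is only a minimal illustration, not the method of any particular library: the two word lists and the function names ''keyword_polarity'' and ''classify'' are hypothetical, and real tools rely on far larger lexicons and additional rules.

```python
import re

# Hypothetical mini-lexicon for illustration; real tools ship much larger word lists.
POSITIVE_WORDS = {"delicious", "great", "excellent", "friendly"}
NEGATIVE_WORDS = {"sucked", "terrible", "disturbing", "rude"}


def keyword_polarity(text):
    """Return a score in [-1, 1]: (positive hits - negative hits) / all hits."""
    words = re.findall(r"[a-z']+", text.lower())
    positive = sum(word in POSITIVE_WORDS for word in words)
    negative = sum(word in NEGATIVE_WORDS for word in words)
    hits = positive + negative
    if hits == 0:
        return 0.0  # no lexicon word found, treat the text as neutral
    return (positive - negative) / hits


def classify(text):
    """Map the polarity score to one of three labels."""
    score = keyword_polarity(text)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


for review in ["The food was delicious!", "The service sucked"]:
    print(review, "->", classify(review))
```

A statement that mixes one positive and one negative keyword receives a score of 0 in this sketch, which shows why simple keyword counting struggles with less straightforward texts.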
Thanks to a library of positive and negative words and expressions, an [[algorithm]] can identify nouns, verbs, adjectives, and adverbs in these texts and recognize that, for example, “delicious” is an indicator of a positive reaction, while “disturbing” is an indicator of a negative perception. The problem is that not all statements are that straightforward: sometimes positive and negative aspects are intermingled, and sometimes negative aspects are worded in a cryptic or sarcastic way.

Sentiment analysis assigns a “score” to the entire text, the so-called polarity, placing it on a spectrum of attitudes that ranges from -1 (totally negative) to +1 (totally positive). Similarly to the overall assessment, sentiment analysis can extract information on the subjectivity of a statement. For example, “I hated it” would be classified as a highly subjective statement, while “Temperature was 3 °C” would be interpreted as barely subjective. Subjectivity is quantified with a score ranging from 0 to 1.

Machine learning in general allows data to be classified into predefined classes based on manually annotated training data. The algorithm can then identify by itself the key features specific to the defined classes. For text analysis, the words used and their frequencies commonly serve as the input data. This method is called the “bag of words” method ((Zhou V. A simple explanation of the Bag-of-Words model. https://towardsdatascience.com/a-simple-explanation-of-the-bag-of-words-model-b88fc4f4971. Accessed 16 May 2020.)).

----

Noncommercial tools for [[natural language processing]] are currently provided by a number of [[platform]]s. One example is the TextBlob module for sentiment analysis, which is based on the Natural Language Toolkit (NLTK) ((Natural Language Toolkit, NLTK 3.5 documentation. https://www.nltk.org/.
Accessed 16 May 2020.)).

----

[[Text mining]] with automatic extraction of key features is gaining importance in science, and particularly in medicine, due to the rapidly increasing number of [[publication]]s.

Fischer et al. evaluated the current potential of [[sentiment analysis]] and [[machine learning]] to extract the importance of the reported results and [[conclusion]]s of [[randomized trial]]s on stroke. PubMed abstracts of 200 recent reports of randomized trials were reviewed and manually classified according to the estimated importance of the studies. The importance of each paper was classified as "game changer", "suggestive", "maybe", or "negative result". Algorithmic sentiment analysis was subsequently applied to both the "Results" and the "Conclusions" paragraphs, yielding a numerical output for polarity and subjectivity. The result of the human assessment was then compared to polarity and subjectivity. In addition, a neural network using the Keras platform built on TensorFlow and Python was trained to map the "Results" and "Conclusions" to the dichotomized human assessment (1: "game changer" or "suggestive"; 0: "maybe" or "negative", or no results reported). 120 abstracts were used as the training set and 80 as the test set.

Results: 9 of the 200 reports were classified manually as "game changer", 40 as "suggestive", 73 as "maybe", and 32 as "negative"; 46 abstracts did not contain any results. Polarity was generally higher for the "Conclusions" than for the "Results". Polarity was highest for the "Conclusions" classified as "suggestive". Subjectivity was also higher in the classes "suggestive" and "maybe" than in the classes "game changer" and "negative". The trained neural network provided a correct dichotomized output with an accuracy of 71% based on the "Results" and 73% based on the "Conclusions".

Conclusions: Current statistical approaches to text analysis can grasp the impact of scientific medical abstracts to a certain degree.
Sentiment analysis showed that mediocre results are apparently written in more enthusiastic words than clearly positive or negative results ((Fischer I, Steiger HJ. Toward automatic evaluation of medical abstracts: The current value of sentiment analysis and machine learning for classification of the importance of PubMed abstracts of randomized trials for stroke. J Stroke Cerebrovasc Dis. 2020 Sep;29(9):105042. doi: 10.1016/j.jstrokecerebrovasdis.2020.105042. Epub 2020 Jun 23. PMID: 32807454.)).
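
The “bag of words” representation mentioned earlier, in which a text is reduced to its words and their frequencies before being passed to a classifier, can be illustrated with a short Python sketch. The helper names ''tokenize'' and ''bag_of_words'' and the two example sentences are assumptions made for illustration; the code is not taken from the cited study and only shows the general idea.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def bag_of_words(texts):
    """Return a sorted vocabulary and one word-count vector per text."""
    counts = [Counter(tokenize(text)) for text in texts]
    # The vocabulary is the set of all words seen in any text, in sorted order.
    vocabulary = sorted(set().union(*counts))
    # Each vector holds the frequency of every vocabulary word in one text.
    vectors = [[count[word] for word in vocabulary] for count in counts]
    return vocabulary, vectors


texts = [
    "The food was delicious",
    "The service was slow, the food was cold",
]
vocabulary, vectors = bag_of_words(texts)
print(vocabulary)
for vector in vectors:
    print(vector)
```

Word order is discarded in this representation; only the counts remain, and these count vectors are what a classifier would receive as input features.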