… Sentiment Analysis of Public Tweets Towards the Emergence of SARS-CoV-2 Omicron Variant: A Social Media Analytics Framework

While different variants of COVID-19 dramatically affected the lives of millions of people across the globe, a new variant, SARS-CoV-2 Omicron, emerged. This paper analyzes public attitudes and sentiment on Twitter towards the emergence of the SARS-CoV-2 Omicron variant. The proposed approach relies on text analytics of Twitter data, considering tweets, retweets, and the main themes of hashtags: pandemic restrictions, the efficacy of COVID-19 vaccines, transmissible variants, and the surge of infection. A total of 18,737 tweets were pulled via the Twitter Application Programming Interface (API) from December 3, 2021, to December 26, 2021, and analyzed with the SentiStrength software, which employs a lexicon of sentiment terms and a set of linguistic rules. The analysis was conducted to distinguish and codify subjective content and to estimate the strength of positive and negative sentiment, with averages reported alongside 95% confidence intervals, based on emotion strength scales of 1-5. Negativity dominated after the outbreak of Omicron, scoring 31.01% for weak, 16.32% for moderate, 5.36% for strong, and 0.35% for very strong sentiment strength. In contrast, positivity decreased gradually, scoring 16.48% for weak, 11.19% for moderate, 0.80% for strong, and 0.04% for very strong sentiment strength. Identifying the public's emotional status would help the concerned authorities provide appropriate strategies and communications to relieve public worries about pandemics.


I. INTRODUCTION
With the emergence of social media, the use of machine learning and deep learning for sentiment analysis has attracted many researchers because it scales to processing and analyzing big data. Machine Learning (ML) algorithms learn the hidden patterns of the data and can predict the class labels of unknown samples. Thus, ML is widely applied in sentiment analysis to predict public sentiments [1,2]. COVID-19 emerged as an infectious disease that created a global crisis and dramatically affected sectors such as education [3], health, and the economy. Amidst the coronavirus crisis, new variants of COVID-19 emerged, such as Beta, Delta, and Omicron [4], spreading panic and fear among people. The SARS-CoV-2 Omicron variant was first detected in South Africa on November 24, 2021, and has spread to more than 57 countries [5].
The impact of the SARS-CoV-2 virus is still a source of concern globally. Many countries announced an acceleration of booster jab rollouts, and people fear that the variant may undermine the efficacy of COVID-19 vaccines. Many countries sealed their borders to foreign visitors. Measuring the public's sentiments and opinions towards the Omicron variant is important, not only to give a clear picture of those sentiments, but also to explain whether public attitudes and awareness towards this variant are affected by earlier experiences with COVID-19. In addition, this evaluation would provide policymakers with the actual public sentiment and enable them to evaluate the effectiveness of their earlier strategies, recommendations, and messages and to modify them according to current conditions. The information disseminated about the COVID-19 pandemic gave rise to various mental and psychological concerns among social media users [6].
The Omicron outbreak may affect the public in the same way and bring back the earlier scenario of the COVID-19 pandemic. At the beginning, people in different countries were more worried than before, as early indications were that Omicron is more transmissible than other COVID-19 variants, with devastating consequences for the infection rate [7]. In addition, it is more capable than Delta of evading the immune defenses of both vaccinated and previously infected people. Public concern rose sharply after the World Health Organization (WHO) reported on the rapid Omicron outbreak. People and health officials started sharing and expressing their opinions, emotions, clarifications, and recommendations on social media and social networking sites. The main concern of social media analytics is to collect data using various methods, process and understand it, and summarize and visualize the output [8].
Social media surveillance can systematically monitor public emotions and reactions to epidemic events in real time [9]. These expressions comprise rich and open data for researchers, especially for survey and classification studies such as sentiment analysis or opinion mining. Sentiment analysis processes and identifies the data of certain domains through natural language processing. Sentiment analysis using social media plays a crucial role in various fields such as social development, people's awareness, and economic development [10][11][12]. Indeed, many studies have applied ML and Deep Learning (DL) methods and algorithms to explore and investigate the public's sentiments on social media platforms towards the outbreak of COVID-19 and its emerging variants [13][14][15]. However, no recent studies have detected public sentiments towards the emergence of the Omicron variant. Only a few articles and reports, such as [16,17], investigate Omicron's situation, as it emerged recently. To the best of our knowledge, this study is the first to reveal and explore public feelings and views towards the SARS-CoV-2 Omicron variant worldwide through the sentiment analysis of social media, mainly Twitter. The social media analytics software SentiStrength and Voyant-tools were used to analyze the data. This study gives a clear picture of the public's sentiment towards this pandemic, which would help the authorities provide appropriate information to ease the public's panic. Moreover, it helps governments address future health emergencies, including transmissible diseases, and provide better healthcare.

II. RELATED WORK
This study aims to explore and mine public opinion and to determine the percentage of positivity and negativity towards the new COVID-19 Omicron variant. As the new variant emerged on November 24, 2021, only a few articles and reports have investigated Omicron's situation. The authors in [18] commented on the Omicron variant's implications for transmission, cure, and diagnosis. They recommend continuing precautions such as wearing masks, social distancing, and vaccination. The authors in [19] studied the potential prediction of the Omicron variant and provided recommendations that can be used to protect people from it. The authors in [20] studied the detection of public opinion about the effectiveness of vaccination during the Omicron variant outbreak. The data were collected from YouTube comments on English news channels. They classified the comments into positive, negative, and neutral using the VADER and TextBlob tools and applied the SVM algorithm to analyze the data. The results scored 63% accuracy with TextBlob and 70% with VADER. The authors in [16] studied the efficacy of earlier COVID-19 vaccines against the SARS-CoV-2 Omicron variant. The study concluded that some cures used for the other COVID-19 variants offer no or limited protection against the symptoms of the Omicron variant [17]. Many studies have investigated public sentiment since the emergence of COVID-19. The authors in [21] conducted a study of the COVID-19 outbreak in Italy. The aim of the research was to predict the evolution of the disease in the country. The authors collected their data from the official channels reporting the number of infected people in different Italian provinces. A model of the spatio-temporal distribution of COVID-19 was used: an endemic-epidemic multivariate time-series mixed-effects generalized model for count data was applied to understand the spatio-temporal diffusion of the disease. The study results were divided into three phases.
The first was related to the outbreak of COVID-19 over time. The second was devoted to the transmittance of the disease among the people of the same district. And the third was concerned with the spatial neighborhood and the main reasons for the contagion effect. They also found that strict control measures in some districts effectively limit contagion and disease outbreaks. A considerable amount of the literature has been published on the public's attitudes towards the COVID-19 pandemic. To the best of our knowledge, no previous study has investigated the reaction of the people to the new variant of the pandemic in social media, especially Twitter, and this is the gap this study aims to fill.

III. METHOD AND DATA ACQUISITION
This section presents the construction of the harvested data and the proposed method. The tweets were collected by submitting seed keywords about the COVID-19 Omicron variant to Twitter via its API from December 4, 2021, to December 26, 2021, using the social media analytics SentiStrength software [22]. The queries were "Omicron", "COVID-19 variant Omicron", "sars-cov-2", and "COVID-19". The data were cleaned and duplicates were removed, leaving 18,737 posts generated by 15,388 post authors (2400 female, 3759 male, 12578 unknown) distributed across countries (56.9% None, 23.1% USA, 14.7% UK, 9.0% Australia, 7.1% Canada). The 'tweets' data table comprises 'id', 'date', 'tweet', 'URL', 'username', 'outlinks', 'likeCount', 'retweetCount', 'replyCount', and 'quoteCount'. After collecting the data, we extracted, analyzed, and visualized them using a combination of tools: SentiStrength, Mozdeh, and Voyant-tools (free, web-based text analytics tools). The analysis distinguishes and codifies subjective content and estimates the strength of positive and negative sentiment in the data using a lexicon of sentiment terms and a set of linguistic rules (e.g. for idioms, negation, and booster words). The SentiStrength library was used to compute the polarity of the words in the harvested data. Positive sentiment strength ranges from 1 to 5, where 1 means not positive and 5 means extremely positive. Likewise, negative sentiment strength ranges from -1 to -5, where -1 means not negative and -5 means extremely negative. Any tweet scoring 1 on the positive scale and -1 on the negative scale was considered to show no sentiment and was categorized as neutral. SentiStrength was chosen for its accuracy, which approaches human level, and because its dual scale lets negative sentiment be investigated independently from positive sentiment, which is essential for the research goal [22,24].
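The dual-scale scoring described above can be illustrated with a minimal Python sketch. It is not the SentiStrength implementation and does not call the SentiStrength software; it assumes that each tweet has already been assigned a (positive, negative) score pair, and the names `STRENGTH_LABELS`, `is_neutral`, and `strength_shares`, as well as the sample scores, are hypothetical. The mapping of levels 2 to 5 to "weak", "moderate", "strong", and "very strong" follows the labels used in this paper.

```python
from collections import Counter

# Labels for the absolute value of a score on one 1-5 strength scale.
STRENGTH_LABELS = {1: "none", 2: "weak", 3: "moderate", 4: "strong", 5: "very strong"}


def is_neutral(pos: int, neg: int) -> bool:
    """A tweet is neutral when it scores 1 (not positive) and -1 (not negative)."""
    return pos == 1 and neg == -1


def strength_shares(scores):
    """Return the percentage of tweets at each strength level of one scale.

    `scores` holds either all positive scores (1..5) or all negative
    scores (-1..-5); the sign is dropped before counting.
    """
    if not scores:
        raise ValueError("no scores given")
    counts = Counter(abs(s) for s in scores)
    return {
        label: 100.0 * counts.get(level, 0) / len(scores)
        for level, label in STRENGTH_LABELS.items()
    }


# Hypothetical (positive, negative) score pairs for four tweets.
scored = [(1, -1), (2, -1), (1, -3), (1, -2)]
negative_shares = strength_shares([neg for _, neg in scored])
neutral_count = sum(is_neutral(p, n) for p, n in scored)
print(negative_shares, neutral_count)
```

Because the two scales are tallied separately, a single tweet can contribute to both the positive and the negative distribution, which is the property the dual system is chosen for.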
The Voyant-tools were employed to visualize some of the data. Figure 2 illustrates the time-series graph of the collected tweets. Figure 2(a) displays the overall trends and spikes of the tweets about the Omicron variant. Intuitively, the general direction gives helpful background information about interest in the topic and the period when Twitter users discussed it. It shows a gradual increase in the overall trend. The highest number of tweets was posted at the beginning of the second week of December 2021, mainly on the 8th, with 6087 posted tweets. This spike is followed by a steady lower level of activity and an increase in the trend in the fourth week of December. Figure 2(b) represents the average post sentiment from 1 to 5, and the lowest green line represents the proportion of subjective texts. The thick black line is the average negative sentiment strength, and the thick red line is the average positive sentiment strength. The thinner grey and pink lines show the same averages for the subjective texts only (i.e. positive or negative sentiment > 1). The sentiment data is bucketed into a minimum of 20 data points for smoothing. The two sentiment polarities are close, with an apparent increase in negative sentiment during the peak. [Figure 2: Time series graph of (a) tweet volume and (b) sentiment for tweets containing the word Omicron.] Figure 3 shows the sparkline graph, representing the distribution of the mentioned terms with linear data segments. It indicates the sudden increase in the volume of the tweets. The Z-score is a normalized value for a term's raw frequency compared to the other term frequencies in the same document. The sentiment strength scores of the tweets posted during the Omicron spread are illustrated in Figure 4. Most public sentiment was neutral: most of the posted tweets scored 1, which, because the scales do not use zero, means that no positive or negative sentiment was expressed.
Positivity decreased for the weak, moderate, strong, and very strong sentiment strength scores. At the same time, negativity increased gradually and became dominant, which means people started to show significant concerns about the Omicron variant. Figure 5 shows that, a few days after the outbreak, the share of very positive sentiment fell significantly to 0.04%. In contrast, negative sentiment dominated, with 0.35% scored as very negative. Figure 6 illustrates the trends and spikes of Omicron-related tweets in countries like the UK, USA, KSA, India, Canada, Australia, Germany, and China. The curves show an increase in the number of tweets during December 2021, roughly from the 4th to the 25th, except in KSA (Kingdom of Saudi Arabia). This reflects people's high interest in Omicron due to their previous experience with COVID-19. Millions of expatriates live in Saudi Arabia. Their native language isn't Arabic, and they use English on social media platforms. English is the first or second language in all the other selected countries. Interestingly, in China, the posts on Omicron increased and covered a wider range of time during the study period. This reflects the high Chinese interest in Omicron due to their experience with COVID-19. Millions in China use Twitter, and they tweet and retweet in English. In contrast, KSA shows less interest in the Omicron variant. The bubble chart shows the classification of the texts for positive and negative sentiment across different countries and how positivity and negativity are associated with each other. The positive and negative sentiment scores are presented in the bubbles. As shown in Figure 7, negative sentiment reaches the very negative score of 5 in the listed countries, while very positive sentiment with a score of 5 appears only in the USA.
In KSA, public sentiment was almost neutral, and no significant concerns were observed. This may indicate the positive effect of vaccination on physical and mental health [25]. The positive and neutral sentiment is highly noticeable, which may be due to the considerable role of the Saudi government in confronting the COVID-19 outbreak. [Figure caption: Distribution of Omicron-related tweets across countries.] Table I presents the most frequent terms, including the hashtags associated with the specified sentiment range. It shows that the word "Omicron" and all other related words occur more often in tweets containing Omicron. The pMatch rate displays the proportion of Omicron tweets, including the term Omicron 336.4%, variant 142.9%, vaccine 75%, etc. The list of the most frequent words related to the SARS-CoV-2 Omicron variant is ranked according to the match and order of importance. Words that occur more often in the texts matching the search and filters than in the remaining texts are listed according to their statistical significance. The chi-square value represents the association between the listed term and the search with filters. The table also gives the percentage of texts that contain the word but don't match the search, the most frequent words, and their statistical significance. Table II shows the overall sentiment average alongside 95% confidence intervals in the first phase, from the 3rd to the 10th of December 2021, and the second phase, from the 11th to the 25th of December 2021. For positive sentiment, the average of the first set is higher, and the difference is statistically significant since the confidence intervals are (1.3963, 1.4255). The second set score is (1.3527, 1.4193), there is no overlapping between the two data sets. People tended to tweet more positively at the beginning of the Omicron variant emergence.
For negative sentiment, the second phase is higher, and the difference between the two groups is statistically significant: the confidence interval is (1.8712, 1.9541) for the second phase and (1.8139, 1.8517) for the first phase. It is statistically evident that people tended to tweet more negatively after the outbreak of the Omicron variant reached at least 57 countries.
… in Omicron variants worldwide was detected at the beginning of the second week of December 2021, as many cases were detected in this period. Many countries set restrictive measures to control the outbreak of Omicron and minimize its subsequent effects, as this variant is highly transmissible even among fully vaccinated people [26]. The results indicate that polarized tweets concerning the new variant are inevitable: they started as negative and became very negative 10 days after the Omicron variant outbreak.
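As an illustration of the interval comparison for negative sentiment, the following Python sketch computes a 95% confidence interval for a mean and checks whether two intervals overlap. The normal approximation (mean ± 1.96 standard errors) is an assumption made here for illustration; the source does not state how the reported intervals were computed. The function names are hypothetical, and only the two negative-sentiment intervals are taken from the text.

```python
import math
from statistics import mean, stdev


def mean_ci95(values):
    """95% confidence interval for the mean, using a normal approximation.

    Assumption: interval = mean +/- 1.96 * sample_stdev / sqrt(n).
    """
    n = len(values)
    if n < 2:
        raise ValueError("need at least two values")
    centre = mean(values)
    half_width = 1.96 * stdev(values) / math.sqrt(n)
    return (centre - half_width, centre + half_width)


def intervals_overlap(a, b):
    """True when the closed intervals a and b share at least one point."""
    return a[0] <= b[1] and b[0] <= a[1]


# Negative-sentiment intervals as reported in the text.
first_phase_negative = (1.8139, 1.8517)
second_phase_negative = (1.8712, 1.9541)

print(intervals_overlap(first_phase_negative, second_phase_negative))  # False
```

Non-overlap of two 95% intervals is a conservative indication of a difference between the means rather than a formal hypothesis test, so it should be read as a quick check only.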

C. Top Frequent Terms and Hashtags
The results also show a significantly higher negative share of tweets at the end of the month compared to the beginning. This is due to people's potential worry about the surge of infection, the transmissible variant, the efficacy of COVID-19 vaccines, etc., which corroborates the findings of [27]. The z-score of the term "Omicron" is 3,602.925, making it the most frequent term used by social media users in their tweets and comments, compared with other related words such as "vaccine," which scored 404.925. These scores indicate that social media users believe that the Omicron variant may put their lives and health at risk. The sentiment analysis conducted in this study clarifies the emotion of the lay audience and how the pandemic negatively influences their language and thoughts towards the new variant of COVID-19. This conclusion is consistent with [28,29]. Indeed, the analysis shows that understanding the public's emotions can help the authorities ease public worries and concerns, and that these emotions can contribute to an intense polarization of health care on social media. [Caption: The concordance entries of the data.]
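The z-score of a term's frequency, as defined earlier (a term's raw frequency normalized against the other term frequencies in the same document), can be sketched in Python as follows. This is an illustrative sketch and not the Voyant-tools implementation: the tokenizer, the use of the population standard deviation, and the function name `term_z_scores` are assumptions, and the sample text is made up.

```python
import re
from collections import Counter
from statistics import mean, pstdev


def term_z_scores(text):
    """Return {term: z-score} for the raw term frequencies in one document.

    z = (frequency - mean frequency of all terms) / standard deviation.
    Assumption: the population standard deviation is used.
    """
    tokens = re.findall(r"[a-z0-9'-]+", text.lower())
    counts = Counter(tokens)
    frequencies = list(counts.values())
    mu = mean(frequencies)
    sigma = pstdev(frequencies)
    if sigma == 0:
        # All terms are equally frequent, so no term stands out.
        return {term: 0.0 for term in counts}
    return {term: (count - mu) / sigma for term, count in counts.items()}


# Made-up document: "omicron" occurs three times, the other terms once.
z = term_z_scores("omicron omicron omicron vaccine variant")
print(sorted(z.items(), key=lambda item: item[1], reverse=True))
```

A term that is far more frequent than the typical term in the corpus receives a large positive z-score, which is why "Omicron" ranks far above "vaccine" in the reported scores.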

VI. CONCLUSION AND FUTURE WORK
The purpose of the current study was to understand Twitter users' views and sentiments towards the emergence of the SARS-CoV-2 Omicron variant. The analysis conducted in this study has shown the emotion of the lay audience and how the pandemic negatively influences their language and thoughts towards the new variant of COVID-19. The study reveals a worldwide increase in the overall trend and a high spike in public interest and publications about the omicron variant on Twitter in the second week of December 2021. This variant is highly transmissible even among fully vaccinated individuals.
Many cases were detected during this period, forcing many countries to set restrictive measures to control the outbreak and minimize its subsequent effects. Polarized tweets concerning the Omicron variant appear inevitable: they started as negative and became very negative after 10 days.
The results also show a significantly higher negative share of tweets at the end of the month compared to the beginning. This is due to people's potential worry about the surge of infection, the transmissible variant, the efficacy of vaccines, etc. Sentiment analysis can contribute to an intense polarization of health care on social media. This study gives a clear picture of the global public's sentiment towards this pandemic, which would help authorities provide timely information to ease the public's concerns. Moreover, it helps governments address future health crises involving infectious diseases earlier. This study has limitations. In particular, the data were collected from one social media platform (Twitter), considering only English tweets. As an extension of the work, it would be interesting to investigate and assess the effect of the new variant of COVID-19 on people's emotions and attitudes on other social network platforms and in different languages.