Arabic Hands-On Analysis, Clustering and Classification of Large Arabic Twitter Data Set on COVID19

Abdelrahman Hamdy (Arab Open University-Egypt, Egypt); Ayman Mahgoub (Electronics Research Institute, Egypt); Conor Ryan (University of Limerick, Ireland)

Coronavirus is a disease that has had a huge impact on the world, not only on the people infected with it but on the whole population, whose lives have changed in terms of work, income, the economy, and the spread of Coronavirus-related data on social media. In this work, we combined different machine learning models to separate tweets that discuss Coronavirus from tweets that discuss other topics, working with millions of tweets collected using keywords such as "coronavirus" and other words. Based on the different analyses and models, our results show that more than 50% of these tweets are about other topics rather than Coronavirus. These results give researchers insight into the large Arabic data set we are working with. We conclude that one should not rely on general words when searching for tweets related to a specific topic, because millions of users on Twitter and other social media also express their opinions using similar words that are related to the specific event.
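The idea of combining several models to separate on-topic from off-topic tweets can be illustrated with a small majority-vote sketch. The abstract does not name the models used, so everything below is an assumption made for illustration: the hand-written rule functions stand in for trained classifiers, the keyword lists are hypothetical, and the sample tweets are invented English strings rather than rows from the Arabic data set.

```python
from typing import Callable, List

# Hypothetical keyword lists, chosen only for this illustration.
BROAD_KEYWORDS = {"corona"}
CONTEXT_TERMS = {"virus", "covid19", "quarantine", "vaccine"}
PROMO_TERMS = {"sale", "discount", "offer"}

# A classifier maps a tweet to True (on-topic) or False (off-topic).
Classifier = Callable[[str], bool]


def tokens(text: str) -> List[str]:
    """Lowercase whitespace tokenization; a real pipeline would normalize Arabic text."""
    return text.lower().split()


def has_context_term(text: str) -> bool:
    """Stand-in model 1: votes on-topic if any health-context term appears."""
    return any(tok in CONTEXT_TERMS for tok in tokens(text))


def has_multiple_signals(text: str) -> bool:
    """Stand-in model 2: votes on-topic if at least two distinct topic terms appear."""
    topic_terms = BROAD_KEYWORDS | CONTEXT_TERMS
    return len(set(tokens(text)) & topic_terms) >= 2


def not_promotional(text: str) -> bool:
    """Stand-in model 3: votes on-topic unless the tweet contains promotional terms."""
    return not any(tok in PROMO_TERMS for tok in tokens(text))


def majority_vote(text: str, classifiers: List[Classifier]) -> bool:
    """Label a tweet on-topic when more than half of the classifiers vote on-topic."""
    votes = sum(1 for clf in classifiers if clf(text))
    return votes * 2 > len(classifiers)


def off_topic_share(tweets: List[str], classifiers: List[Classifier]) -> float:
    """Fraction of tweets that the ensemble labels off-topic."""
    if not tweets:
        return 0.0
    off_topic = sum(1 for t in tweets if not majority_vote(t, classifiers))
    return off_topic / len(tweets)


CLASSIFIERS: List[Classifier] = [has_context_term, has_multiple_signals, not_promotional]

# Invented sample: every tweet matches the broad keyword search in spirit,
# but only some of them are actually about the disease.
SAMPLE_TWEETS = [
    "corona virus cases rising quarantine extended",
    "corona beer discount this weekend",
    "new covid19 vaccine trial results",
    "corona sale offer today",
    "my cat is named corona",
]

if __name__ == "__main__":
    share = off_topic_share(SAMPLE_TWEETS, CLASSIFIERS)
    print(f"off-topic share: {share:.0%}")
```

In this toy sample the broad keyword alone would keep four of the five tweets, while the ensemble keeps only the two that carry supporting context. That mirrors, in miniature, the abstract's point about general search words; the numbers here say nothing about the paper's actual results.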

Journal: International Journal of Simulation: Systems, Science and Technology (IJSSST), Vol. 22

Published: Apr 1, 2021

DOI: 10.5013/IJSSST.a.22.01.06