Framework of Hate Speech Identification for Formal and Informal Text Using Lexical Approach
DOI: https://doi.org/10.53555/ks.v12i1.3249
Keywords: NLP, Hate Speech, Toxic Speech, Roman Urdu, Lexicon Based Approach
Abstract
Social media refers to digital platforms and online venues where individuals and organizations share dynamic content and broadcast information. Prominent social networking sites such as Facebook, Instagram, Twitter, and YouTube allow users to produce, share, and exchange multimedia material such as text, photographs, videos, and links. Sentiment analysis is a computational process for determining and categorizing the emotional tone of textual material on social networking sites, such as messages, comments, or tweets, and it is a significant problem in the field of Natural Language Processing (NLP). In this context, hate speech or toxic speech is language comprising hostile attitudes, insulting statements, and destructive intent directed against a person or a group of individuals. In this study, we used a lexicon-based approach at the sentence level to detect toxic speech in bilingual text published in English (formal) and Roman Urdu (informal). We concentrated on three domains in particular: race, religion, and nationality. We extracted our dataset from Twitter via the Twitter API; it comprises 3,030 tweets, 1,010 for each of the three domains. The proposed framework attained average accuracies of 92.52%, 93.03%, and 93.35% for the race, religion, and nationality domains, respectively.
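As a rough illustration of what sentence-level, lexicon-based detection over mixed English and Roman Urdu text can look like, the following minimal Python sketch scores a sentence by summing the weights of the lexicon terms it contains and flags it when the score reaches a cutoff. It is not the framework evaluated in the study: the lexicon entries, their weights, the threshold, and the function names (`tokenize`, `toxicity_score`, `is_toxic`, `accuracy`) are all hypothetical placeholders chosen for this example.

```python
import re

# Hypothetical toy lexicon: entries and weights are placeholders for
# illustration only, not the lexicon used in the study.
LEXICON = {
    "stupid": 1.0,    # English
    "idiot": 1.0,     # English
    "bewakoof": 1.0,  # Roman Urdu, roughly "fool"
    "ganda": 0.5,     # Roman Urdu, roughly "dirty"
}

# Assumed cutoff: a sentence scoring at or above this value is flagged.
THRESHOLD = 1.0


def tokenize(sentence):
    """Lowercase the sentence and split it into alphabetic tokens."""
    return re.findall(r"[a-z']+", sentence.lower())


def toxicity_score(sentence):
    """Sum the lexicon weights of all tokens found in the sentence."""
    return sum(LEXICON.get(token, 0.0) for token in tokenize(sentence))


def is_toxic(sentence):
    """Flag the sentence when its score reaches the assumed threshold."""
    return toxicity_score(sentence) >= THRESHOLD


def accuracy(sentences, labels):
    """Fraction of sentences whose predicted flag matches the gold label."""
    correct = sum(is_toxic(s) == y for s, y in zip(sentences, labels))
    return correct / len(labels)
```

Under these assumptions, `is_toxic` would be applied to each tweet within a domain and `accuracy` compared against hand-labelled tweets; a real lexicon would also need to handle the spelling variation typical of Roman Urdu, which this sketch ignores.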
License
Copyright (c) 2024 Husnain Saleem, Muhammad Javed, Syed Muhammad Ali Haider, Hamid Masood Khan, Muhammad Ahmad Jan, Asad Ullah

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.