Framework of Hate Speech Identification for Formal and Informal Text Using Lexical Approach

Authors

  • Husnain Saleem
  • Muhammad Javed
  • Syed Muhammad Ali Haider
  • Hamid Masood Khan
  • Muhammad Ahmad Jan
  • Asad Ullah

DOI:

https://doi.org/10.53555/ks.v12i1.3249

Keywords:

NLP, Hate Speech, Toxic Speech, Roman Urdu, Lexicon Based Approach

Abstract

Social media refers to digital platforms and online venues where individuals and organizations share content and broadcast information. Prominent social networking sites such as Facebook, Instagram, Twitter, and YouTube allow users to produce, share, and exchange many types of multimedia material, including text, photographs, videos, and links. Sentiment analysis is the computational process of identifying and categorizing the emotional tone of textual material on these sites, such as messages, comments, or tweets, and it is a significant problem in the field of Natural Language Processing (NLP). In this context, hate speech or toxic speech is defined as language containing hostile attitudes, insulting statements, and destructive intent directed against a person or a group of individuals. In this study, we used a sentence-level, lexicon-based approach to detect toxic speech in bilingual text written in English (formal) and Roman Urdu (informal). We concentrated on three domains in particular: race, religion, and nationality. Our dataset, collected from Twitter through the Twitter API, comprises 3,030 tweets, 1,010 for each of these domains. The proposed framework achieved average accuracies of 92.52%, 93.03%, and 93.35% for the race, religion, and nationality domains, respectively.
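The sentence-level, lexicon-based detection described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' framework: the lexicon entries, their scores, the zero threshold, and the helper names (`tokenize`, `sentence_score`, `classify`, `accuracy`) are all hypothetical. The sketch also ignores negation, spelling variation in Roman Urdu, and the domain-specific lexicons (race, religion, nationality) that a real system would likely need.

```python
import re

# Hypothetical toy lexicon mixing English (formal) and Roman Urdu (informal)
# entries. Negative scores mark hostile terms. This is NOT the authors' lexicon.
LEXICON: dict[str, int] = {
    "hate": -2,
    "disgusting": -2,
    "nafrat": -2,   # Roman Urdu: "hatred"
    "bura": -1,     # Roman Urdu: "bad"
    "love": 2,
    "respect": 2,
    "pyar": 2,      # Roman Urdu: "love"
    "acha": 1,      # Roman Urdu: "good"
}


def tokenize(sentence: str) -> list[str]:
    """Lowercase the sentence and keep runs of Latin letters and apostrophes."""
    return re.findall(r"[a-z']+", sentence.lower())


def sentence_score(sentence: str, lexicon: dict[str, int] = LEXICON) -> int:
    """Sum the scores of tokens found in the lexicon; unknown tokens score 0."""
    return sum(lexicon.get(tok, 0) for tok in tokenize(sentence))


def classify(sentence: str, threshold: int = 0) -> str:
    """Label a sentence 'toxic' when its score is strictly below the threshold."""
    return "toxic" if sentence_score(sentence) < threshold else "non-toxic"


def accuracy(predicted: list[str], gold: list[str]) -> float:
    """Return the fraction of predicted labels that match the gold labels."""
    if len(predicted) != len(gold) or not gold:
        raise ValueError("need two equal-length, non-empty label lists")
    return sum(p == g for p, g in zip(predicted, gold)) / len(gold)
```

For example, `classify("Mujhe unse nafrat hai")` returns `"toxic"` because "nafrat" scores -2, while `classify("I respect every religion")` returns `"non-toxic"`. A per-domain accuracy of the kind reported above would be obtained by passing the predicted and gold labels for one domain's tweets to `accuracy`.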

Author Biographies

Husnain Saleem

Institute of Computing and Information Technology (ICIT), Gomal University, D.I. Khan, K.P.K, Pakistan.

Muhammad Javed

Institute of Computing and Information Technology (ICIT), Gomal University, D.I. Khan, K.P.K, Pakistan.

Syed Muhammad Ali Haider

Institute of Computing and Information Technology (ICIT), Gomal University, D.I. Khan, K.P.K, Pakistan.

Hamid Masood Khan

Institute of Computing and Information Technology (ICIT), Gomal University, D.I. Khan, K.P.K, Pakistan.

Muhammad Ahmad Jan

Institute of Computing and Information Technology (ICIT), Gomal University, D.I. Khan, K.P.K, Pakistan.

Asad Ullah

Institute of Computing and Information Technology (ICIT), Gomal University, D.I. Khan, K.P.K, Pakistan.

Published

2024-01-15

How to Cite

Husnain Saleem, Muhammad Javed, Syed Muhammad Ali Haider, Hamid Masood Khan, Muhammad Ahmad Jan, & Asad Ullah. (2024). Framework of Hate Speech Identification for Formal and Informal Text Using Lexical Approach. Kurdish Studies, 12(1), 5079–5094. https://doi.org/10.53555/ks.v12i1.3249
