Contrastive Learning-Enhanced BERT Models for Hate Speech Detection in Marathi and Telugu

Research output: Chapter in Book or Conference Publication/Proceeding › Conference Publication › peer-review

Abstract

Homophobia and transphobia are pervasive issues on online platforms, manifesting as hate speech directed towards LGBTQ+ individuals. Identifying and mitigating such toxic language is crucial for creating safer online spaces, especially in low-resource Indic languages. This work focuses on classifying YouTube comments in Marathi and Telugu, annotated at the comment/post level, as homophobic, transphobic, or non-anti-LGBT+ content. We employed pre-trained and fine-tuned versions of BERT models on Marathi and Telugu data. These models were further fine-tuned with contrastive learning objectives to enhance their discriminative power. For Marathi data, the MahaBERT model, combined with Supervised Contrastive Learning (SupCon), achieved an accuracy of 70.53%, a precision of 52.48%, a recall of 59.21%, and an F1-score of 54.52%. For Telugu data, the TeluguBERT model with SupCon achieved substantially higher scores, with an accuracy of 96.90%, a precision of 96.95%, a recall of 96.98%, and an F1-score of 96.96%.
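
To make the idea of a supervised contrastive objective concrete, the following is a minimal sketch of a SupCon-style loss in plain Python. It is not the authors' implementation: the abstract does not give the exact formulation, temperature, or batching, so the function name `supcon_loss`, the default `temperature=0.1`, and the toy embeddings are illustrative assumptions. The sketch pulls together embeddings that share a label and pushes apart those that do not, and it assumes every embedding is non-zero.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length (assumes the vector is non-zero)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def supcon_loss(embeddings, labels, temperature=0.1):
    """Illustrative supervised contrastive loss over one batch.

    For each anchor i with at least one same-label example, average
    -log softmax(sim(i, p) / temperature) over its positives p, where the
    softmax runs over all other examples in the batch. Anchors without a
    positive are skipped. The temperature value is an assumption.
    """
    z = [l2_normalize(v) for v in embeddings]
    n = len(z)
    total, anchors = 0.0, 0
    for i in range(n):
        positives = [j for j in range(n) if j != i and labels[j] == labels[i]]
        if not positives:
            continue
        # Scaled cosine similarity between the anchor and every other example.
        sims = {
            j: sum(a * b for a, b in zip(z[i], z[j])) / temperature
            for j in range(n)
            if j != i
        }
        # Log-sum-exp with max subtraction for numerical stability.
        m = max(sims.values())
        log_denom = m + math.log(sum(math.exp(s - m) for s in sims.values()))
        total += -sum(sims[p] - log_denom for p in positives) / len(positives)
        anchors += 1
    return total / anchors if anchors else 0.0


# Toy 2-D "sentence embeddings": two tight clusters.
emb = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
aligned = supcon_loss(emb, [0, 0, 1, 1])   # labels agree with the clusters
shuffled = supcon_loss(emb, [0, 1, 0, 1])  # labels cut across the clusters
print(aligned < shuffled)
```

In a fine-tuning setup, a term of this kind would typically be computed on encoder outputs and combined with a classification loss; how the two are weighted here is not stated in the abstract.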

Original language: English
Title of host publication: FIRE 2024 - Proceedings of the 16th Annual Meeting of the Forum for Information Retrieval Evaluation
Editors: Debasis Ganguly, Debarshi Kumar Sanyal, Prasenjit Majumder, Srijoni Majumdar, Surupendu Gangopadhyay
Publisher: Association for Computing Machinery
Pages: 48-54
Number of pages: 7
ISBN (Electronic): 9798400713187
Publication status: Published - 24 Jul 2025
Event: 16th Annual Meeting of the Forum for Information Retrieval Evaluation, FIRE 2024 - Gandhinagar, India
Duration: 12 Dec 2024 to 15 Dec 2024

Publication series

Name: ACM International Conference Proceeding Series

Conference

Conference: 16th Annual Meeting of the Forum for Information Retrieval Evaluation, FIRE 2024
Country/Territory: India
City: Gandhinagar
Period: 12/12/24 to 15/12/24

Keywords

  • Contrastive Learning
  • Hate Speech
  • Homophobia
  • Large Language Models
  • Transphobia
