MaCmS: Magahi Code-mixed Dataset for Sentiment Analysis

Research output: Chapter in Book or Conference Publication/Proceeding; Conference Publication; peer-review

1 Citation (Scopus)

Abstract

This paper introduces MaCmS, a new sentiment dataset for Magahi-Hindi-English (MHE) code-mixed language, where Magahi is a less-resourced minority language. It is the first Magahi-Hindi-English code-mixed dataset for sentiment analysis. We further provide a linguistic analysis of the dataset to understand the structure of code-mixing, and a statistical study of speakers' language preferences across sentiment categories. Alongside these analyses, we train baseline models to evaluate the dataset's quality.

Original language: English
Title of host publication: 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation, LREC-COLING 2024 - Main Conference Proceedings
Editors: Nicoletta Calzolari, Min-Yen Kan, Veronique Hoste, Alessandro Lenci, Sakriani Sakti, Nianwen Xue
Publisher: European Language Resources Association (ELRA)
Pages: 10880-10889
Number of pages: 10
ISBN (Electronic): 9782493814104
Publication status: Published - 2024
Event: Joint 30th International Conference on Computational Linguistics and 14th International Conference on Language Resources and Evaluation, LREC-COLING 2024 - Hybrid, Torino, Italy
Duration: 20 May 2024 to 25 May 2024

Publication series

Name: 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation, LREC-COLING 2024 - Main Conference Proceedings

Conference

Conference: Joint 30th International Conference on Computational Linguistics and 14th International Conference on Language Resources and Evaluation, LREC-COLING 2024
Country/Territory: Italy
City: Hybrid, Torino
Period: 20/05/24 to 25/05/24

Keywords

  • Code-mixing
  • Less-resourced language
  • Magahi
  • Sentiment Analysis
