NUIG at SemEval-2020 Task 12: Pseudo labelling for offensive content classification

    Research output: Chapter in Book or Conference Publication/Proceeding › Conference Publication › peer-review

    2 Citations (Scopus)

    Abstract

    This work addresses the classification problem defined by sub-task A (English only) of the OffensEval 2020 challenge. We use a semi-supervised approach to classify tweets as offensive (OFF) or not offensive (NOT). Because the OffensEval 2020 dataset is only loosely labelled, with confidence scores produced by unsupervised models, we use the previous year's Offensive Language Identification Dataset (OLID) to label it through pseudo-labelling. We trained four text classifiers on OLID and used the one with the highest macro-averaged F1-score to pseudo-label the OffensEval 2020 dataset. That same best-performing model was then retrained on the combination of OLID and the pseudo-labelled OffensEval 2020 data. We evaluated the classifiers on the OLID and OffensEval 2020 datasets using precision, recall and macro-averaged F1-score, with macro-averaged F1-score as the primary metric.
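The pipeline described in the abstract (train several classifiers on labelled data, keep the one with the best macro-averaged F1-score, pseudo-label the second dataset, retrain on the union) can be sketched roughly as follows. This is an illustrative sketch only, not the authors' implementation: the toy `NaiveBayes` class, the `pseudo_label_pipeline` helper, the two candidate configurations, the held-out `dev` split and the example sentences are all assumptions made for the illustration. They stand in for the paper's four classifiers and for the OLID and OffensEval 2020 data, none of which are specified in this record.

```python
from collections import Counter
import math

LABELS = ("OFF", "NOT")


def macro_f1(y_true, y_pred, labels=LABELS):
    """Unweighted mean of the per-class F1 scores."""
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        scores.append(2 * precision * recall / denom if denom else 0.0)
    return sum(scores) / len(scores)


class NaiveBayes:
    """Toy multinomial naive Bayes (hypothetical stand-in for a real text classifier).

    Assumes alpha > 0 and that every label occurs at least once in training.
    """

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)
        self.word_counts = {label: Counter() for label in LABELS}
        for text, label in zip(texts, labels):
            self.word_counts[label].update(text.lower().split())
        self.vocab = set().union(*self.word_counts.values())
        return self

    def _log_score(self, text, label):
        counts = self.word_counts[label]
        total = sum(counts.values()) + self.alpha * len(self.vocab)
        score = math.log(self.class_counts[label] / sum(self.class_counts.values()))
        for word in text.lower().split():
            if word in self.vocab:  # words never seen in training are skipped
                score += math.log((counts[word] + self.alpha) / total)
        return score

    def predict(self, texts):
        return [max(LABELS, key=lambda lab: self._log_score(t, lab)) for t in texts]


def pseudo_label_pipeline(candidates, train, dev, unlabelled):
    train_x, train_y = train
    dev_x, dev_y = dev
    # 1. Train every candidate on the labelled data and score it with macro-F1.
    scored = []
    for name, make_model in candidates.items():
        model = make_model().fit(train_x, train_y)
        scored.append((macro_f1(dev_y, model.predict(dev_x)), name, model))
    _, best_name, best_model = max(scored, key=lambda s: s[0])
    # 2. Use the best model to pseudo-label the second dataset.
    pseudo_y = best_model.predict(unlabelled)
    # 3. Retrain the same kind of model on labelled + pseudo-labelled data.
    final = candidates[best_name]().fit(train_x + unlabelled, train_y + pseudo_y)
    return best_name, pseudo_y, final


# Made-up toy data standing in for the labelled and the loosely labelled sets.
train = (["you are an idiot", "what a stupid idiot",
          "have a nice day", "what a lovely day"],
         ["OFF", "OFF", "NOT", "NOT"])
dev = (["stupid idiot", "lovely nice day"], ["OFF", "NOT"])
unlabelled = ["such an idiot", "nice day today"]
candidates = {
    "nb_alpha_1.0": lambda: NaiveBayes(alpha=1.0),
    "nb_alpha_0.1": lambda: NaiveBayes(alpha=0.1),
}

best_name, pseudo_y, final = pseudo_label_pipeline(candidates, train, dev, unlabelled)
print(best_name, pseudo_y, final.predict(["idiot"]))
```

In this sketch the pseudo-labels are taken as hard predictions and every pseudo-labelled example is kept. Whether the original system filtered examples by confidence is not stated in the abstract, so no such threshold is modelled here.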

    Original language: English
    Title of host publication: COLING 2020 - The International Workshop on Semantic Evaluation, Proceedings of the 14th Workshop
    Editors: Aurelie Herbelot, Xiaodan Zhu, Alexis Palmer, Nathan Schneider, Jonathan May, Ekaterina Shutova
    Publisher: International Committee for Computational Linguistics
    Pages: 1598-1604
    Number of pages: 7
    ISBN (Electronic): 9781952148316
    Publication status: Published - 2020
    Event: 14th International Workshops on Semantic Evaluation, SemEval 2020 - Virtual, Online, Spain
    Duration: 12 Dec 2020 to 13 Dec 2020

    Publication series

    Name: 14th International Workshops on Semantic Evaluation, SemEval 2020 - co-located 28th International Conference on Computational Linguistics, COLING 2020, Proceedings

    Conference

    Conference: 14th International Workshops on Semantic Evaluation, SemEval 2020
    Country/Territory: Spain
    City: Virtual, Online
    Period: 12/12/20 to 13/12/20
