TY - GEN
T1 - NUIG at SemEval-2020 Task 12
T2 - 14th International Workshops on Semantic Evaluation, SemEval 2020
AU - Suryawanshi, Shardul
AU - Arcan, Mihael
AU - Buitelaar, Paul
N1 - Publisher Copyright:
© 2020 14th International Workshops on Semantic Evaluation, SemEval 2020 - co-located 28th International Conference on Computational Linguistics, COLING 2020, Proceedings. All rights reserved.
PY - 2020
Y1 - 2020
N2 - This work addresses the classification problem defined by sub-task A (English only) of the OffensEval 2020 challenge. We used a semi-supervised approach to classify the given tweets into an offensive (OFF) or not-offensive (NOT) class. As the OffensEval 2020 dataset is loosely labelled with confidence scores given by unsupervised models, we used last year's offensive language identification dataset (OLID) to label the OffensEval 2020 dataset. Our approach uses a pseudo-labelling method to annotate the current dataset. We trained four text classifiers on the OLID dataset, and the classifier with the highest macro-averaged F1-score was used to pseudo-label the OffensEval 2020 dataset. The model that performed best amongst the four text classifiers on the OLID dataset was then trained on the combination of OLID and the pseudo-labelled OffensEval 2020 dataset. We evaluated the classifiers on the OLID and OffensEval 2020 datasets using precision, recall and macro-averaged F1-score, with macro-averaged F1-score as the primary evaluation metric.
AB - This work addresses the classification problem defined by sub-task A (English only) of the OffensEval 2020 challenge. We used a semi-supervised approach to classify the given tweets into an offensive (OFF) or not-offensive (NOT) class. As the OffensEval 2020 dataset is loosely labelled with confidence scores given by unsupervised models, we used last year's offensive language identification dataset (OLID) to label the OffensEval 2020 dataset. Our approach uses a pseudo-labelling method to annotate the current dataset. We trained four text classifiers on the OLID dataset, and the classifier with the highest macro-averaged F1-score was used to pseudo-label the OffensEval 2020 dataset. The model that performed best amongst the four text classifiers on the OLID dataset was then trained on the combination of OLID and the pseudo-labelled OffensEval 2020 dataset. We evaluated the classifiers on the OLID and OffensEval 2020 datasets using precision, recall and macro-averaged F1-score, with macro-averaged F1-score as the primary evaluation metric.
UR - https://www.scopus.com/pages/publications/85123952774
U2 - 10.18653/v1/2020.semeval-1.208
DO - 10.18653/v1/2020.semeval-1.208
M3 - Conference Publication
AN - SCOPUS:85123952774
T3 - 14th International Workshops on Semantic Evaluation, SemEval 2020 - co-located 28th International Conference on Computational Linguistics, COLING 2020, Proceedings
SP - 1598
EP - 1604
BT - COLING 2020 - The International Workshop on Semantic Evaluation, Proceedings of the 14th Workshop
A2 - Herbelot, Aurelie
A2 - Zhu, Xiaodan
A2 - Palmer, Alexis
A2 - Schneider, Nathan
A2 - May, Jonathan
A2 - Shutova, Ekaterina
PB - International Committee for Computational Linguistics
Y2 - 12 December 2020 through 13 December 2020
ER -