Abstract
This paper presents a new approach that combines BERT and Knowledge Graphs (KGs) to solve a multi-label classification problem with limited training data. It introduces a method that uses taxonomies and a dataset of 518 entries and 340 concepts to fine-tune BERT. It also introduces a new data augmentation technique, Perfect Binary Tree (PBT)-Flow, to deal with limited or imbalanced training data. The proposed approach obtained a recall@10 of 61.12%, a precision@10 of 11.86%, and an F1-score@10 of 18.83%. While these results seem low, they are promising given the simple architecture of the model (BERT+2xFC), the limited size of the training data, and the large number of output concepts.
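To make the reported metrics concrete, the following is a minimal sketch of how recall@k, precision@k, and F1@k are commonly computed for a single entry in multi-label classification. The abstract does not state the exact definition or averaging scheme the authors used, so this is an assumption; the function name `metrics_at_k` and the concept labels are hypothetical and used only for illustration.

```python
from typing import Sequence, Set, Tuple


def metrics_at_k(
    ranked: Sequence[str], relevant: Set[str], k: int = 10
) -> Tuple[float, float, float]:
    """Return (precision@k, recall@k, F1@k) for one entry.

    ranked   -- predicted concepts, sorted from most to least confident
    relevant -- gold concepts assigned to the entry
    """
    top_k = ranked[:k]
    hits = sum(1 for concept in top_k if concept in relevant)

    # Precision divides by k, so it is capped at len(relevant) / k.
    precision = hits / k
    recall = hits / len(relevant) if relevant else 0.0

    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom > 0 else 0.0
    return precision, recall, f1


# Hypothetical entry: 10 ranked predictions, 2 gold concepts, 1 of them retrieved.
ranked = [f"concept_{i}" for i in range(10)]
relevant = {"concept_3", "concept_42"}

p, r, f1 = metrics_at_k(ranked, relevant, k=10)
print(f"precision@10={p:.4f} recall@10={r:.4f} f1@10={f1:.4f}")
```

Under this definition, precision@10 cannot exceed the number of gold concepts divided by 10, so an entry with only a few gold concepts yields a low precision@10 even when most of them are retrieved. This is one possible reason a high recall@10 can coexist with a low precision@10.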
| Field | Value |
|---|---|
| Original language | English |
| Journal | CEUR Workshop Proceedings |
| Volume | 3342 |
| Publication status | Published - 2022 |
| Externally published | Yes |
| Event | 2022 Workshop on Deep Learning for Knowledge Graphs, DL4KG 2022 - Virtual, Online. Duration: 24 Oct 2022 → … |
Keywords
- BERT
- Data augmentation
- Knowledge graphs
- Multi-label classification