CMVC+: a Multi-View Clustering Framework for Open Knowledge Base Canonicalization via Contrastive Learning

Yang Yang, Wei Shen, Junfeng Shu, Yinan Liu, Edward Curry, Guoliang Li

Research output: Contribution to a Journal (Peer & Non Peer), Article (peer-reviewed)

Abstract

Open information extraction (OIE) methods extract large numbers of OIE triples from unstructured text, which together form large open knowledge bases (OKBs). The noun phrases and relation phrases in such OKBs are not canonicalized, which leads to scattered and redundant facts. OKB canonicalization addresses this by clustering synonymous noun phrases and relation phrases into the same group and assigning each group a unique identifier. We find that two views of knowledge (i.e., a fact view based on the fact triple and a context view based on the fact triple's source context) provide complementary information that is vital to this task. To leverage these two views jointly, we propose CMVC+, a novel unsupervised framework for canonicalizing OKBs that requires no manually annotated labels. Specifically, we propose a multi-view CHF K-Means clustering algorithm that lets the clusterings of the view-specific embeddings learned from each view mutually reinforce one another, while assessing clustering quality in a fine-grained manner. Furthermore, we propose a novel contrastive learning module that refines the learned view-specific embeddings and further improves canonicalization performance.
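The abstract does not spell out how multi-view CHF K-Means works, so the following is only a generic sketch of the broader idea of two views reinforcing each other's clustering, not the authors' algorithm. Plain K-Means is run alternately on hypothetical fact-view and context-view embeddings, and each pass is initialised from the cluster assignments produced by the other view. The sketch omits the fine-grained clustering-quality criterion and the contrastive refinement described above; the function names, toy vectors and phrases are illustrative assumptions.

```python
import random
from typing import List, Sequence

Vector = Sequence[float]


def _sq_dist(a: Vector, b: Vector) -> float:
    # Squared Euclidean distance between two embeddings.
    return sum((x - y) ** 2 for x, y in zip(a, b))


def _mean(points: List[Vector]) -> List[float]:
    # Component-wise mean of a non-empty list of embeddings.
    dim = len(points[0])
    return [sum(p[i] for p in points) / len(points) for i in range(dim)]


def kmeans_from_labels(points: List[Vector], labels: List[int], k: int,
                       rng: random.Random, max_iter: int = 50) -> List[int]:
    """Lloyd's K-Means in one view, started from an existing labelling (hypothetical helper)."""
    labels = list(labels)
    for _ in range(max_iter):
        centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            # Re-seed an empty cluster with a random point from this view.
            centroids.append(_mean(members) if members else list(rng.choice(points)))
        # Assign every point to its nearest centroid.
        new_labels = [min(range(k), key=lambda c: _sq_dist(p, centroids[c]))
                      for p in points]
        if new_labels == labels:  # converged
            break
        labels = new_labels
    return labels


def two_view_kmeans(fact_view: List[Vector], context_view: List[Vector], k: int,
                    rounds: int = 3, seed: int = 0) -> List[int]:
    """Alternate K-Means between two views, each pass seeded by the other view's result."""
    assert len(fact_view) == len(context_view), "both views must describe the same phrases"
    rng = random.Random(seed)
    labels = [rng.randrange(k) for _ in fact_view]  # random initial labelling
    for _ in range(rounds):
        labels = kmeans_from_labels(fact_view, labels, k, rng)     # fact-view pass
        labels = kmeans_from_labels(context_view, labels, k, rng)  # context-view pass
    return labels  # cluster ids can serve as canonical identifiers


# Toy 2-D embeddings for four hypothetical noun phrases:
# "NYC", "New York City", "UK", "United Kingdom".
fact = [(0.0, 0.0), (1.0, 0.0), (10.0, 10.0), (10.0, 12.0)]
context = [(0.0, 5.0), (0.0, 6.0), (8.0, 0.0), (9.0, 0.0)]
print(two_view_kmeans(fact, context, k=2))
```

Seeding each view from the other's labels is what lets the views exchange information here; a real system would also need a principled way to choose k and to judge which clusters to trust, which this sketch leaves out.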

Original language: English
Journal: IEEE Transactions on Knowledge and Data Engineering
Publication status: Accepted/In press, 2025

Keywords

  • Contrastive Learning
  • Multi-View Clustering
  • Open Knowledge Base Canonicalization
