TY - JOUR
T1 - CMVC+
T2 - a Multi-View Clustering Framework for Open Knowledge Base Canonicalization via Contrastive Learning
AU - Yang, Yang
AU - Shen, Wei
AU - Shu, Junfeng
AU - Liu, Yinan
AU - Curry, Edward
AU - Li, Guoliang
N1 - Publisher Copyright:
© 1989-2012 IEEE.
PY - 2025
Y1 - 2025
N2 - Open information extraction (OIE) methods extract plenty of OIE triples from unstructured text, which compose large open knowledge bases (OKBs). Noun phrases and relation phrases in such OKBs are not canonicalized, which leads to scattered and redundant facts. It is found that two views of knowledge (i.e., a fact view based on the fact triple and a context view based on the fact triple's source context) provide complementary information that is vital to the task of OKB canonicalization, which clusters synonymous noun phrases and relation phrases into the same group and assigns them unique identifiers. In order to leverage these two views of knowledge jointly, we propose CMVC+, a novel unsupervised framework for canonicalizing OKBs without the need for manually annotated labels. Specifically, we propose a multi-view CHF K-Means clustering algorithm to mutually reinforce the clustering of view-specific embeddings learned from each view by considering the clustering quality in a fine-grained manner. Furthermore, we propose a novel contrastive learning module to refine the learned view-specific embeddings and further enhance the canonicalization performance.
AB - Open information extraction (OIE) methods extract plenty of OIE triples from unstructured text, which compose large open knowledge bases (OKBs). Noun phrases and relation phrases in such OKBs are not canonicalized, which leads to scattered and redundant facts. It is found that two views of knowledge (i.e., a fact view based on the fact triple and a context view based on the fact triple's source context) provide complementary information that is vital to the task of OKB canonicalization, which clusters synonymous noun phrases and relation phrases into the same group and assigns them unique identifiers. In order to leverage these two views of knowledge jointly, we propose CMVC+, a novel unsupervised framework for canonicalizing OKBs without the need for manually annotated labels. Specifically, we propose a multi-view CHF K-Means clustering algorithm to mutually reinforce the clustering of view-specific embeddings learned from each view by considering the clustering quality in a fine-grained manner. Furthermore, we propose a novel contrastive learning module to refine the learned view-specific embeddings and further enhance the canonicalization performance.
KW - Contrastive Learning
KW - Multi-View Clustering
KW - Open Knowledge Base Canonicalization
UR - https://www.scopus.com/pages/publications/85218721915
U2 - 10.1109/TKDE.2025.3543423
DO - 10.1109/TKDE.2025.3543423
M3 - Article
AN - SCOPUS:85218721915
SN - 1041-4347
JO - IEEE Transactions on Knowledge and Data Engineering
JF - IEEE Transactions on Knowledge and Data Engineering
ER -