TY - GEN
T1 - Personalization of Dataset Retrieval Results Using a Data Valuation Method
AU - Ebiele, Malick
AU - Bendechache, Malika
AU - Clinton, Eamonn
AU - Brennan, Rob
N1 - Publisher Copyright: © 2024 by SCITEPRESS – Science and Technology Publications, Lda.
PY - 2024
Y1 - 2024
N2 - In this paper, we propose a data valuation method for re-ranking Dataset Retrieval (DR) results. Dataset retrieval is a specialization of Information Retrieval (IR) in which the system returns a list of relevant datasets instead of relevant documents. To the best of our knowledge, data valuation has not yet been applied to dataset retrieval. By leveraging metadata and users’ preferences, we estimate the personal value of each dataset to facilitate dataset ranking and filtering. We studied the user satisfaction rate with two real users (stakeholders) and four simulated users (whose preferences were generated using a uniform weight distribution). We define the user satisfaction rate as the probability that users find the datasets they seek in the top k retrieval results, for k ∈ {5, 10}. Previous studies of fairness in rankings (position bias) have shown that the exposure probability (exposure rate) of a document drops exponentially from rank 1 to rank 10, from 100% to about 20%. Therefore, we calculated the Jaccard score@5 and Jaccard score@10 between our approach and other re-ranking options. We found that, on average, there is a 42.24% chance and a 56.52% chance that users will find the dataset they are seeking in the top 5 and top 10, respectively. The lowest chance is 0% for the top 5 and 33.33% for the top 10, while the highest is 100% in both cases. The dataset used in our experiments is a real-world dataset obtained as the result of a query sent to a national mapping agency data catalog. In the future, we plan to extend these experiments to publicly available data catalogs.
KW - Data Valuation
KW - Data Value
KW - Dataset Retrieval
KW - Information Retrieval
KW - Personalized Data Value
KW - Quantitative Data Valuation
UR - http://www.scopus.com/inward/record.url?scp=85215301738&partnerID=8YFLogxK
U2 - 10.5220/0013044100003838
DO - 10.5220/0013044100003838
M3 - Conference Publication
AN - SCOPUS:85215301738
T3 - International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management, IC3K - Proceedings
SP - 122
EP - 134
BT - 16th International Conference on Knowledge Discovery and Information Retrieval, KDIR 2024 as part of IC3K 2024 - Proceedings of the 16th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management
A2 - Coenen, Frans
A2 - Fred, Ana
A2 - Bernardino, Jorge
PB - Science and Technology Publications, Lda
T2 - 16th International Conference on Knowledge Discovery and Information Retrieval, KDIR 2024 as part of 16th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management, IC3K 2024
Y2 - 17 November 2024 through 19 November 2024
ER -