TY - JOUR
T1 - RDOF
T2 - An outlier detection algorithm based on relative density
AU - Wahid, Abdul
AU - Rao, Annavarapu Chandra Sekhara
N1 - Publisher Copyright: © 2021 John Wiley & Sons Ltd.
PY - 2022/2
Y1 - 2022/2
N2 - Outliers have a significant impact on data quality and the efficiency of data mining. An outlier identification algorithm flags data points that do not conform to the expected behaviour of a data set. Several techniques for identifying outliers have been presented in recent years, but when outliers are located in areas where neighbourhood density varies substantially, these techniques can produce imprecise estimates. To address this problem, we propose a ‘Relative Density-based Outlier Factor (RDOF)’ algorithm based on the concept of mutual proximity between a data point and its neighbours. The proposed approach is divided into two stages: in the first stage, an influential space is created at a test point; in the second stage, the test point is assigned an outlier-ness score. We conducted experiments on three real-world data sets, namely the Johns Hopkins University Ionosphere, Iris Plant, and Wisconsin Breast Cancer data sets. We used three performance metrics for comparison: precision, recall, and rank power. In addition, we compared our proposed method against a set of relevant baseline methods. The experimental results show that our proposed method detected all (i.e., 100%) outlier class objects with higher rank power than the baseline approaches on these data sets.
UR - http://www.scopus.com/inward/record.url?scp=85118297105&partnerID=8YFLogxK
U2 - 10.1111/exsy.12859
DO - 10.1111/exsy.12859
M3 - Article
AN - SCOPUS:85118297105
SN - 0266-4720
VL - 39
JO - Expert Systems
JF - Expert Systems
IS - 2
M1 - e12859
ER -