TY - GEN
T1 - Performance evaluation of a distributed clustering approach for spatial datasets
AU - Bendechache, Malika
AU - Le-Khac, Nhien An
AU - Kechadi, M. Tahar
N1 - Publisher Copyright: © Springer Nature Singapore Pte Ltd. 2018.
PY - 2018
Y1 - 2018
N2 - The analysis of big data requires powerful, scalable, and accurate data analytics techniques that traditional data mining and machine learning, taken as a whole, do not provide. Therefore, new data analytics frameworks are needed to deal with big data challenges such as the volume, velocity, veracity, and variety of the data. Distributed data mining constitutes a promising approach for big data sets, as they are usually produced in distributed locations, and processing them at their local sites significantly reduces response times and communication costs. In this paper, we study the performance of a distributed clustering approach called Dynamic Distributed Clustering (DDC). DDC has the ability to remotely generate clusters and then aggregate them using an efficient aggregation algorithm. The technique is developed for spatial datasets. We evaluated DDC using two types of communication (synchronous and asynchronous) and tested it under various load distributions. The experimental results show that the approach has super-linear speed-up, scales up very well, and can take advantage of recent programming models, such as the MapReduce model, as its results are not affected by the type of communication.
AB - The analysis of big data requires powerful, scalable, and accurate data analytics techniques that traditional data mining and machine learning, taken as a whole, do not provide. Therefore, new data analytics frameworks are needed to deal with big data challenges such as the volume, velocity, veracity, and variety of the data. Distributed data mining constitutes a promising approach for big data sets, as they are usually produced in distributed locations, and processing them at their local sites significantly reduces response times and communication costs. In this paper, we study the performance of a distributed clustering approach called Dynamic Distributed Clustering (DDC). DDC has the ability to remotely generate clusters and then aggregate them using an efficient aggregation algorithm. The technique is developed for spatial datasets. We evaluated DDC using two types of communication (synchronous and asynchronous) and tested it under various load distributions. The experimental results show that the approach has super-linear speed-up, scales up very well, and can take advantage of recent programming models, such as the MapReduce model, as its results are not affected by the type of communication.
KW - Asynchronous communication
KW - Distributed computing
KW - Distributed data mining
KW - Spatial data mining
KW - Super-speedup
KW - Synchronous communication
UR - http://www.scopus.com/inward/record.url?scp=85045840972&partnerID=8YFLogxK
U2 - 10.1007/978-981-13-0292-3_3
DO - 10.1007/978-981-13-0292-3_3
M3 - Conference Publication
AN - SCOPUS:85045840972
SN - 9789811302916
T3 - Communications in Computer and Information Science
SP - 38
EP - 56
BT - Data Mining - 15th Australasian Conference, AusDM 2017, Revised Selected Papers
A2 - Stirling, David
A2 - Boo, Yee Ling
A2 - Chi, Lianhua
A2 - Ong, Kok-Leong
A2 - Liu, Lin
A2 - Williams, Graham
PB - Springer-Verlag
T2 - 15th Australasian Conference on Data Mining, AusDM 2017
Y2 - 19 August 2017 through 20 August 2017
ER -