Performance evaluation of a distributed clustering approach for spatial datasets

Malika Bendechache, Nhien An Le-Khac, M. Tahar Kechadi

Research output: Chapter in Book or Conference Publication/Proceeding, Conference Publication, peer-review

2 Citations (Scopus)

Abstract

The analysis of big data requires powerful, scalable, and accurate data analytics techniques, which traditional data mining and machine learning methods do not offer as a whole. New data analytics frameworks are therefore needed to deal with big data challenges such as the volume, velocity, veracity, and variety of the data. Distributed data mining is a promising approach for big data sets: they are usually produced at distributed locations, and processing them at their local sites significantly reduces response times, communication costs, and so on. In this paper, we study the performance of a distributed clustering approach called Dynamic Distributed Clustering (DDC). DDC generates clusters remotely, on the local sites, and then aggregates them using an efficient aggregation algorithm. The technique is designed for spatial datasets. We evaluated DDC using two types of communication (synchronous and asynchronous) and tested it under various load distributions. The experimental results show that the approach achieves super-linear speed-up, scales up very well, and can take advantage of recent programming models such as MapReduce, since its results are not affected by the type of communication.
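
The abstract describes DDC only at a high level: clusters are generated locally on each site and then combined by an aggregation algorithm. The following Python sketch illustrates that general two-phase pattern under stated assumptions. It is not the authors' implementation. The local step here is a simple eps-connectivity grouping, each local cluster is summarized by an axis-aligned bounding box (an assumed stand-in for whatever representation DDC actually exchanges), and the names `local_clusters`, `summarize`, `boxes_touch`, and `aggregate` are hypothetical.

```python
import math
from itertools import combinations

Point = tuple[float, float]
Box = tuple[float, float, float, float]  # (min_x, min_y, max_x, max_y)


class UnionFind:
    """Disjoint-set forest with path halving."""

    def __init__(self, n: int) -> None:
        self.parent = list(range(n))

    def find(self, i: int) -> int:
        while self.parent[i] != i:
            self.parent[i] = self.parent[self.parent[i]]
            i = self.parent[i]
        return i

    def union(self, i: int, j: int) -> None:
        self.parent[self.find(i)] = self.find(j)

    def groups(self) -> list[list[int]]:
        out: dict[int, list[int]] = {}
        for i in range(len(self.parent)):
            out.setdefault(self.find(i), []).append(i)
        return list(out.values())


def local_clusters(points: list[Point], eps: float) -> list[list[Point]]:
    """Phase 1 (on each site, hypothetical): group points linked by chains
    of neighbours at distance <= eps; a stand-in for the local algorithm."""
    uf = UnionFind(len(points))
    for i, j in combinations(range(len(points)), 2):
        if math.dist(points[i], points[j]) <= eps:
            uf.union(i, j)
    return [[points[i] for i in g] for g in uf.groups()]


def summarize(cluster: list[Point]) -> Box:
    """Send a compact summary (assumed: a bounding box) instead of raw points."""
    xs = [p[0] for p in cluster]
    ys = [p[1] for p in cluster]
    return (min(xs), min(ys), max(xs), max(ys))


def boxes_touch(a: Box, b: Box, eps: float) -> bool:
    """True if the gap between the two boxes is at most eps on both axes."""
    return (a[0] - eps <= b[2] and b[0] - eps <= a[2]
            and a[1] - eps <= b[3] and b[1] - eps <= a[3])


def aggregate(summaries: list[Box], eps: float) -> list[Box]:
    """Phase 2 (on the leader, hypothetical): merge summaries connected by
    chains of touching boxes; return one enclosing box per global cluster,
    sorted so the output is canonical."""
    uf = UnionFind(len(summaries))
    for i, j in combinations(range(len(summaries)), 2):
        if boxes_touch(summaries[i], summaries[j], eps):
            uf.union(i, j)
    merged = []
    for group in uf.groups():
        boxes = [summaries[i] for i in group]
        merged.append((min(b[0] for b in boxes), min(b[1] for b in boxes),
                       max(b[2] for b in boxes), max(b[3] for b in boxes)))
    return sorted(merged)


# Usage: three sites; two clusters straddle site boundaries.
sites = [
    [(0.0, 0.0), (0.5, 0.2), (0.9, 0.1)],    # site A
    [(1.3, 0.0), (1.8, 0.3), (10.0, 10.0)],  # site B
    [(10.4, 10.2), (20.0, 0.0)],             # site C
]
eps = 0.6
summaries = [summarize(c) for pts in sites for c in local_clusters(pts, eps)]
print(aggregate(summaries, eps))
# → [(0.0, 0.0, 1.8, 0.3), (10.0, 10.0, 10.4, 10.2), (20.0, 0.0, 20.0, 0.0)]
```

In this sketch the merge step computes connected components of the "touching" graph and returns the boxes in sorted order, so its output does not depend on the order in which site summaries arrive. That is one simple way an aggregation result can be insensitive to whether summaries are delivered synchronously or asynchronously. Bounding boxes are a coarse summary chosen here for brevity; a tighter boundary representation would reduce false merges.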

Original language: English
Title of host publication: Data Mining - 15th Australasian Conference, AusDM 2017, Revised Selected Papers
Editors: David Stirling, Yee Ling Boo, Lianhua Chi, Kok-Leong Ong, Lin Liu, Graham Williams
Publisher: Springer-Verlag
Pages: 38-56
Number of pages: 19
ISBN (Print): 9789811302916
Publication status: Published - 2018
Externally published: Yes
Event: 15th Australasian Conference on Data Mining, AusDM 2017 - Melbourne, Australia
Duration: 19 Aug 2017 to 20 Aug 2017

Publication series

Name: Communications in Computer and Information Science
Volume: 845
ISSN (Print): 1865-0929

Conference

Conference: 15th Australasian Conference on Data Mining, AusDM 2017
Country/Territory: Australia
City: Melbourne
Period: 19/08/17 to 20/08/17

Keywords

  • Asynchronous communication
  • Distributed computing
  • Distributed data mining
  • Spatial data mining
  • Super-speedup
  • Synchronous communication
