Skip to main navigation Skip to search Skip to main content

Big Holes in Big Data: A Monte Carlo Algorithm for Detecting Large Hyper-Rectangles in High Dimensional Data

  • Central Washington University
  • Western Washington University
  • Transilvania University

Research output: Chapter in Book or Conference Publication/ProceedingConference Publicationpeer-review

20 Citations (Scopus)

Abstract

We present the first algorithm for finding holes in high dimensional data that runs in polynomial time with respect to the number of dimensions. Previous algorithms are exponential. Finding large empty rectangles or boxes in a set of points in 2D and 3D space has been well studied. Efficient algorithms exist to identify the empty regions in these low-dimensional spaces. Unfortunately such efficiency is lacking in higher dimensions where the problem has been shown to be NP-complete when the dimensions are included in the input. Applications for algorithms that find large empty spaces include big data analysis, recommender systems, automated knowledge discovery, and query optimization. Our Monte Carlo-based algorithm discovers interesting maximal empty hyper-rectangles in cases where dimensionality and input size would otherwise make analysis impractical. The run-time is polynomial in the size of the input and the number of dimensions. We apply the algorithm on a 39-dimensional data set for protein structures and discover interesting properties that we think could not be inferred otherwise.

Original languageEnglish
Title of host publicationProceedings - 2016 IEEE 40th Annual Computer Software and Applications Conference, COMPSAC 2016
EditorsWilliam Claycomb, Dejan Milojicic, Ling Liu, Mihhail Matskin, Zhiyong Zhang, Sorel Reisman, Hiroyuki Sato, Zhiyong Zhang, Sheikh Iqbal Ahamed
PublisherIEEE Computer Society
Pages563-571
Number of pages9
ISBN (Electronic)9781467388450
DOIs
Publication statusPublished - 24 Aug 2016
Externally publishedYes
Event2016 IEEE 40th Annual Computer Software and Applications Conference, COMPSAC 2016 - Atlanta, United States
Duration: 10 Jun 201614 Jun 2016

Publication series

NameProceedings - International Computer Software and Applications Conference
Volume1
ISSN (Print)0730-3157

Conference

Conference2016 IEEE 40th Annual Computer Software and Applications Conference, COMPSAC 2016
Country/TerritoryUnited States
CityAtlanta
Period10/06/1614/06/16

Keywords

  • Holes in Data
  • Maximum empty rectangle
  • Monte Carlo

Fingerprint

Dive into the research topics of 'Big Holes in Big Data: A Monte Carlo Algorithm for Detecting Large Hyper-Rectangles in High Dimensional Data'. Together they form a unique fingerprint.

Cite this