
The effect of principal component analysis on machine learning accuracy with high-dimensional spectral data

  • University of Galway

Research output: Contribution to a Journal (Peer & Non Peer) › Article › peer-review

126 Citations (Scopus)

Abstract

This paper presents the results of an investigation into the use of machine learning methods for the identification of narcotics from Raman spectra. The classification of high-dimensional data, such as spectra, images and gene-expression data, poses an interesting challenge to machine learning, as the presence of large numbers of redundant or highly correlated attributes can seriously degrade classification accuracy. This paper investigates the use of principal component analysis (PCA) to reduce the dimensionality of spectral data and to improve the predictive performance of some well-known machine learning methods. Experiments are carried out on a high-dimensional spectral dataset. These experiments employ the NIPALS (Non-Linear Iterative Partial Least Squares) PCA method, which has been used in the field of chemometrics for spectral classification and is a more efficient alternative to the widely used eigenvector decomposition approach. The experiments show that the use of this PCA method can improve the performance of machine learning in the classification of high-dimensional data.
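The abstract names NIPALS but gives no algorithmic detail, so the following is only a minimal sketch of the textbook NIPALS iteration for PCA, not the authors' implementation. The function name `nipals_pca`, its parameters (`tol`, `max_iter`) and the choice of starting column are assumptions made for illustration. The sketch extracts one component at a time by alternating between a loading estimate and a score estimate, then deflates the data before extracting the next component, which is why it can stop after a few components instead of decomposing the full covariance matrix.

```python
import math


def nipals_pca(X, n_components, tol=1e-10, max_iter=500):
    """Return (scores, loadings) for the leading principal components of X.

    X is a list of rows (samples), each a list of feature values.
    Assumes n_components does not exceed the rank of the centred data.
    """
    n, m = len(X), len(X[0])
    # Mean-centre each column; PCA is defined on centred data.
    means = [sum(row[j] for row in X) / n for j in range(m)]
    E = [[row[j] - means[j] for j in range(m)] for row in X]

    scores, loadings = [], []
    for _ in range(n_components):
        # Initial score vector: the residual column with the largest sum of squares.
        j0 = max(range(m), key=lambda j: sum(r[j] ** 2 for r in E))
        t = [r[j0] for r in E]
        for _ in range(max_iter):
            tt = sum(v * v for v in t)
            # Loading: project the residual onto the score, then normalise.
            p = [sum(E[i][j] * t[i] for i in range(n)) / tt for j in range(m)]
            norm = math.sqrt(sum(v * v for v in p))
            p = [v / norm for v in p]
            # Score: project the residual onto the unit-length loading.
            t_new = [sum(E[i][j] * p[j] for j in range(m)) for i in range(n)]
            delta = math.sqrt(sum((a - b) ** 2 for a, b in zip(t_new, t)))
            t = t_new
            if delta < tol * math.sqrt(sum(v * v for v in t)):
                break
        # Deflate: remove the extracted component from the residual.
        E = [[E[i][j] - t[i] * p[j] for j in range(m)] for i in range(n)]
        scores.append(t)
        loadings.append(p)
    return scores, loadings
```

In a pipeline like the one the abstract describes, the returned score vectors (one per retained component) would replace the original spectral attributes as classifier inputs; how many components to keep is a separate choice that the abstract does not specify.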

Original language: English
Pages (from-to): 363-370
Number of pages: 8
Journal: Knowledge-Based Systems
Volume: 19
Issue number: 5
Publication status: Published - Sep 2006

UN SDGs

This output contributes to the following UN Sustainable Development Goals (SDGs)

  1. SDG 3 - Good Health and Well-being

Keywords

  • High-dimensional data
  • Machine learning
  • NIPALS
  • Principal component analysis
  • Spectroscopy
