Adaptation of Whisper models to child speech recognition

  • Rishabh Jain
  • Andrei Barcovschi
  • Mariam Yiwere
  • Peter Corcoran
  • Horia Cucu
  • University of Galway
  • Polytechnica University

Research output: Contribution to a Journal (Peer & Non Peer) | Conference article | peer-review

36 Citations (Scopus)

Abstract

Automatic Speech Recognition (ASR) systems often struggle to transcribe child speech because the large child speech datasets needed to train child-friendly ASR models accurately are lacking. However, large amounts of annotated adult speech are available and have been used to create multilingual ASR models such as Whisper. Our work explores whether such models can be adapted to child speech to improve ASR for children. In addition, we compare Whisper child-adaptations with finetuned self-supervised models such as wav2vec2. We demonstrate that finetuning Whisper on child speech yields significant improvements in ASR performance on child speech compared to non-finetuned Whisper models. Additionally, self-supervised wav2vec2 models that have been finetuned on child speech outperform finetuned Whisper models.

Original language: English
Pages (from-to): 5242-5246
Number of pages: 5
Journal: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
Volume: 2023-August
DOIs
Publication status: Published - 2023
Event: 24th International Speech Communication Association, Interspeech 2023 - Dublin, Ireland
Duration: 20 Aug 2023 to 24 Aug 2023

Keywords

  • Automatic Speech Recognition
  • Child Speech Recognition
  • CMU Kids
  • MyST
  • PF-STAR
  • Whisper model
