Using Information Retrieval Techniques to Automatically Repurpose Existing Dialogue Datasets for Safe Chatbot Development

  • Insight SFI Research Centre for Data Analytics
  • University of Galway
  • Lua Health Ltd.

Research output: Chapter in Book or Conference Publication/Proceeding › Conference Publication › peer-review

Abstract

There has been notable progress in the development of open-domain dialogue systems (chatbots), especially with the rapid advancement of the capabilities of Large Language Models. Chatbots excel at holding conversations in a manner that keeps a user interested and engaged. However, their responses can be unsafe, as they can respond in an offensive manner or offer harmful professional advice. To mitigate this issue, recent work crowdsources datasets with exemplary responses or annotates dialogue safety datasets, which are relatively scarce compared to casual dialogues. Although crowdsourcing yields high-quality data, it can be expensive and time-consuming. This work proposes an effective pipeline, using information retrieval, to automatically repurpose existing dialogue datasets for safe chatbot development, as a way to address these challenges. We select an existing dialogue dataset and revise its unsafe responses to obtain a dataset with safer responses to unsafe user inputs. We then fine-tune dialogue models on the original and revised datasets and generate responses to evaluate the safeness of the models.
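
The abstract does not spell out how retrieval is used to revise unsafe responses, so the following is only a minimal sketch of one plausible reading, not the authors' pipeline. It assumes that unsafe responses are flagged by a toy keyword check (a stand-in for a real safety classifier) and that each flagged response is replaced by the response attached to the most similar context in a pool of known-safe exchanges, ranked by TF-IDF cosine similarity. The names `UNSAFE_TERMS`, `is_unsafe`, `repurpose`, and the example data are all hypothetical.

```python
import math
import re
from collections import Counter

# Hypothetical keyword list; a real system would use a trained safety classifier.
UNSAFE_TERMS = {"idiot", "stupid", "hate"}


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def is_unsafe(response):
    """Toy safety check (assumption): flag responses containing a listed term."""
    return any(tok in UNSAFE_TERMS for tok in tokenize(response))


def tfidf_vectors(texts):
    """Build sparse TF-IDF vectors (dicts) with smoothed IDF over the given texts."""
    docs = [Counter(tokenize(t)) for t in texts]
    n = len(docs)
    df = Counter(tok for d in docs for tok in d)  # document frequency per token
    idf = {tok: math.log((1 + n) / (1 + c)) + 1.0 for tok, c in df.items()}
    return [{tok: tf * idf[tok] for tok, tf in d.items()} for d in docs]


def cosine(a, b):
    """Cosine similarity between two sparse vectors; 0.0 if either is empty."""
    dot = sum(w * b.get(tok, 0.0) for tok, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def repurpose(dialogues, safe_pool):
    """Replace unsafe responses with the safe response of the closest pool context.

    dialogues: list of (user_input, response) pairs from the existing dataset.
    safe_pool: non-empty list of (context, safe_response) pairs (assumed given).
    """
    contexts = [ctx for ctx, _ in safe_pool]
    inputs = [user for user, _ in dialogues]
    vectors = tfidf_vectors(contexts + inputs)
    ctx_vecs, input_vecs = vectors[:len(contexts)], vectors[len(contexts):]

    revised = []
    for (user, response), query in zip(dialogues, input_vecs):
        if is_unsafe(response):
            best = max(range(len(safe_pool)),
                       key=lambda i: cosine(query, ctx_vecs[i]))
            response = safe_pool[best][1]
        revised.append((user, response))
    return revised


if __name__ == "__main__":
    example = [("I failed my exam again", "You are an idiot.")]
    pool = [("I failed my test", "That sounds hard. Do you want to talk about it?")]
    print(repurpose(example, pool))
```

The revised pairs could then be used alongside the original ones for the fine-tuning comparison the abstract describes; the keyword check and the retrieval pool are placeholders for whatever components an actual pipeline would use.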

Original language: English
Title of host publication: 3rd Workshop on Safety for Conversational AI, Safety4ConvAI 2024 at LREC-COLING 2024 - Workshop Proceedings
Editors: Tanvi Dinkar, Giuseppe Attanasio, Amanda Cercas Curry, Ioannis Konstas, Dirk Hovy, Verena Rieser
Publisher: European Language Resources Association (ELRA)
Pages: 16-27
Number of pages: 12
ISBN (Electronic): 9782493814449
Publication status: Published - 2024
Event: 3rd Workshop on Safety for Conversational AI, Safety4ConvAI 2024 - Torino, Italy
Duration: 21 May 2024 → …

Publication series

Name: 3rd Workshop on Safety for Conversational AI, Safety4ConvAI 2024 at LREC-COLING 2024 - Workshop Proceedings

Conference

Conference: 3rd Workshop on Safety for Conversational AI, Safety4ConvAI 2024
Country/Territory: Italy
City: Torino
Period: 21/05/24 → …

Keywords

  • chatbots
  • dataset
  • dialogue safety
  • generation
  • information retrieval
  • toxicity
