Hard Cases Detection in Motion Prediction by Vision-Language Foundation Models

Research output: Chapter in Book or Conference Publication/Proceeding › Conference Publication › peer-review

3 Citations (Scopus)

Abstract

Addressing hard cases in autonomous driving, such as anomalous road users, extreme weather conditions, and complex traffic interactions, presents significant challenges. To ensure safety, autonomous driving systems must detect and manage these scenarios effectively. However, because these cases are rare and high-risk, training robust models requires extensive, diverse datasets. Vision-Language Foundation Models (VLMs) have shown remarkable zero-shot capabilities as a result of being trained on extensive datasets. This work explores the potential of VLMs for detecting hard cases in autonomous driving. We demonstrate that VLMs such as GPT-4v can detect hard cases in traffic participant motion prediction at both the agent and scenario levels. We introduce a feasible pipeline in which VLMs, fed sequential image frames together with designed prompts, effectively identify challenging agents or scenarios, which are then verified by existing prediction models. Moreover, building on this VLM-based hard case detection, we further improve the training efficiency of the existing motion prediction pipeline by selecting training samples suggested by GPT. We show the effectiveness and feasibility of our pipeline, which combines VLMs with state-of-the-art methods, on the NuScenes dataset. The code is accessible at https://github.com/KTH-RPL/Detect-VLM.

Original language: English (Ireland)
Title of host publication: 2024 IEEE Intelligent Vehicles Symposium (IV)
Place of Publication: Jeju Island, Korea
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 2405-2412
Number of pages: 8
ISBN (Electronic): 9798350348811
Publication status: Published - 1 May 2024
Event: 35th IEEE Intelligent Vehicles Symposium, IV 2024 - Jeju Island, Korea, Republic of
Duration: 2 Jun 2024 → 5 Jun 2024

Publication series

Name: 1931-0587

Conference

Conference: 35th IEEE Intelligent Vehicles Symposium, IV 2024
Country/Territory: Korea, Republic of
City: Jeju Island
Period: 2/06/24 → 5/06/24

Authors (Note for portal: view the doc link for the full list of authors)

  • Authors
  • Y. Yang, Q. Zhang, K. Ikemura, N. Batool and J. Folkesson

Fingerprint

  • Yi Yang

    Batool, N. (Co-Supervisor)

    Jan 2021 → …

    Activity: Other › Current Postgraduates (Research) Supervised
