RAGCol: RAG-Based Automatic Video Colorization Through Text Caption Generation and Knowledge Enrichment

Research output: Chapter in Book or Conference Publication/Proceeding › Conference Publication › peer-review

Abstract

Automatic video colorization is a challenging task because any black-and-white film admits multiple plausible colorizations. For single photos, human knowledge can guide the colorization process through text prompts, but doing so for all of the frames and entities shown in a video is much more difficult. With recent advances in automatic video colorization, natural language processing and knowledge enrichment, it is feasible to leverage external knowledge in automatic text-guided video colorization. To realize this possibility, we propose RAGCol, a knowledge-enriched video colorization system which adapts the retrieval augmented generation (RAG) framework to an automated colorization pipeline. We validated RAGCol on the DAVIS [46] and Videvo [29] datasets. RAGCol demonstrated an average improvement of 9% over the previous state-of-the-art L-CAD [8] across the PSNR, SSIM, FID and FVD metrics. In a user study, human evaluators preferred videos colorized by RAGCol over those from contemporary colorizers, with an average preference of 74%.

Original language: English
Title of host publication: 40th Annual ACM Symposium on Applied Computing, SAC 2025
Publisher: Association for Computing Machinery
Pages: 953-962
Number of pages: 10
ISBN (Electronic): 9798400706295
DOIs
Publication status: Published - 14 May 2025
Event: 40th Annual ACM Symposium on Applied Computing, SAC 2025 - Catania, Italy
Duration: 31 Mar 2025 – 4 Apr 2025

Publication series

Name: Proceedings of the ACM Symposium on Applied Computing

Conference

Conference: 40th Annual ACM Symposium on Applied Computing, SAC 2025
Country/Territory: Italy
City: Catania
Period: 31/03/25 – 4/04/25

Keywords

  • knowledge enrichment
  • machine learning
  • retrieval augmented generation
  • text caption generation
  • video colorization
