TY - GEN
T1 - RAGCol
T2 - 40th Annual ACM Symposium on Applied Computing, SAC 2025
AU - Ward, Rory
AU - Dalal, Dhairya
AU - Buitelaar, Paul
AU - Breslin, John
N1 - Publisher Copyright: Copyright © 2025 held by the owner/author(s).
PY - 2025/5/14
Y1 - 2025/5/14
AB - Automatic video colorization is a challenging task where multiple plausible colorizations can be deployed for any black-and-white film. For single photos, it is possible to have human knowledge guide the colorization process through text prompts, but for all of the frames and entities shown in a video, it becomes more difficult to achieve. With recent advances in automatic video colorization, natural language processing and knowledge enrichment, it is feasible to leverage external knowledge in automatic text-guided video colorization. To realize this possibility, we propose RAGCol, a knowledge-enriched video colorization system which adapts the retrieval augmented generation (RAG) framework to an automated colorization pipeline. We validated our RAGCol on the DAVIS [46] and Videvo [29] datasets. RAGCol demonstrated an average improvement of 9% over the previous state-of-the-art L-CAD [8] across the PSNR, SSIM, FID and FVD metrics. In a user study, we found that videos colorized by RAGCol were preferred by 74% on average over contemporary colorizers by human evaluators.
KW - knowledge enrichment
KW - machine learning
KW - retrieval augmented generation
KW - text caption generation
KW - video colorization
UR - https://www.scopus.com/pages/publications/105006416886
U2 - 10.1145/3672608.3707748
DO - 10.1145/3672608.3707748
M3 - Conference Publication
AN - SCOPUS:105006416886
T3 - Proceedings of the ACM Symposium on Applied Computing
SP - 953
EP - 962
BT - 40th Annual ACM Symposium on Applied Computing, SAC 2025
PB - Association for Computing Machinery
Y2 - 31 March 2025 through 4 April 2025
ER -