FlintstonesSV++: Improving Story Narration using Visual Scene Graph

    Research output: Contribution to a Journal (Peer & Non Peer) › Conference article › peer-review

    Abstract

    Recent advancements in text-to-image, text-to-video, and large language models have significantly enhanced the performance of various downstream tasks. In the field of Story Visualization, models have been developed to generate coherent image sequences from storylines composed of multiple scenes. These innovations have largely relied on benchmark datasets such as FlintstonesSV and PororoSV, which provide essential resources for tasks like Story Visualization and Story Continuation. However, our analysis identifies several limitations in the FlintstonesSV dataset that restrict the performance of models trained on it. To address these limitations, we introduce FlintstonesSV++, an enhanced version of the FlintstonesSV dataset. FlintstonesSV++ leverages visual Scene Graphs and Large Language Models to enrich storylines with factual details, which are further validated by human reviewers. By fine-tuning text-to-story generation models on FlintstonesSV++, we demonstrate substantial improvements, achieving a 5.2% average increase in alignment scores and a 5.72% boost in image generation quality compared to models trained on the original dataset. Moreover, a qualitative comparative analysis highlights the superior performance of FlintstonesSV++ compared to the original dataset. The FlintstonesSV++ dataset marks a significant advancement in enabling tasks such as Story Visualization and Story Continuation. To support further research in story-based visual content generation, we have made the code and dataset publicly available.

    Original language: English
    Pages (from-to): 29-38
    Number of pages: 10
    Journal: CEUR Workshop Proceedings
    Volume: 3964
    Publication status: Published - 2025
    Event: 8th International Workshop on Narrative Extraction From Texts, Text2Story 2025 - Lucca, Italy
    Duration: 10 Apr 2025 → …

    Keywords

    • Dataset Improvement
    • Large Language Models
    • Large Multimodal Models
    • Narrative Resources
    • Story Narrative Generation
    • Storyline Visualization
    • Visual Scene Graphs
