Abstract
Recent advancements in text-to-image, text-to-video, and large language models have significantly enhanced the performance of various downstream tasks. In the field of Story Visualization, models have been developed to generate coherent image sequences from storylines composed of multiple scenes. These innovations have largely relied on benchmark datasets such as FlintstonesSV and PororoSV, which provide essential resources for tasks like Story Visualization and Story Continuation. However, our analysis identifies several limitations in the FlintstonesSV dataset that restrict the performance of models trained on it. To address these limitations, we introduce FlintstonesSV++, an enhanced version of the FlintstonesSV dataset. FlintstonesSV++ leverages visual Scene Graphs and Large Language Models to enrich storylines with factual details, which are further validated by human reviewers. By fine-tuning text-to-story generation models on FlintstonesSV++, we demonstrate substantial improvements, achieving a 5.2% average increase in alignment scores and a 5.72% boost in image generation quality compared to models trained on the original dataset. Moreover, a qualitative comparative analysis highlights the superior performance of FlintstonesSV++ compared to the original dataset. The FlintstonesSV++ dataset marks a significant advancement in enabling tasks such as Story Visualization and Story Continuation. To support further research in story-based visual content generation, we have made the code and dataset publicly available.
| Original language | English |
|---|---|
| Pages (from-to) | 29-38 |
| Number of pages | 10 |
| Journal | CEUR Workshop Proceedings |
| Volume | 3964 |
| Publication status | Published - 2025 |
| Event | 8th International Workshop on Narrative Extraction From Texts, Text2Story 2025 - Lucca, Italy. Duration: 10 Apr 2025 → … |
Keywords
- Dataset Improvement
- Large Language Models
- Large Multimodal Models
- Narrative Resources
- Story Narrative Generation
- Storyline Visualization
- Visual Scene Graphs