FaceExpr: Personalized facial expression generation via attention-focused U-Net feature fusion in diffusion models

Research output: Contribution to Journal › Article › peer-review

Abstract

Text-to-image diffusion models have revolutionized image generation by creating high-quality visuals from text descriptions. Despite their potential for personalized text-to-image applications, existing standalone methods have struggled to provide effective semantic modifications, while approaches relying on external embeddings are computationally complex and often compromise identity and face fidelity. To overcome these challenges, we propose FaceExpr, an innovative three-instance framework using standalone text-to-image models that provides accurate facial semantic modifications and synthesizes facial images with diverse expressions, all while preserving the subject’s identity. Specifically, we introduce a person-specific fine-tuning approach with two key components: (1) Attention-Focused Fusion, which uses an attention mechanism to align identity and expression features by focusing on critical facial landmarks, preserving the subject’s identity, and (2) Expression Text Embeddings, integrated into the U-Net denoising module to resolve language ambiguities and enhance expression accuracy. Additionally, an expression crafting loss is employed to strengthen the alignment between identity and expression. Furthermore, by leveraging the prior preservation loss, we enable the synthesis of expressive faces in diverse scenes, views, and conditions. FaceExpr achieves state-of-the-art performance over both standalone and hybrid methods, demonstrating its effectiveness in controllable facial expression generation. It shows strong potential for personalized content generation in digital storytelling, immersive virtual environments, and advanced research applications. Code: https://github.com/MSAfganUSTC/FaceExpr.git
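The abstract describes an Attention-Focused Fusion module that aligns identity and expression features via attention. The paper's exact formulation is not given here, so the following is only a minimal sketch of the general idea under stated assumptions: identity features act as queries and expression features as keys/values in standard scaled dot-product cross-attention, with a residual connection to keep identity information dominant. The function name, shapes, and residual design are illustrative, not the authors' implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention_fusion(identity_feats, expression_feats):
    """Hypothetical sketch of attention-based identity/expression fusion.

    identity_feats:   (N_id, d)  identity tokens (queries)
    expression_feats: (N_ex, d)  expression tokens (keys/values)
    Returns fused features of shape (N_id, d).
    """
    d = identity_feats.shape[-1]
    # Scaled dot-product attention scores: (N_id, N_ex).
    scores = identity_feats @ expression_feats.T / np.sqrt(d)
    weights = softmax(scores, axis=-1)
    # Expression information gathered per identity token.
    fused = weights @ expression_feats
    # Residual connection so the subject's identity is preserved.
    return identity_feats + fused
```

In a real diffusion pipeline this kind of fusion would typically be applied inside the U-Net's cross-attention layers during person-specific fine-tuning; here it is shown standalone only to illustrate the mechanism.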
Original language: English
Pages (from-to): 103431
Number of pages: 1
Journal: Information Fusion
Volume: 125
Publication status: Published - 2026

Keywords

  • Attention
  • Diffusion
  • Expression synthesis
  • Fusion
  • Identity preservation
  • Text-to-image
