Demonstration-Guided Multi-Objective Reinforcement Learning

Research output: Contribution to a Journal (Peer & Non Peer) › Article › peer-review

Abstract

Multi-objective reinforcement learning (MORL) closely mirrors real-world conditions and has consequently gained attention. However, training a MORL policy from scratch is more challenging than single-objective training, as the policy must balance multiple objectives according to differing preferences during optimization. Demonstrations often embody a wealth of domain knowledge that can improve MORL training efficiency without task-specific design. We propose demonstration-guided multi-objective reinforcement learning (DG-MORL), the first MORL algorithm that can seamlessly use prior demonstrations to enhance training efficiency. Our novel algorithm aligns prior demonstrations with latent preferences via corner weight support. We also propose a self-evolving mechanism that gradually refines the demonstration set and prevents sub-optimal demonstrations from hindering training. DG-MORL offers a universal framework that can be combined with any MORL algorithm. Our empirical studies demonstrate DG-MORL’s superiority over state-of-the-art MORL algorithms, establishing its robustness and efficacy. We also provide a lower bound on the algorithm's sample complexity and an upper bound on its Pareto regret.
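As a rough illustration of the self-evolving mechanism described in the abstract, the sketch below keeps, for each corner weight, whichever trajectory (prior demonstration or current policy rollout) scores higher under linear scalarization, so weaker demonstrations are gradually replaced. This is a minimal Python sketch under assumed interfaces: the names refine_demonstrations and rollout_policy, and the (vector_return, trajectory) pair format, are illustrative assumptions, not the paper's actual implementation.

    import numpy as np

    def scalarized_return(vector_return, weight):
        # Linear scalarization: project a multi-objective return vector
        # onto a single preference weight vector.
        return float(np.dot(vector_return, weight))

    def refine_demonstrations(demos, corner_weights, rollout_policy):
        # Hypothetical sketch of a self-evolving demonstration set:
        # demos[i] is a (vector_return, trajectory) pair associated with
        # corner_weights[i]; rollout_policy(w) returns such a pair from
        # the current policy conditioned on preference w.
        for i, w in enumerate(corner_weights):
            candidate_return, candidate_traj = rollout_policy(w)
            demo_return, _ = demos[i]
            if scalarized_return(candidate_return, w) > scalarized_return(demo_return, w):
                # The policy has outgrown this demonstration; replace it
                # so sub-optimal guidance no longer hinders training.
                demos[i] = (candidate_return, candidate_traj)
        return demos

The key design point this sketch captures is that demonstrations are compared per preference weight rather than globally, so a demonstration is only discarded once the policy surpasses it under the specific trade-off it was meant to cover.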

Original language: English
Journal: Transactions on Machine Learning Research
Volume: 2024
Publication status: Published - 2024
