Abstract
In many real-world multi-agent interactions, agents receive payoffs over multiple distinct criteria; i.e., the payoffs are multi-objective in nature. However, the same multi-objective payoff vector may lead to different utilities for each participant. Therefore, it is essential for an agent to learn about the behaviour of the other agents in the system. In this work, we present the first study of the effects of such opponent modelling on multi-objective multi-agent interactions with non-linear utilities. Specifically, we consider multi-objective normal form games (MONFGs) with non-linear utility functions under the scalarised expected returns optimisation criterion. We contribute a novel actor-critic formulation that allows reinforcement learning of mixed strategies in this setting, along with an extension that incorporates opponent policy reconstruction using conditional action frequencies. Empirical results in five different MONFGs demonstrate that opponent modelling can drastically alter the learning dynamics in this setting. When Nash equilibria are present, opponent modelling can confer significant benefits on the agents that implement it. However, when there are no Nash equilibria, opponent modelling can have adverse effects on utility, and has a neutral effect at best (after extensive hyper-parameter optimisation).
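The sketch below illustrates the general idea described in the abstract, not the authors' implementation: a tabular actor-critic agent that learns a mixed strategy in a multi-objective normal form game under the scalarised expected returns criterion, alongside a simple opponent model built from empirical action frequencies. The payoff matrix, the product-form utility function, and the learning rates are all illustrative assumptions, and the paper's actual update rules and games may differ.

```python
import numpy as np

# Hypothetical 2-objective payoffs for a 3-action MONFG (illustrative numbers only);
# payoffs[i, j] is the row player's payoff vector for own action i, opponent action j.
payoffs = np.array([
    [[4, 0], [3, 1], [2, 2]],
    [[3, 1], [2, 2], [1, 3]],
    [[2, 2], [1, 3], [0, 4]],
], dtype=float)

def utility(vec):
    # Example non-linear utility over the payoff vector (product of both objectives).
    return vec[0] * vec[1]

def utility_grad(vec):
    # Gradient of the product utility with respect to the payoff vector.
    return np.array([vec[1], vec[0]])

def softmax(theta):
    z = np.exp(theta - theta.max())
    return z / z.sum()

n_actions = payoffs.shape[0]
theta = np.zeros(n_actions)          # actor parameters (softmax mixed strategy)
q = np.zeros((n_actions, 2))         # critic: expected payoff vector per own action
opp_counts = np.ones(n_actions)      # opponent model: conditional action frequency counts
alpha_q, alpha_theta = 0.05, 0.05

for episode in range(5000):
    pi = softmax(theta)
    a = np.random.choice(n_actions, p=pi)

    # In a real experiment the opponent would be a second learner; here we simply
    # sample from the reconstructed frequency model for illustration.
    opp_pi = opp_counts / opp_counts.sum()
    b = np.random.choice(n_actions, p=opp_pi)
    opp_counts[b] += 1

    # Critic update: move Q(a) towards the observed multi-objective payoff.
    q[a] += alpha_q * (payoffs[a, b] - q[a])

    # SER actor update: ascend u(E_pi[Q]) by chaining the utility gradient
    # through the softmax Jacobian.
    expected = pi @ q                          # expected payoff vector under pi
    du = utility_grad(expected)
    grad_theta = np.zeros(n_actions)
    for i in range(n_actions):
        dpi = -pi * pi[i]
        dpi[i] += pi[i]                        # d pi_k / d theta_i = pi_k (delta_ki - pi_i)
        grad_theta[i] = du @ (dpi @ q)
    theta += alpha_theta * grad_theta

print("learned mixed strategy:", softmax(theta).round(3))
print("opponent model:", (opp_counts / opp_counts.sum()).round(3))
```

Because the utility is applied to the expected payoff vector (rather than the expectation of the utility), the optimal strategy may be genuinely mixed, which is why a policy-gradient actor over a softmax parameterisation is used here instead of a greedy value-based rule.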
Original language | English |
---|---|
Publication status | Published - 2020 |
Event | Adaptive and Learning Agents Workshop, ALA 2020 at AAMAS 2020 - Auckland, New Zealand |
Duration | 9 May 2020 → 10 May 2020 |
Conference
Conference | Adaptive and Learning Agents Workshop, ALA 2020 at AAMAS 2020 |
---|---|
Country/Territory | New Zealand |
City | Auckland |
Period | 9/05/20 → 10/05/20 |
Keywords
- game theory
- multi-agent systems
- multi-objective decision making
- Nash equilibrium
- opponent modelling
- reinforcement learning
- solution concepts