Know Your Enemy: Identifying and Adapting to Adversarial Attacks in Deep Reinforcement Learning

Research output: Contribution to journal › Conference article › Peer-review

Abstract

It has been shown that an agent can be trained with an adversarial policy which achieves a high degree of success against a state-of-the-art DRL victim despite taking unintuitive actions. This prompts the question: is this adversarial behaviour detectable through the observations of the victim alone? In competitive simulation environments, we find that widely used classification methods such as random forests achieve at most ≈ 71% test set accuracy when classifying an agent from a single timestep. However, when the classifier inputs are treated as time-series data, test set classification accuracy increases significantly to ≈ 98%. This holds both for classification of episodes as a whole and for "live" classification at each timestep in an episode. These classifications can then be used to "react" to incoming attacks and increase the overall win rate against adversarial opponents by approximately 17%. Classification of the victim's own internal activations in response to the adversary is shown to achieve similarly high accuracy while also offering advantages such as increased transferability to other domains.
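The gap between single-timestep and time-series classification can be illustrated with a toy sketch. Everything below is hypothetical: synthetic Gaussian "observations" stand in for the victim's inputs, a nearest-centroid rule stands in for the paper's random forests and time-series classifiers, and the `SHIFT` constant is an assumed offset that adversarial behaviour induces. The point is only that aggregating an episode's observations separates the classes far better than any single observation does.

```python
import random
import statistics

random.seed(0)

SHIFT = 0.6  # hypothetical mean offset induced by an adversarial opponent


def episode(adversarial, length=30):
    """Synthetic victim observations for one episode (illustrative only)."""
    mu = SHIFT if adversarial else 0.0
    return [random.gauss(mu, 1.0) for _ in range(length)]


def nearest_centroid_fit(feats, labels):
    # One centroid per class: the mean feature value of that class.
    by_class = {0: [], 1: []}
    for f, y in zip(feats, labels):
        by_class[y].append(f)
    return {y: statistics.mean(v) for y, v in by_class.items()}


def nearest_centroid_predict(centroids, f):
    return min(centroids, key=lambda y: abs(f - centroids[y]))


# 200 training and 200 test episodes per class (0 = normal, 1 = adversarial).
train = [(episode(bool(y)), y) for y in (0, 1) for _ in range(200)]
test = [(episode(bool(y)), y) for y in (0, 1) for _ in range(200)]

# Single-timestep classifier: every observation is an independent sample.
st_feats = [o for ep, _ in train for o in ep]
st_labels = [y for ep, y in train for _ in ep]
st_cent = nearest_centroid_fit(st_feats, st_labels)
single_acc = statistics.mean(
    nearest_centroid_predict(st_cent, o) == y for ep, y in test for o in ep
)

# Time-series classifier: summarise the whole episode (here, by its mean).
ts_cent = nearest_centroid_fit(
    [statistics.mean(ep) for ep, _ in train], [y for _, y in train]
)
window_acc = statistics.mean(
    nearest_centroid_predict(ts_cent, statistics.mean(ep)) == y for ep, y in test
)

print(f"single-timestep accuracy: {single_acc:.2f}")
print(f"episode-level accuracy:   {window_acc:.2f}")
```

Per-timestep noise averages out over the episode, so the episode-level centroids are much better separated than the raw observations. This is the same intuition behind treating the classifier inputs as time-series data rather than independent snapshots, though the paper's classifiers and environments are of course far richer than this sketch.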

Original language: English
Pages (from-to): 2813-2814
Number of pages: 2
Journal: Proceedings of the International Joint Conference on Autonomous Agents and Multiagent Systems, AAMAS
Volume: 2023-May
Publication status: Published - 2023
Event: 22nd International Conference on Autonomous Agents and Multiagent Systems, AAMAS 2023 - London, United Kingdom
Duration: 29 May 2023 – 2 Jun 2023

Keywords

  • Adversarial Reinforcement Learning
  • Deep Reinforcement Learning
  • Opponent Modelling

