A parallel framework for Bayesian reinforcement learning

Research output: Contribution to journal › Article › peer-review

8 Citations (Scopus)

Abstract

Solving a finite Markov decision process using dynamic programming techniques such as value or policy iteration requires a complete model of the environmental dynamics: the reward distribution, transition probabilities, states and actions must all be fully observable, discrete and complete. For many problem domains, such a complete model may not be readily available. Bayesian reinforcement learning (RL) is a technique devised to make better use of the information observed through learning than simply computing Q-functions. However, this approach can require extensive experience in order to build up an accurate representation of the true values. To address this issue, this paper proposes a method for parallelising a Bayesian RL technique, aimed at reducing the time taken to approximate the missing model. We demonstrate the technique on learning next-state transition probabilities without prior knowledge, though the approach is general enough to approximate any probabilistically driven component of the model. The solution involves multiple learning agents learning the same task in parallel and sharing probability density estimates with one another in an effort to speed up convergence to the true values.
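The abstract describes agents that learn next-state transition probabilities and share probability density estimates with one another. A minimal sketch of one common way to realise this, assuming a Dirichlet posterior over next-state transitions (the class and method names here are illustrative, not taken from the paper):

```python
from collections import defaultdict


class DirichletTransitionModel:
    """Maintains Dirichlet pseudo-counts over next-state transitions
    for each (state, action) pair; the normalised counts give the
    posterior mean estimate of the transition probabilities."""

    def __init__(self, n_states, prior=1.0):
        self.n_states = n_states
        self.prior = prior
        # Each (s, a) pair starts with a uniform Dirichlet prior.
        self.counts = defaultdict(lambda: [prior] * n_states)

    def observe(self, s, a, s_next):
        """Record one observed transition (s, a) -> s_next."""
        self.counts[(s, a)][s_next] += 1.0

    def estimate(self, s, a):
        """Posterior mean of P(s' | s, a)."""
        c = self.counts[(s, a)]
        total = sum(c)
        return [x / total for x in c]

    def merge(self, other):
        """Pool another agent's observed counts into this model,
        subtracting the other agent's prior pseudo-counts so they
        are not double-counted. This is the sharing step that lets
        parallel agents converge faster than any agent alone."""
        for key, c in other.counts.items():
            mine = self.counts[key]
            for i in range(self.n_states):
                mine[i] += c[i] - other.prior
```

For example, if a second agent observes the transition (s=0, a=0) → s'=1 twice and merges its counts into a fresh model over two states, the pooled estimate for (0, 0) becomes [0.25, 0.75] under the uniform prior. How often agents exchange counts, and whether they exchange raw counts or density estimates, is a design choice the paper itself addresses.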

Original language: English (Ireland)
Pages (from-to): 7-23
Number of pages: 17
Journal: Connection Science
Volume: 26
Issue number: 1
Publication status: Published - 1 Jan 2014

Keywords

  • Bayesian inference
  • parallel learning
  • posterior sampling
  • reinforcement learning

Authors

  • Enda Barrett, Jim Duggan, Enda Howley
