Abstract
Solving a finite Markov decision process with dynamic programming techniques such as value or policy iteration requires a complete model of the environmental dynamics: the reward distribution, transition probabilities, states, and actions must all be fully observable, discrete, and complete. For many problem domains, a complete model containing a full representation of the environmental dynamics is not readily available. Bayesian reinforcement learning (RL) is a technique devised to make better use of the information observed through learning than simply computing Q-functions. However, this approach can require extensive experience in order to build up an accurate representation of the true values. To address this issue, this paper proposes a method for parallelising a Bayesian RL technique, aimed at reducing the time it takes to approximate the missing model. We demonstrate the technique on learning next-state transition probabilities without prior knowledge, though the approach is general enough to approximate any probabilistically driven component of the model. The solution involves multiple learning agents working in parallel on the same task, sharing probability density estimates amongst each other in an effort to speed up convergence to the true values.
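The core idea of the abstract can be illustrated with a minimal sketch. This is not the paper's exact algorithm; it assumes a standard Dirichlet-multinomial posterior over next-state transitions (a common choice for Bayesian model learning, consistent with the "posterior sampling" keyword) and models agent sharing as pooling observation counts. The class name and `merge` method are hypothetical, introduced only for illustration.

```python
from collections import defaultdict

class DirichletTransitionModel:
    """Dirichlet-multinomial estimate of P(s' | s, a).

    Each (state, action) pair gets a Dirichlet posterior over next states;
    observed transitions increment the corresponding counts.
    """

    def __init__(self, n_states, prior=1.0):
        self.n_states = n_states
        self.prior = prior  # symmetric Dirichlet prior pseudo-count
        # counts[(s, a)][s'] = number of observed transitions s --a--> s'
        self.counts = defaultdict(lambda: [0.0] * n_states)

    def observe(self, s, a, s_next):
        """Update the posterior with one observed transition."""
        self.counts[(s, a)][s_next] += 1.0

    def posterior_mean(self, s, a):
        """Posterior mean estimate of the transition distribution."""
        c = self.counts[(s, a)]
        total = sum(c) + self.prior * self.n_states
        return [(ci + self.prior) / total for ci in c]

    def merge(self, other):
        """Pool another agent's counts into this model.

        Because counts are sufficient statistics for the Dirichlet
        posterior, the merged model reflects the joint experience of
        both learners -- the mechanism by which parallel agents could
        speed up convergence to the true transition probabilities.
        """
        for key, c in other.counts.items():
            mine = self.counts[key]
            for i, ci in enumerate(c):
                mine[i] += ci

# Two agents learn the same 3-state task in parallel, then share estimates.
agent_a = DirichletTransitionModel(n_states=3)
agent_b = DirichletTransitionModel(n_states=3)
agent_a.observe(0, 0, 1)
agent_a.observe(0, 0, 1)
agent_b.observe(0, 0, 2)
agent_a.merge(agent_b)
probs = agent_a.posterior_mean(0, 0)
```

After the merge, `agent_a`'s estimate is based on all three observed transitions rather than only its own two, which is the convergence speed-up the abstract describes.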
| Original language | English (Ireland) |
|---|---|
| Pages (from-to) | 7-23 |
| Number of pages | 17 |
| Journal | Connection Science |
| Volume | 26 |
| Issue number | 1 |
| Publication status | Published - 1 Jan 2014 |
Keywords
- Bayesian inference
- parallel learning
- posterior sampling
- reinforcement learning
Authors
- Enda Barrett
- Jim Duggan
- Enda Howley