# Bounded Policy Iteration for Decentralized POMDPs

Daniel S. Bernstein, Eric A. Hansen, and Shlomo Zilberstein.
Bounded Policy Iteration for Decentralized POMDPs.
*Proceedings of the Nineteenth International Joint Conference on Artificial Intelligence*
(IJCAI), 1287-1292, Edinburgh, Scotland, 2005.

## Abstract

We present a bounded policy iteration algorithm for infinite-horizon decentralized POMDPs. Policies are represented as joint stochastic finite-state controllers, which consist of a local controller for each agent. We also let a joint controller include a correlation device that allows the agents to correlate their behavior without exchanging information during execution, and show that this leads to improved performance. The algorithm uses a fixed amount of memory, and each iteration is guaranteed to produce a controller with value at least as high as the previous one for all possible initial state distributions. For the case of a single agent, the algorithm reduces to Poupart and Boutilier's bounded policy iteration for POMDPs.
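The policy representation described above can be illustrated with a minimal sketch. The class below is a hypothetical rendering of a single agent's stochastic finite-state controller: each node carries a distribution over actions, and each (node, action, observation) triple carries a distribution over successor nodes. The names `psi` and `eta` and the dictionary layout are illustrative assumptions, not the paper's notation or implementation.

```python
import random


class LocalController:
    """Illustrative stochastic finite-state controller for one agent.

    psi[q][a]       -- probability of selecting action a in node q
    eta[q][a][o][q2] -- probability of transitioning to node q2 after
                        taking action a in node q and observing o
    (Structure is an assumption for illustration; the paper's algorithm
    improves such controllers via linear programming, not shown here.)
    """

    def __init__(self, psi, eta):
        self.psi = psi
        self.eta = eta

    def act(self, q, rng):
        # Sample an action from the node's action distribution.
        actions = list(self.psi[q])
        weights = [self.psi[q][a] for a in actions]
        return rng.choices(actions, weights=weights)[0]

    def step(self, q, a, o, rng):
        # Sample the next controller node given the action and observation.
        nodes = list(self.eta[q][a][o])
        weights = [self.eta[q][a][o][n] for n in nodes]
        return rng.choices(nodes, weights=weights)[0]


# Tiny deterministic example: one node, one action, one observation.
ctrl = LocalController(
    psi={0: {"listen": 1.0}},
    eta={0: {"listen": {"hear-left": {0: 1.0}}}},
)
rng = random.Random(0)
a = ctrl.act(0, rng)          # always "listen"
q_next = ctrl.step(0, a, "hear-left", rng)  # always node 0
```

A joint controller would hold one such object per agent; a correlation device can be modeled the same way as an extra stochastic state sequence, observable by all agents but independent of their observations.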

### BibTeX entry:

```bibtex
@inproceedings{BHZijcai05,
  author    = {Daniel S. Bernstein and Eric A. Hansen and Shlomo Zilberstein},
  title     = {Bounded Policy Iteration for Decentralized {POMDP}s},
  booktitle = {Proceedings of the Nineteenth International Joint Conference on Artificial Intelligence},
  year      = {2005},
  pages     = {1287-1292},
  address   = {Edinburgh, Scotland},
  url       = {http://rbr.cs.umass.edu/shlomo/papers/BHZijcai05.html}
}
```