Reinforcement Learning for Mixed Open-loop and Closed-loop Control

Eric Hansen, Andrew G. Barto, and Shlomo Zilberstein. Reinforcement Learning for Mixed Open-loop and Closed-loop Control. Proceedings of the Ninth Neural Information Processing Systems Conference (NIPS), 1026-1032, Denver, Colorado, 1996.

Abstract

Closed-loop control relies on sensory feedback that is usually assumed to be free. But if sensing incurs a cost, it may be cost-effective to take sequences of actions in open-loop mode. We describe a reinforcement learning algorithm that learns to combine open-loop and closed-loop control when sensing incurs a cost. Although we assume reliable sensors, the use of open-loop control means that actions must sometimes be taken when the current state of the controlled system is uncertain. This is a special case of the hidden-state problem in reinforcement learning, and to cope with it, our algorithm relies on short-term memory. The main result of the paper is a rule that significantly limits exploration of possible memory states by pruning memory states for which the estimated value of information is greater than its cost. We prove that this rule allows convergence to an optimal policy.
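The sketch below is an illustration of the general idea, not the authors' algorithm: tabular Q-learning on a small toy MDP where sensing the state carries a fixed cost, a "memory state" is the last sensed state together with the actions taken open-loop since then, and a rough value-of-information estimate is used to stop extending an open-loop sequence once that estimate exceeds the sensing cost. The environment, the constants, and the VOI estimate are all assumptions made for illustration.

import random
from collections import defaultdict

random.seed(0)

N_STATES, ACTIONS = 4, [0, 1]
SENSE_COST = 0.3              # assumed cost charged each time the agent senses
ALPHA, GAMMA, EPS = 0.1, 0.9, 0.1

# A small random toy MDP: P[s][a] -> next state, R[s][a] -> reward.
P = {s: {a: random.randrange(N_STATES) for a in ACTIONS} for s in range(N_STATES)}
R = {s: {a: random.random() for a in ACTIONS} for s in range(N_STATES)}

# Q is indexed by (memory state, action); a memory state is
# (last sensed state, tuple of actions taken open-loop since sensing).
Q = defaultdict(float)

def greedy(m):
    return max(ACTIONS, key=lambda a: Q[(m, a)])

def value(m):
    return max(Q[(m, a)] for a in ACTIONS)

def voi_estimate(m):
    # Rough value-of-information estimate for memory state m: expected value
    # of acting with the state known, minus the value of acting on m alone.
    informed = sum(value((s, ())) for s in range(N_STATES)) / N_STATES
    return informed - value(m)

state = 0                     # true state (hidden between sensings)
memory = (state, ())          # memory state the policy conditions on
for step in range(20000):
    # Pruning rule (sketch): stop extending the open-loop action sequence once
    # the estimated value of information exceeds the sensing cost; force a sense.
    must_sense = len(memory[1]) > 0 and voi_estimate(memory) > SENSE_COST
    if must_sense:
        memory = (state, ())  # sensing collapses memory to the true state
        sense_charge = -SENSE_COST
    else:
        sense_charge = 0.0
    a = random.choice(ACTIONS) if random.random() < EPS else greedy(memory)
    r = R[state][a] + sense_charge
    state = P[state][a]
    next_memory = (memory[0], memory[1] + (a,))
    # One-step Q-learning backup on memory states.
    target = r + GAMMA * value(next_memory)
    Q[(memory, a)] += ALPHA * (target - Q[(memory, a)])
    memory = next_memory

In this toy version the pruning rule is enforced by forcing a sensing action, which keeps long open-loop memory states from ever being created or explored; the paper's contribution is the principled form of this rule and the proof that it preserves convergence to an optimal policy.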

Bibtex entry:

@inproceedings{HBZnips96,
  author    = {Eric Hansen and Andrew G. Barto and Shlomo Zilberstein},
  title     = {Reinforcement Learning for Mixed Open-loop and Closed-loop Control},
  booktitle = {Proceedings of the Ninth Neural Information Processing Systems Conference},
  year      = {1996},
  pages     = {1026--1032},
  address   = {Denver, Colorado},
  url       = {http://rbr.cs.umass.edu/shlomo/papers/HBZnips96.html}
}

shlomo@cs.umass.edu, UMass Amherst