Adaptive Optimization Techniques for Large-Scale Stochastic Planning

Sponsored by:
Air Force Office of Scientific Research, Optimization and Discrete Mathematics Program

Shlomo Zilberstein, PI
Marek Petrik, RA

Project Description

Developing scalable and adaptive algorithms for reasoning and acting under uncertainty is an important challenge in artificial intelligence, optimization, and operations research. A large subclass of these problems can be formulated as Markov decision processes (MDPs), which are typically solved using Approximate Dynamic Programming (ADP). While ADP has recently gained traction in many domains, successful applications often require extensive parameter tuning to obtain a sufficiently small approximation error. The goal of this project is to develop ADP methods that are robust and scalable while reducing the need for such tuning. We focus in particular on Approximate Linear Programming (ALP), a type of ADP. ALP has a number of theoretical advantages over other approximate dynamic programming methods, but in practice it suffers from the same performance issues that lead to a large approximation error.
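ALP builds on the exact linear-programming formulation of an MDP, in which the optimal value function is the componentwise-smallest feasible point of a linear program. The following minimal sketch (an illustrative two-state, two-action MDP with made-up numbers, not a problem from the project) solves that exact LP with SciPy:

```python
import numpy as np
from scipy.optimize import linprog

# Exact LP formulation of a tiny MDP:
#   minimize  sum_s v(s)
#   subject to  v(s) >= r(s,a) + gamma * sum_s' P(s'|s,a) v(s')  for all (s,a),
# i.e. (gamma * P_a - I) v <= -r_a stacked over actions.
gamma = 0.9
P = [np.array([[0.8, 0.2], [0.1, 0.9]]),   # transitions under action 0
     np.array([[0.5, 0.5], [0.6, 0.4]])]   # transitions under action 1
r = [np.array([1.0, 0.0]),                 # rewards under action 0
     np.array([0.0, 2.0])]                 # rewards under action 1

A_ub = np.vstack([gamma * Pa - np.eye(2) for Pa in P])
b_ub = np.concatenate([-ra for ra in r])

res = linprog(c=np.ones(2), A_ub=A_ub, b_ub=b_ub,
              bounds=[(None, None)] * 2, method="highs")
v_star = res.x  # the optimal value function of the MDP
```

Because the state-relevance weights (the objective vector c) are strictly positive, the unique LP optimum coincides with the optimal value function; ALP arises when v is additionally restricted to a low-dimensional feature subspace.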

We have examined the properties of approximate linear programming (ALP) both theoretically and practically, evaluating it on several applications including a complex blood inventory management problem. Our experiments and analysis indicate that it is possible to obtain good results with ALP, but doing so requires a significant amount of parameter tweaking to control the approximation error. We analyzed the approximation error and identified three key components that affect the performance of ALP:

  • Representational error: the fundamental error caused by limiting the value function to the approximation subspace. It is the error between the optimal value function and its closest possible approximation in the basis.

  • Transitional error: the error that results from the ALP formulation itself. In general, ALP is not guaranteed to find the best possible approximation of the optimal value function within the basis.

  • Sampling error: the error caused by considering only a subset of the constraints in the linear program.

The main objective of the project is to develop techniques that control and reduce these errors in a systematic and disciplined manner. We have developed several approaches that reduce the transitional error or guarantee that it is small. We have also shown that ALP can be used to calculate admissible heuristic functions, which are often used to solve complex search and planning problems.
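The representational and transitional errors can be made concrete in a small sketch: restrict the value function to v = Φw for a feature matrix Φ and solve the same LP over the weights w. The chain MDP, features, and state-relevance weights below are illustrative assumptions, not details of the project's benchmarks:

```python
import numpy as np
from scipy.optimize import linprog

# ALP on a 4-state chain: two deterministic actions (move left / move
# right); reward 1 for moving right at the last state.
gamma, n = 0.9, 4

def shift(d):
    Pa = np.zeros((n, n))
    for s in range(n):
        Pa[s, min(max(s + d, 0), n - 1)] = 1.0
    return Pa

P = [shift(-1), shift(+1)]
r = [np.zeros(n), np.zeros(n)]
r[1][n - 1] = 1.0

def solve(Phi):
    # min  c^T Phi w   s.t.  (gamma * P_a - I) Phi w <= -r_a  for all a.
    # (In large problems these constraints would be sampled, which is
    # the source of the sampling error.)
    A = np.vstack([(gamma * Pa - np.eye(n)) @ Phi for Pa in P])
    b = np.concatenate([-ra for ra in r])
    c = np.ones(n) @ Phi                 # uniform state-relevance weights
    res = linprog(c, A_ub=A, b_ub=b,
                  bounds=[(None, None)] * Phi.shape[1], method="highs")
    return Phi @ res.x

v_exact = solve(np.eye(n))                          # full basis: exact v*
Phi = np.column_stack([np.ones(n), np.arange(n)])   # constant + linear feature
v_alp = solve(Phi)                                  # restricted basis
err = np.max(v_alp - v_exact)                       # approximation error
```

Every feasible point of the LP dominates the optimal value function, so the restricted-basis solution v_alp upper-bounds v* componentwise, and the gap err is exactly the kind of error the project aims to control.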

Related Publications

  • Feature Selection Using Regularization in Approximate Linear Programs for Markov Decision Processes.

    M. Petrik, G. Taylor, R. Parr, and S. Zilberstein. Proceedings of the Twenty-Seventh International Conference on Machine Learning (ICML), Haifa, Israel, 2010.

  • A Bilinear Programming Approach for Multiagent Planning.

    M. Petrik and S. Zilberstein. Journal of Artificial Intelligence Research, 35:235-274, 2009.

  • Constraint Relaxation in Approximate Linear Programs.

    M. Petrik and S. Zilberstein. Proceedings of the Twenty-Sixth International Conference on Machine Learning (ICML), 809-816, Montreal, Canada, 2009.

  • Learning Heuristic Functions Through Approximate Linear Programming.

    M. Petrik and S. Zilberstein. Proceedings of the International Conference on Automated Planning and Scheduling (ICAPS), 248-255, Sydney, Australia, 2008.
UMass Amherst