Decision-Theoretic Foundations for Multi-Agent Planning
National Science Foundation,
Division of Information & Intelligent Systems
Air Force Office of Scientific Research,
Systems and Software Program
Shlomo Zilberstein, PI
Research Assistants: Christopher Amato, Alan Carlin, Akshat Kumar, Marek Petrik
Collaborators: Martin Allen, Xiaoping Chen, Claudia Goldman, Feng Wu
A fundamental challenge in artificial intelligence is achieving intelligent coordination among a group of decision makers despite stochasticity and limited information. Decision theory offers a normative framework for optimizing decisions under uncertainty, but computational complexity barriers make developing decision-theoretic reasoning algorithms for multi-agent systems a serious challenge. This project develops a comprehensive approach to this problem using the framework of decentralized Markov decision processes. Target applications involve a group of decision makers, or agents, that operate under uncertainty and rely on noisy sensors; each agent may obtain different partial information about the global state. Such decentralized planning and coordination problems abound in practice: examples include autonomous mobile robots, distributed management of servers or a power grid, and the operation of complex human organizations. In such problems, centralized planning is inadequate because agents cannot share all their information all the time.
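For reference, the underlying model can be stated formally. The following is the standard decentralized POMDP (Dec-POMDP) definition; notation varies slightly across the literature and the project's papers.

```latex
% Standard Dec-POMDP definition (notation varies across the literature)
\[
  \mathcal{M} = \langle I,\; S,\; \{A_i\},\; P,\; \{\Omega_i\},\; O,\; R \rangle
\]
where $I = \{1,\dots,n\}$ is the set of agents, $S$ the set of states,
$A_i$ the actions of agent $i$, $P(s' \mid s, \vec{a})$ the joint transition
function, $\Omega_i$ the observations of agent $i$, $O(\vec{o} \mid s', \vec{a})$
the joint observation function, and $R(s, \vec{a})$ the shared reward function.
Each agent's policy maps its own observation history to actions, and the goal
is a joint policy maximizing the expected cumulative reward
\[
  \mathbb{E}\!\left[\,\sum_{t=0}^{T-1} R(s_t, \vec{a}_t)\right].
\]
```

The key source of difficulty is that each agent acts on its private observation history alone, with no access to the global state or to the other agents' observations.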
Despite rapid progress in recent years in decentralized decision making, existing algorithms can only solve small "toy" problems. Compared with single-agent MDPs, research on decentralized MDPs is in its infancy. Classical MDP and POMDP solution techniques are often inapplicable because of fundamental differences between centralized and decentralized operation. The goal of this project is to develop well-founded coordination strategies that are widely applicable and scale to large domains. This requires approximation algorithms that alleviate the large memory requirements of current solution techniques and scale up to larger problems. It is also important to derive error bounds for these algorithms, making it possible to decide when a satisfactory plan is available. To evaluate these algorithms, we are also developing a new set of challenging test problems and benchmarks.
The resulting solution techniques are based on several principles: (a) combining dynamic programming with heuristic search to produce scalable solution techniques, (b) developing observation compression methods that identify the most relevant subset of observations to be considered while bounding the error, (c) developing algorithms that represent policies compactly as finite-state controllers, (d) exploiting randomization and correlation mechanisms to improve the value of compact policies, (e) developing quantitative approaches to represent and reason about interaction and to manage the tradeoff between the degree of interaction and problem hardness, and (f) exploiting the locality and structure of interactions between agents to improve the efficiency of coordination algorithms. The evaluation of these techniques has already shown that they can improve the scalability of existing algorithms by several orders of magnitude.
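Principle (c) above, representing policies compactly as finite-state controllers, admits a minimal sketch: a controller keeps only a current node (not the full observation history), selects actions stochastically per node, and moves between nodes on each observation, so its memory stays constant regardless of the horizon. All node, action, and observation names below are hypothetical, loosely styled after the common tiger benchmark.

```python
import random

class FiniteStateController:
    """Sketch of a stochastic finite-state controller policy for one agent.

    action_probs[q] maps a controller node q to an {action: probability}
    distribution; transitions[(q, obs)] gives the next node. Transitions
    are deterministic here for simplicity; in general they may be
    stochastic and may also condition on the action taken.
    """

    def __init__(self, action_probs, transitions, start=0):
        self.action_probs = action_probs
        self.transitions = transitions
        self.node = start

    def act(self):
        # Sample an action from the current node's distribution.
        actions, probs = zip(*self.action_probs[self.node].items())
        return random.choices(actions, weights=probs)[0]

    def observe(self, obs):
        # Update the internal node; this is the agent's only "memory".
        self.node = self.transitions[(self.node, obs)]

# A two-node controller: node 0 mostly listens, node 1 always opens.
ctrl = FiniteStateController(
    action_probs={0: {"listen": 0.9, "open": 0.1}, 1: {"open": 1.0}},
    transitions={(0, "hear-left"): 1, (0, "hear-right"): 0,
                 (1, "hear-left"): 1, (1, "hear-right"): 0},
)

for obs in ["hear-right", "hear-left", "hear-left"]:
    action = ctrl.act()
    ctrl.observe(obs)
print(ctrl.node)  # prints 1
```

Optimizing the node parameters of such bounded controllers, rather than enumerating policy trees, is what allows the memory-bounded and fixed-size-controller methods listed below to scale.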
- AAMAS 2013 Workshop on Multiagent Sequential Decision Making Under Uncertainty
- AAMAS 2012 Workshop on Multiagent Sequential Decision Making Under Uncertainty
- AAMAS 2011 Workshop on Multi-Agent Sequential Decision Making in Uncertain Domains
- AAMAS 2010 Workshop on Multi-Agent Sequential Decision Making in Uncertain Domains
- Qualitative Planning under Partial Observability in Multi-Agent Domains.
- Monte-Carlo Expectation Maximization for Decentralized POMDPs.
- Parameter Learning for Latent Network Diffusion.
- Automated Generation of Interaction Graphs for Value-Factored Dec-POMDPs.
- Multiagent Planning, Control, and Execution.
- Message-Passing Algorithms for MAP Estimation Using DC Programming.
- Online Planning for Multi-Agent Systems with Bounded Communication.
- Decentralized Monitoring of Anytime Decision Making.
- Message Passing Algorithms for Large Structured Decentralized POMDPs.
- Message-Passing Algorithms for Quadratic Programming Formulations of MAP Estimation.
- Scalable Multiagent Planning Using Probabilistic Inference.
- Online Planning for Ad Hoc Autonomous Agent Teams.
- On Message-Passing MAP Estimation in Graphical Models and DCOPs.
- Bounded Rationality in Multiagent Systems Using Decentralized Metareasoning.
- Point-Based Backup for Decentralized POMDPs: Complexity and New Algorithms.
- Point-Based Policy Generation for Decentralized POMDPs.
- Anytime Planning for Decentralized POMDPs using Expectation Maximization.
- Rollout Sampling Policy Iteration for Decentralized POMDPs.
- Finite-State Controllers Based on Mealy Machines for Centralized and Decentralized POMDPs.
- Trial-Based Dynamic Programming for Multi-Agent Planning.
- Policy Iteration for Decentralized Control of Markov Decision Processes.
- A Bilinear Programming Approach for Multiagent Planning.
- Analyzing Myopic Approaches for Multi-Agent Communications.
- Optimizing Fixed-Size Stochastic Controllers for POMDPs and Decentralized POMDPs.
- Achieving Goals in Decentralized POMDPs.
- Constraint-Based Dynamic Programming for Decentralized POMDPs with Structured Interactions.
- Value of Communication In Decentralized POMDPs.
- Dynamic Programming Approximations for Partially Observable Stochastic Games.
- Event-Detecting Multi-Agent MDPs: Complexity and Constant-Factor Approximation.
- Incremental Policy Generation for Finite-Horizon DEC-POMDPs.
- Multi-Agent Online Planning with Communication.
- Myopic and Non-Myopic Communication Under Partial Observability.
- Complexity of Decentralized Control: Special Cases.
- Communication-Based Decomposition Mechanisms for Decentralized MDPs.
- Formal Models and Algorithms for Decentralized Decision Making under Uncertainty.
- A Successive Approximation Algorithm for Coordination Problems.
- Value-Based Observation Compression for DEC-POMDPs.
- Heuristic Policy Iteration for Infinite-Horizon Decentralized POMDPs.
- Observation Compression in DEC-POMDP Policy Trees.
- Optimizing Fixed-Size Stochastic Controllers for POMDPs.
- POMDP and DEC-POMDP Point-Based Observation Aggregation.
- Interaction Structure and Dimensionality in Decentralized Problem Solving.
- Learning to Communicate in a Decentralized Environment.
- Solving POMDPs Using Quadratically Constrained Linear Programs.
- Memory-Bounded Dynamic Programming for DEC-POMDPs.
- Bounded Dynamic Programming for Decentralized POMDPs.
- Optimizing Memory-Bounded Controllers for Decentralized POMDPs.
- Improved Memory-Bounded Dynamic Programming for Decentralized POMDPs.
- Agent Influence as a Predictor of Difficulty for Decentralized Problem-Solving.
- Anytime Coordination Using Separable Bilinear Programs.