Information Gathering Using Decision Models
Shlomo Zilberstein (PI), Victor Lesser (Co-PI)
This project aims to develop a decision-theoretic approach to gathering information from a large, distributed network of information sources. It is motivated by the rapid growth of on-line information sources such as digital libraries, independent news agencies, and government agencies, as well as human experts providing a variety of services. These services are expected to keep expanding over the next 5 to 10 years. In addition, improved information retrieval (IR) and information extraction (IE) techniques are becoming available, allowing a system not only to locate but also to extract the necessary information from unstructured textual documents. The many information sources now emerging, with their differing levels of accessibility, reliability, and cost, pose an information gathering planning problem too complex for a human decision maker to solve unaided.
A fundamental premise of our approach is that information gathering is an intermediate step in a decision-making process. We provide the system with the user's decision model so that it can prioritize, plan, and execute information gathering actions. We have designed a system architecture for information gathering composed of three concurrently operating layers: the user interface (UI), the decision model evaluation subsystem (DME), and the information gathering subsystem (IG). Each layer activates, monitors, and negotiates with the layers below it. Our approach to this research agenda is incremental: at the end of each year we plan to have the three components (UI, DME, and IG) working together. We have made substantial progress on the DME and IG subsystems and plan to have an integrated prototype by the end of the first year.
The core of the DME subsystem is a value-driven information gathering planner that receives an influence diagram representing the user's decision model. The planner issues requests based on the value of information, taking into account the reliability, responsiveness, and monetary cost associated with each information source. At any given time, the system assesses the marginal value of dispatching each candidate query and selects the query with maximal value. When no further improvement of the comprehensive utility function is possible, the system stops gathering information and reports the results.
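The query-selection loop described above can be illustrated with a small myopic (one-step lookahead) value-of-information calculation. This is a minimal sketch under stated assumptions, not the project's planner: instead of a full influence diagram it uses a single decision over discrete states, and the names `Source`, `reliability`, `voi`, `select_query`, and `gather` are hypothetical. Each source is assumed to report the true state with probability `reliability` and otherwise to report one of the wrong states uniformly at random, and query cost is assumed to be expressed in the same units as utility.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple

Belief = Dict[str, float]              # state -> probability
Utility = Dict[str, Dict[str, float]]  # action -> state -> utility

@dataclass
class Source:
    name: str
    reliability: float  # hypothetical: P(report == true state); errors spread uniformly
    cost: float         # hypothetical: query cost, in utility units

def best_eu(belief: Belief, utility: Utility) -> float:
    """Expected utility of the best action under the current belief."""
    return max(sum(belief[s] * u[s] for s in belief) for u in utility.values())

def update(belief: Belief, report: str, src: Source) -> Tuple[Belief, float]:
    """Bayes update on a report; also returns the report's marginal probability."""
    n = len(belief)
    joint = {s: p * (src.reliability if s == report else (1 - src.reliability) / (n - 1))
             for s, p in belief.items()}
    p_report = sum(joint.values())
    if p_report == 0.0:  # impossible report: belief is unchanged
        return dict(belief), 0.0
    return {s: v / p_report for s, v in joint.items()}, p_report

def voi(belief: Belief, utility: Utility, src: Source) -> float:
    """Myopic value of information: expected best EU after the report minus best EU now."""
    after = 0.0
    for report in belief:
        post, p = update(belief, report, src)
        after += p * best_eu(post, utility)
    return after - best_eu(belief, utility)

def select_query(belief: Belief, utility: Utility,
                 sources: List[Source]) -> Tuple[Optional[Source], float]:
    """Return the source with the largest positive net value (VOI minus cost), if any."""
    best, best_net = None, 0.0
    for src in sources:
        net = voi(belief, utility, src) - src.cost
        if net > best_net:
            best, best_net = src, net
    return best, best_net

def gather(belief: Belief, utility: Utility, sources: List[Source],
           ask: Callable[[Source], str]) -> Belief:
    """Greedily query sources until no remaining query is worth its cost."""
    remaining = list(sources)
    while True:
        src, _ = select_query(belief, utility, remaining)
        if src is None:  # no query improves expected utility net of cost: stop
            return belief
        belief, _ = update(belief, ask(src), src)
        remaining.remove(src)

# Hypothetical example: decide whether to buy a software package.
prior = {"good": 0.5, "bad": 0.5}
utility = {"buy": {"good": 100.0, "bad": -50.0}, "skip": {"good": 0.0, "bad": 0.0}}
review_site = Source("review_site", reliability=0.9, cost=5.0)
forum = Source("forum", reliability=0.6, cost=1.0)
print(round(voi(prior, utility, review_site), 2))  # 17.5
final = gather(prior, utility, [review_site, forum], ask=lambda src: "good")
print(final)
```

In this example the less reliable `forum` has essentially zero value of information: neither of its reports would change the best action, so it is never worth its cost. After the `review_site` query, the remaining query cannot improve the decision and the loop stops.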
The IG subsystem is being implemented on top of the RESUN blackboard-based architecture. In this framework, we cast information gathering as an interpretation task. Instead of raw sensor data, the subsystem's input is a set of documents gathered from the WWW. Uncertainty in this domain arises from ambiguity in extracting features from documents, from the quality of the sites the documents come from, and from the breadth of the search performed to find WWW documents. The task we are implementing is deciding which software package to purchase, given user-specified criteria on cost, desirable features, and hardware platform constraints. The subsystem contains knowledge sources for document retrieval, document parsing, text extraction, software feature integration, and feature evaluation.
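The blackboard style of organization can be sketched as hypotheses posted at levels of abstraction, with knowledge sources (KSs) that fire on hypotheses at one level and post new hypotheses at a higher one. This is a simplified, generic blackboard under stated assumptions, not RESUN: RESUN's more sophisticated control is not modeled, and the sketch fires each KS once on every matching hypothesis. The levels, the `CORPUS` of `example.com` pages, the `key=value` page format, the site-quality numbers, and the `CRITERIA` are all hypothetical, and only three of the knowledge sources (retrieval, combined parsing and extraction, evaluation) are shown.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable, Dict, List

@dataclass
class Hypothesis:
    level: str     # blackboard level, e.g. "url", "document", "report"
    content: dict
    belief: float  # hypothetical confidence in [0, 1]

@dataclass
class KnowledgeSource:
    name: str
    trigger: str   # level whose hypotheses activate this KS
    action: Callable[[Hypothesis], List[Hypothesis]]

class Blackboard:
    def __init__(self) -> None:
        self.levels: Dict[str, List[Hypothesis]] = defaultdict(list)

    def post(self, h: Hypothesis) -> None:
        self.levels[h.level].append(h)

    def run(self, sources: List[KnowledgeSource]) -> None:
        # Simple data-driven control: fire each KS once on every hypothesis
        # at its trigger level, until no KS has anything new to process.
        fired = set()
        progress = True
        while progress:
            progress = False
            for ks in sources:
                for h in list(self.levels[ks.trigger]):
                    if (ks.name, id(h)) in fired:
                        continue
                    fired.add((ks.name, id(h)))
                    for new in ks.action(h):
                        self.post(new)
                    progress = True

# Hypothetical corpus: url -> (page text, site quality used as initial belief).
CORPUS = {
    "http://example.com/editor-a": ("price=49;platform=linux", 0.9),
    "http://example.com/editor-b": ("price=120;platform=windows", 0.6),
}
CRITERIA = {"max_price": 100.0, "platform": "linux"}  # hypothetical user criteria

def retrieve(h: Hypothesis) -> List[Hypothesis]:
    text, site_quality = CORPUS[h.content["url"]]
    return [Hypothesis("document", {"url": h.content["url"], "text": text}, site_quality)]

def extract(h: Hypothesis) -> List[Hypothesis]:
    fields = dict(item.split("=") for item in h.content["text"].split(";"))
    report = {"url": h.content["url"], "price": float(fields["price"]),
              "platform": fields["platform"]}
    return [Hypothesis("report", report, h.belief)]

def evaluate(h: Hypothesis) -> List[Hypothesis]:
    c = h.content
    ok = c["price"] <= CRITERIA["max_price"] and c["platform"] == CRITERIA["platform"]
    return [Hypothesis("evaluation", {"url": c["url"], "acceptable": ok}, h.belief)]

bb = Blackboard()
for url in CORPUS:
    bb.post(Hypothesis("url", {"url": url}, 1.0))
bb.run([KnowledgeSource("retrieve", "url", retrieve),
        KnowledgeSource("extract", "document", extract),
        KnowledgeSource("evaluate", "report", evaluate)])
for h in bb.levels["evaluation"]:
    print(h.content["url"], h.content["acceptable"], h.belief)
```

Carrying the site quality through as each hypothesis's belief is one simple way to keep track of where uncertainty entered; a fuller system would also account for extraction ambiguity and search breadth, as described above.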
We have also developed a new scheduler that we believe is an appropriate basis for scheduling information gathering subtasks. It handles dynamic, multi-dimensional utility criteria and reasons about the trade-offs among different possible courses of action. The scheduler uses a rich model of task achievement in which multiple actions can achieve a given task, and each action is characterized statistically by discrete probability distributions describing its expected quality, cost, and duration. The scheduler therefore reasons about trade-offs among quality, cost, and duration, as well as about the uncertainty in each of these dimensions. This allows scheduler clients to specify complex criteria such as "solution quality is twice as important as the search cost, but the duration must be under five minutes and I want a low probability of failure." The scheduler will also handle asynchronous tasks, so that a single agent can initiate multiple document retrievals and search engine accesses concurrently.
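The kind of criteria quoted above can be made concrete with a brute-force sketch that combines per-method discrete distributions and scores each candidate schedule. This is an illustration under stated assumptions, not the project's scheduler: the method names and numbers are hypothetical; quality, cost, and duration are assumed independent within and across methods; methods run sequentially, so costs and durations add; plan quality is assumed to be the minimum subtask quality (every subtask is needed); "failure" means plan quality 0; the five-minute limit is treated as a worst-case bound (every duration outcome under 300 s); and "twice as important" is read as the linear score 2·E[quality] − E[cost]. Exhaustive enumeration is only practical for a handful of tasks.

```python
import operator
from dataclasses import dataclass
from functools import reduce
from itertools import product
from typing import Dict

Dist = Dict[float, float]  # outcome value -> probability

@dataclass
class Method:
    name: str
    quality: Dist
    cost: Dist
    duration: Dist  # seconds

def combine(d1: Dist, d2: Dist, op) -> Dist:
    """Distribution of op(X, Y) for independent discrete X ~ d1, Y ~ d2."""
    out: Dist = {}
    for (v1, p1), (v2, p2) in product(d1.items(), d2.items()):
        v = op(v1, v2)
        out[v] = out.get(v, 0.0) + p1 * p2
    return out

def expect(d: Dist) -> float:
    return sum(v * p for v, p in d.items())

def assess(plan):
    # Sequential execution: costs and durations add; plan quality is the
    # minimum subtask quality (assumed "all subtasks needed" accumulation).
    q = reduce(lambda a, b: combine(a, b, min), [m.quality for m in plan])
    c = reduce(lambda a, b: combine(a, b, operator.add), [m.cost for m in plan])
    d = reduce(lambda a, b: combine(a, b, operator.add), [m.duration for m in plan])
    return q, c, d

def score(plan, quality_weight=2.0, cost_weight=1.0, deadline=300.0, max_fail=0.1):
    """Weighted score, or None if a hard constraint is violated."""
    q, c, d = assess(plan)
    if max(d) >= deadline or q.get(0, 0.0) > max_fail:
        return None
    return quality_weight * expect(q) - cost_weight * expect(c)

def best_schedule(tasks):
    """Try every choice of one method per task; keep the best feasible plan."""
    best, best_score = None, float("-inf")
    for plan in product(*tasks):
        s = score(list(plan))
        if s is not None and s > best_score:
            best, best_score = list(plan), s
    return best, best_score

# Hypothetical alternatives for two information gathering subtasks.
search = [
    Method("broad_search", {10: 0.92, 0: 0.08}, {2: 1.0}, {120: 0.5, 200: 0.5}),
    Method("quick_search", {6: 0.95, 0: 0.05}, {1: 1.0}, {30: 1.0}),
]
fetch = [
    Method("full_fetch", {8: 0.95, 0: 0.05}, {3: 1.0}, {60: 0.7, 150: 0.3}),
    Method("cached_fetch", {5: 1.0}, {0: 1.0}, {5: 1.0}),
]
plan, value = best_schedule([search, fetch])
print([m.name for m in plan], round(value, 2))
```

With these numbers, `broad_search` followed by `full_fetch` is ruled out because one of its duration outcomes (350 s) exceeds the limit, even though it has the highest-quality methods; among the feasible plans, the cheap `quick_search` plus `cached_fetch` combination scores best.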
To summarize, although much work has been done on information gathering and on decision making separately, little work has exploited the synergy that arises when the two problems are solved together. The system we are developing uses an explicit representation of the user's decision model so that information gathering activity can be organized according to its effect on the quality of the decision. We anticipate that, under resource constraints (the cost of communication and database access, limited computational power, and limited time), this approach will substantially improve performance over current information gathering techniques.