ARO Workshop on Abductive Reasoning

Evidence and Intelligent Systems
August 23-24, 2007

Marriott Inn and Conference Center UMUC
3501 University Blvd E
Adelphi, MD 20783
(301) 985-7300
Meeting Location:
Thai Room 1109/1111 (on main concourse)

 Day 1
  10:00- 10:30 Michael Pazzani
Rutgers University
Cliff Wang
Army Research Office
Workshop Overview
  10:30-11:00

John Josephson
Ohio State University
Abductive Inferences, Perception and Multi-Source
Information Fusion

(PowerPoint) (pdf)

  11:00-12:00 Pedro Domingos
University of Washington
Statistical Relational Learning 
(PowerPoint) (pdf)
 
  12:00- 1:00 Lunch
 
  1:00 - 2:00 Glenn Shafer
Rutgers University
Probability vs. Evidence   
(PowerPoint in pdf)
  2:00 - 2:15 Discussion: Abduction, Statistical Relational Learning, and Uncertainty 
 
  2:15 - 2:30 Break
 
  2:30 - 4:30 Statistical Relational Learning (SRL) Panel
 
 

Tina Eliassi-Rad
Lawrence Livermore National Laboratory
Learning Structure from Relational Observations: A Bayesian Approach
(PowerPoint) (pdf)

Lise Getoor
University of Maryland
Statistical Relational Learning for Abductive Reasoning in Heterogeneous Environments     
(PowerPoint) (pdf)

David Jensen
University of Massachusetts, Amherst
Learning Causal Dependencies with Relational Data
(pdf)

William Pottenger
Rutgers University
Higher Order Learning
(PowerPoint) (pdf)

 
  4:30 - 5:00 Discussion: Open research issues in Evidential Inference and Abductive Reasoning

                ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Day 2
9:00 - 9:30 Pat Langley
Arizona State University
Robust Reasoning and Learning about Goal-Directed Activities

 (PowerPoint) (pdf)
9:30- 10:00 Amit Sheth
Wright State University
Trailblazing, Complex Hypothesis Evaluation, Abductive, Inference and
Semantic Web – exploring possible synergy
   
(PowerPoint) (pdf)
10:00-10:30
Michael Witbrock
Cycorp Inc., Austin, Texas
Known Unknowns - Probabilistic Inference and Large Knowledge Bases
(powerpoint) (pdf) (PowerPoint - Office 2007 format Zip file)

10:30-10:45

Break
 
10:45-11:15

Padhraic Smyth
University of California, Irvine
Statistical Learning and Probabilistic Abduction
(PowerPoint) (pdf)

11:15-11:45 Paul Cohen
University of Southern California
Searching for Interesting Plans as They Happen
(PowerPoint) (pdf)
11:45-12:15

Dana Nau
University of Maryland
Abductive Inference of Opponent Models
(PowerPoint) (pdf)


12:15 -1:15

Lunch
 
1:15 - 2:00 Wrap Up Discussion: A research agenda for Evidential Inference and Abductive Reasoning

Abstracts

1) Abductive Inferences, Perception and Multi-Source Information Fusion
John Josephson, Ohio State University

I will briefly introduce the idea of abductive inferences, describe some of the characteristics of such inferences, and argue briefly for several propositions about them, including the following: Abductive inferences are part of commonsense logic. They are "empirically real" as well as "normatively ideal." They occur in perceptual processing as well as in higher cognitive processes such as situation understanding, diagnosis, and scientific theory formation. Although they are fallible, abductive inferences provide (non-deductively) valid justifications for their conclusions. The confidence of an abductive conclusion properly depends on only a small number of considerations, as do decisions about the acceptance of abductive conclusions. Abductions can be used to infer hidden causes, such as intentions. "Abduction" represents both a conceptual framework for analysis and a basis for implementation. In particular, it is quite natural to implement multi-source information fusion as abductive inferencing, analogous to the information processing in biological perception and situation understanding. [Top]

2) Statistical Relational Learning
Pedro Domingos, University of Washington

Intelligent agents must be able to handle the complexity and uncertainty of the real world. Logical AI has focused mainly on the former, and statistical AI on the latter. In recent years, the field of statistical relational learning has set about combining the two. In this talk, I will survey some of the main concepts and techniques in this area, using Markov logic as a concrete example. Markov logic combines logic and probability by attaching weights to first-order formulas and viewing them as templates for features of Markov networks. I will present inference algorithms drawing on ideas from satisfiability testing, Markov chain Monte Carlo and knowledge-based model construction; and learning algorithms based on conditional likelihood, convex optimization and inductive logic programming. These algorithms are available in the open-source Alchemy system. Statistical relational learning techniques are the state-of-the-art approach to a growing number of problems, including entity resolution, link prediction, social network modeling, information extraction, and others. I will conclude by discussing open problems and research directions in this area. [Top]

3) Probability vs Evidence
Glenn Shafer, Rutgers University

How can reasoning from fragmentary evidence make use of statistical theories that require more structured starting points? Recent work on defensive forecasting casts new light on this question. The conventional wisdom is that statistical methods require an objective probability model, supplemented by subjective assumptions when no such model has been validated. But the success of defensive forecasting shows that the crucial question is not whether there is a valid model but simply whether there is an agreed-on repetitive structure for the question to be answered and the data for answering it. If we agree to place the current example in a particular sequence of other examples, then defensive forecasting can produce valid probability forecasts—valid insofar as they resist all statistical tests—without any assumptions. But if no single sequence of previous examples imposes itself, then we must go outside the framework of probability theory to weigh evidence. Reasoning to the best explanation means reasoning to the best repetitive structure.

References:
--“Probability and Finance: It’s Only a Game!” By Glenn Shafer and Vladimir Vovk, Wiley, 2001. Sample chapters, related publications, and working papers at
www.probabilityandfinance.com
--“Defensive Forecasting,” by Vladimir Vovk, Akimichi Takemura, and Glenn Shafer. Proceedings of the Tenth International Workshop on Artificial Intelligence and Statistics. Also Working Paper #8 at www.probabilityandfinance.com [Top]

4) Learning Structure from Relational Observations: A Bayesian Approach
Tina Eliassi-Rad, Lawrence Livermore National Laboratory

In many real-world applications, observations appear in the form of links or connections between objects (e.g. communication networks, financial transaction networks, social networks, etc). Such networked data contain valuable information that can be exploited in order to discover groups, predict links, and generally speaking better understand the structure within the observations. A fundamental issue in all structure discovery problems is that the actual size of the structure is unknown a priori. In most cases, this is addressed by running the same model several times with a different cardinality each time and selecting the one that provides the best fit to the data (e.g., based on a Bayes factor criterion in the case of Bayesian techniques). Obviously, it would be preferable to develop techniques that are able to infer the size of the structure from the data and simultaneously examine hypotheses of different cardinalities. Nonparametric Bayesian latent variable models provide such a flexible modeling environment, where the size of the model (i.e., the number of groups) can adapt to the available data and readily accommodate outliers. Of particular interest are models that can represent an object’s mixed membership by quantifying the degrees of membership to various groups. In this talk, I will describe such a model, called the Infinite Mixed Membership Model (IMMM). Our IMMM model has a hierarchical nonparametric prior that is based on the Chinese Restaurant Franchise. Inference is readily performed using Gibbs sampling. I will also describe an extension to our model that generates hierarchical structures from the data. Based on empirical studies on synthetic and real data, our model performs very well (as measured by the area under the ROC) when missing links abound. Three areas of future work include: (1) efficient inference procedures for nonparametric Bayesian approaches, (2) structure discovery by combining Bayesian and graph-theoretic approaches, and (3) nonparametric Bayesian models for time-evolving networked data.

This is joint work with P.S. Koutsourelakis (LLNL & Cornell University). [Top]

5) Statistical Relational Learning for Abductive Reasoning in Heterogeneous Environments
Lise Getoor, University of Maryland

New technology is making it possible to deploy large numbers of sensing devices with widely varying capabilities. Because of their large size and diverse properties, it will be extremely difficult to hand-code control strategies for these heterogeneous sensor networks. At the same time, effective strategies are extremely important to effectively integrate large amounts of data, to carefully allocate scarce resources such as power, and to detect anomalies that can occur due to malicious tampering.

Statistical Relational Learning (SRL) is an emerging research area that combines probabilistic (statistical) inference with logical (relational) inference. Its roots are in probabilistic graphical models, but it is distinguished by its use of logical representations and constraints to model and exploit domain structure and regularities. It is well-suited to domains in which there is a mix of low-level noisy observational data that needs to be combined together based on logical relationships such as proximity, field-of-view, and other contextual constraints.

In this presentation, I will survey some preliminary work from the SRL community on intelligent, cost-sensitive data acquisition and reasoning, and describe some of my group’s work on effective real-time, goal-directed SRL methods. In addition, I will describe some of the outstanding research challenges for practical, abductive SRL systems. [Top]

6) Learning Causal Dependencies with Relational Data
David Jensen, University of Massachusetts, Amherst


Over the past decade, machine learning researchers have developed techniques to estimate the joint distribution of a set of variables that span multiple related entities.  These methods, often grouped under the rubric "relational learning", include probabilistic relational models, relational Markov networks, and relational dependency networks.  These techniques build on work in artificial intelligence, statistics, databases, graph theory, and social network analysis, and they are profoundly expanding the phenomena that we can understand and predict.  However, new frontiers await.

Current techniques are capable of learning only a subset of the knowledge needed by practitioners in military, intelligence, and industrial domains.  Informing effective action in these domains often requires a causal model.  I will examine the open question of whether relational representations make the problem of learning causal models easier or harder, and present some reasons for optimism that relational representations may be able to greatly improve our ability to learn such models. [Top]

7) Higher Order Learning
William Pottenger, Rutgers University

Modern data mining algorithms primarily leverage first-order relations, aiming to discover patterns such as association or classification rules only within records. The underlying assumption is that records are independent and identically distributed (Taskar et al., 2002). This assumption simplifies the underlying mathematics of statistical models, but in fact does not hold for many real world applications (Getoor & Diehl, 2005). Although useful models can be learned, in many cases more accurate models can be learned if higher-order associations are leveraged by the data mining algorithm – associations between records. Coupled with this is a growing need for data fusion due to increasing data fragmentation (Rahm & Bernstein, 2001). Consequently, there is a greater need to incorporate contextual information – i.e., higher-order associations that cross record boundaries.

Some limited higher-order information is already employed in a number of applications including market basket analysis, law enforcement, homeland defense, etc. For example, an important law enforcement application that employs 2nd-order associations is COPLINK(r) Detect (Hauck et al., 2002), which assists law enforcement personnel using textual entity extraction and link-generation to build a semantic network. Mooney et al. (2002) discuss a link discovery algorithm developed as part of the DARPA Evidence Extraction and Link Discovery program that uses Inductive Logic Programming to mine multi-relational data. Tan et al. (2000; 2002) intimate that the future of cross-selling in the e-Marketplace may be greatly impacted by the results of further study of higher-order associations. This is in keeping with our own results on real world e-Marketplace data, which indicate that the latent information contained in higher-order associations can be leveraged to build more effective association rule models. Yet other than our own, existing approaches at best incorporate only a fraction of the available higher-order latent information in that most are limited to 2nd-order associations.

In this presentation we outline a Higher Order Learning framework that extends both traditional association rule mining and traditional approaches to classification to a higher-order space. Unlike existing approaches, the framework provides a theoretical foundation for higher-order learning that utilizes all available higher-order associations. The framework has two components: Higher Order Path Analysis (HOPA), a supervised learning method based on higher-order associations; and Higher Order Association Mining (HOAM), an unsupervised learning method based on higher-order associations. HOPA extends the traditional classification models to make use of the contextual information, i.e., higher-order associations. Preliminary results indicate that the classes of instances in labeled training data are separable based on statistical comparisons of distributions of higher-order association frequencies. Similarly, HOAM extends the context in traditional association rule mining algorithms to include higher-order associations. Compared with traditional ARM, HOAM discovers more novel information and identifies important knowledge more quickly by leveraging smaller datasets. Current machine learning algorithms are almost all limited to 2nd-order relations, whereas our preliminary results indicate that much is to be gained from extending beyond 2nd-order to 3rd, 4th and even higher orders. In addition, we have also obtained preliminary results that indicate higher order associations may provide insight into the quality (representativeness) of labeled training data in supervised machine learning applications. [Top]

8) Robust Reasoning and Learning about Goal-Directed Activities
Pat Langley, Arizona State University

One key facet of intelligence is the ability to interpret events in the environment, including abductive inference about the activities and motives of other agents. Despite their importance, there has been far less research on activity interpretation and plan understanding than there has been on plan execution and generation. Nevertheless, many ideas from the latter traditions are relevant to the former and can be adapted to address them.

In this talk, I review work on hierarchical task networks, a formalism that encodes ways to decompose tasks into subtasks and that has been used successfully for knowledge-based planning and execution. I focus on a special class of these structures - teleoreactive logic programs - that support both the explanation of observed behavior sequences and learning network structure from them.

Our initial efforts have focused on logical analysis of behavior traces, but we are developing an extended formalism - Markov task networks - that incorporates probabilistic information about task clauses and that supports plausible inference in the presence of incomplete and uncertain information. Unlike more general formalisms, Markov task networks are designed to support abductive inference about goal-directed behaviors as they are observed over time.

I will also describe our plans for learning within this framework.

Unlike most statistical relational methods, our approach uses knowledge-guided inference over individual traces to generate candidate relational structures. Statistical evidence is used to evaluate these hypotheses and to revise probabilities associated with each structure. We claim that such a knowledge-driven approach will learn from far fewer cases than ones which rely on statistical regularities to generate hypotheses.

In closing, I will discuss extensions of Markov task networks that are needed to support abductive inference in realistic settings. These include handling geographically distributed data, supporting goals that involve more than simple achievement, and reasoning about the activities of agents that are acting to reach common objectives. [Top]

9) Trailblazing, Complex Hypothesis Evaluation, Abductive Inference and Semantic Web – exploring possible synergy
Amit P. Sheth, Wright State University

Dr. Vannevar Bush outlined his vision for Memex in a 1945 Atlantic Monthly article [B45]. Describing how the human brain navigates an information space in what he called trailblazing, Dr. Bush said, “It operates by association. With one item in its grasp, it snaps instantly to the next that is suggested by the association of thoughts, in accordance with some intricate web of trails carried by the cells of the brain.” Now that we can label content with associated meaning (semantics), using the techniques of information extraction, we can develop an interactive, human-directed, semantic browsing environment in which analysts can explore heterogeneous content from disparate sources in a kind of stream-of-consciousness, identifying one item of interest and then following contextually relevant links to another [SR07]. See Figure 1.

Semantic Query Complex Query
Figure 1: Semantic Browser Interface showing neoplastic process "degree of" experimental model of disease Figure 2: Complex Hypothesis linking Migrane and Magnesium that can be divided into sub-hypotheses and validated against a set of relevant documents

A second, complementary form of analysis that is now possible is to specify a complex hypothesis, break it down into different pieces, and look for evidence to match parts of the hypothesis (Figure 2).

In current Semantic Web research, longstanding work in IR, information extraction, NLP, statistical NLP, graph and query processing, among other techniques, is paired with knowledge representation and populated ontologies (schemas and associated fact/knowledge base) for the following activities that process all varieties of data: structured, semi-structured, and unstructured/text:

Extraction of entities [HSK02] and relationships [RSK06] (which form the semantic annotation or labeling of data—a key characteristic of Semantic Web) and their representation in RDF, which models named relationships (predicated in triples of the form <subject, predicate, object>

Subgraph extraction [RMPS05], path computations (semantic associations discovery) [AMS07], similarity, causality and other pattern discovery;

Support for higher-order activities including semantic search and querying [AMS05], semantic browsing, reasoning, analysis, hypothesis evaluation [STP03], discovery, explanation, question-answering and visualization

Some of the activities above involve processing in which complete certainty is not possible (e.g., entity and relationship extraction techniques applied to text). Some involve the need to process graphs with weights. Still others involve the need to model knowledge that cannot be modeled using the mainstream representation choice of description logics (such as OWL) and that requires fuzzy logic.. We used the terms “implicit semantics,” “formal semantics,” and “power semantics” to describe the range of requirements and choices [SRT05].

Abduction or abductive inference is about “inference to the best explanation” [JJ94] and “reasoning from given data to a hypothesis that explains the data” [W04]. Both the principles and the techniques associated with abduction seem to have strong similarity to the activities and techniques in Semantic Web, such as (a) creating metadata (also called semantic annotation), which involves information processing with a considerable degree of uncertainty regarding what entity or relationship best describes what is captured in the data or text; (b) complementing incomplete information using domain knowledge provided by relevant ontologies; (c) reasoning over representations with uncertain information, such as processing of graphs with weighted edges; or (d) explaining search ranking or discovery in terms of the metadata used to do the corresponding reasoning, where there is uncertainty about the quality of metadata extracted through the use of external knowledge from an ontology, or about the provenance of knowledge applied in completing the reasoning. This talk will present various examples from contemporary Semantic Web research where abduction has a potential to enrich the research outcome. [Top]

10) Known Unknowns - Probabilistic Inference and Large Knowledge Bases
Michael Witbrock, Cycorp, Inc., Austin, Texas

One aspect of Cyc is a very large, logic-based knowledge base that includes, inter-alia, large amounts of background knowledge about military and counter-terrorism matters, but it is more than that; the Cyc project is an attempt to move towards general artificial intelligence by supporting automated reasoning about a very wide variety of real-world concerns. To support that goal, Cyc also encompasses, obviously enough, and inference engine able to reason over a large, contextual, knowledge base, but it also includes components for interpreting and producing natural language, acquiring knowledge and responding to user queries, and for interfacing with other software.

Applying logic to representation of general knowledge, /at scale/, and using it in the production of intelligent behaviors has been difficult enough; unfortunately it is becoming clear that doing so using traditional logics is probably not sufficient, either for satisfying a long term goal of supporting general intelligence, or even for shorter term goals, like recognizing, interpreting, and elaborating descriptions of piracy events.

In this talk, I'll briefly describe what Cyc is, and has been, touch on an early approach to abductive reasoning and classification in a traditional logical framework, and some difficulties with that approach, and then describe recent, very initial work on integrating Markov Logic within the millions of axioms of the Cyc KB. Finally I'll sketch a vision for a system that truly integrates both sound, deductive reasoning, and the bounded unsoundness of probabilistic classification, induction, abduction and deduction. [Top]

11) Statistical Learning and Probabilistic Abduction
Padhraic Smyth, University of California, Irvine

The probabilistic approach to abduction searches for the most probable hypotheses to explain a set of observed evidence. This probabilistic framework is now a core component of modern-day artificial intelligence, in areas such as reasoning, vision, language understanding, speech recognition, and more. An implicit assumption in probabilistic abduction is the existence of a probability model, often described as a graphical model such as a Bayesian or Markov network. A natural question is where do these models come from? In the early days of artificial intelligence such models were often built by hand, but this does not scale well to large domains. The availability of massive archives of data has accelerated development of statistical learning techniques that can construct very large probabilistic models automatically from such data. In this talk I will describe recent work with models that require learning and reasoning about millions of variables, with illustrative applications involving language modeling and sensor networks. The talk will conclude with brief comments on future research directions. [Top]

12) Searching for Interesting Plans as They Happen
Paul Cohen and Aram Galstyan, University of Southern California

One application of abduction is inferring plans from observed actions. Most researchers can handle only individual plans or small group plans, and they assume that all or most observed actions are relevant. I will describe a problem in which O(10^5) entities act according to their plans but only some of their plans are worth inferring. In our Hats simulator, almost all the actors are benign but some of them -- and we don't know which -- are terrorists. The problem is to manage inference so we discover terrorist plans without having to infer all the plans of all the actors. We frame this as a search problem. I will present preliminary results. [Top]

13) Abductive Inference of Opponent Models
Dana Nau, University of Maryland

Opponent modeling -- i.e., building models of other agents' behavior -- is becoming increasingly important in military operations, economic modeling, cultural modeling, computer games, and other applications. The University of Maryland's Laboratory for Computational Cultural Dynamics (LCCD) has a major research effort on abductive techniques for inferring opponent models. I'll summarize our SOMA (Stochastic Opponent Modeling Agents) formalism. SOMA lets us express the probability that a given person or group will act in a certain way in a given situation, and we have techniques for abductively inferring SOMA rules from observed behavior.

As a concrete example, I'll discuss the Noisy IPD, a modified version of the famous Iterated Prisoner's Dilemma (IPD) that includes the possibility of accidents or miscommunications. The noise causes standard IPD strategies, such as Tit-for-Tat, to do quite badly. Our DBS algorithm uses abduction to build a model of the other player's strategy by observing the player's behavior. DBS uses this model to help decide its own moves, and automatically detects whether anomalies in the other player's behavior represent noise or deliberate changes of strategy. I'll discuss DBS's performance in the 2005 Iterated Prisoner's Dilemma competition, where it did extremely well. [Top]

Michael J. Pazzani
Rutgers, The State University of New Jersey
VP for Research and Graduate and Professional Education
3 Rutgers Plaza, ASB III, 3rd Floor
New Brunswick, NJ 08901
pazzani@rutgers.edu
732-932-1500
732-932-0145 (fax)

Assistants:
Rennie Roberson
vpr-admin@orsp.rutgers.edu
732-932-1500

Mary Feldenkreiss
felden@orsp.rutgers.edu
732-932-0150 x3015