
Artificial Intelligence: A Modern Approach

SECOND EDITION

Stuart Russell and Peter Norvig

Prentice Hall Series in Artificial Intelligence

Netflix, Inc. - Ex. 1010, Page 000001 IPR2020-01582 (Netflix, Inc. v. Avago Technologies International Sales PTE. Limited)

Artificial Intelligence A Modern Approach

Second Edition


PRENTICE HALL SERIES IN ARTIFICIAL INTELLIGENCE
Stuart Russell and Peter Norvig, Editors

FORSYTH & PONCE, Computer Vision: A Modern Approach
GRAHAM, ANSI Common Lisp
JURAFSKY & MARTIN, Speech and Language Processing
NEAPOLITAN, Learning Bayesian Networks
RUSSELL & NORVIG, Artificial Intelligence: A Modern Approach


Artificial Intelligence: A Modern Approach

Second Edition

Stuart J. Russell and Peter Norvig

Contributing writers:
John F. Canny
Douglas D. Edwards
Jitendra M. Malik
Sebastian Thrun

Pearson Education, Inc., Upper Saddle River, New Jersey 07458


Library of Congress Cataloging-in-Publication Data

CIP Data on file.

Vice President and Editorial Director, ECS: Marcia J. Horton
Publisher: Alan R. Apt
Associate Editor: Toni Dianne Holm
Editorial Assistant: Patrick Lindner
Vice President and Director of Production and Manufacturing, ESM: David W. Riccardi
Executive Managing Editor: Vince O'Brien
Assistant Managing Editor: Camille Trentacoste
Production Editor: Irwin Zucker
Manufacturing Manager: Trudy Pisciotti
Manufacturing Buyer: Lisa McDowell
Director, Creative Services: Paul Belfanti
Creative Director: Carole Anson
Art Editor: Greg Dulles
Art Director: Heather Scott
Assistant to Art Director: Geoffrey Cassar
Cover Designers: Stuart Russell and Peter Norvig
Cover Image Creation: Stuart Russell and Peter Norvig; Tamara Newnam and Patrice Van Acker
Interior Designer: Stuart Russell and Peter Norvig
Marketing Manager: Pamela Shaffer
Marketing Assistant: Barrie Reinhold

© 2003, 1995 by Pearson Education, Inc.
Pearson Education, Inc.
Upper Saddle River, New Jersey 07458

All rights reserved. No part of this book may be reproduced, in any form or by any means, without permission in writing from the publisher.

The author and publisher of this book have used their best efforts in preparing this book. These efforts include the development, research, and testing of the theories and programs to determine their effectiveness. The author and publisher make no warranty of any kind, express or implied, with regard to these programs or the documentation contained in this book. The author and publisher shall not be liable in any event for incidental or consequential damages in connection with, or arising out of, the furnishing, performance, or use of these programs.

Printed in the United States of America

10 9 8 7

ISBN 0-13-790395-2

Pearson Education Ltd., London
Pearson Education Australia Pty. Ltd., Sydney
Pearson Education Singapore, Pte. Ltd.
Pearson Education North Asia Ltd., Hong Kong
Pearson Education Canada, Inc., Toronto
Pearson Educación de Mexico, S.A. de C.V.
Pearson Education-Japan, Tokyo
Pearson Education Malaysia, Pte. Ltd.
Pearson Education, Inc., Upper Saddle River, New Jersey


For Loy, Gordon, and Lucy
S.J.R.

For Kris, Isabella, and Juliet
P.N.


Preface

Artificial Intelligence (AI) is a big field, and this is a big book. We have tried to explore the full breadth of the field, which encompasses logic, probability, and continuous mathematics; perception, reasoning, learning, and action; and everything from microelectronic devices to robotic planetary explorers. The book is also big because we go into some depth in presenting results, although we strive to cover only the most central ideas in the main part of each chapter. Pointers are given to further results in the bibliographical notes at the end of each chapter.

The subtitle of this book is "A Modern Approach." The intended meaning of this rather empty phrase is that we have tried to synthesize what is now known into a common framework, rather than trying to explain each subfield of AI in its own historical context. We apologize to those whose subfields are, as a result, less recognizable than they might otherwise have been.

The main unifying theme is the idea of an intelligent agent. We define AI as the study of agents that receive percepts from the environment and perform actions. Each such agent implements a function that maps percept sequences to actions, and we cover different ways to represent these functions, such as production systems, reactive agents, real-time conditional planners, neural networks, and decision-theoretic systems. We explain the role of learning as extending the reach of the designer into unknown environments, and we show how that role constrains agent design, favoring explicit knowledge representation and reasoning. We treat robotics and vision not as independently defined problems, but as occurring in the service of achieving goals. We stress the importance of the task environment in determining the appropriate agent design.
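As a rough illustration of an agent implementing a function from percept sequences to actions, the Python sketch below uses the most naive representation imaginable: an explicit lookup table keyed by the entire percept history. The percept format (a location/status pair), the two-square `A`/`B` example, the function name `make_table_driven_agent`, and the `NoOp` default are all hypothetical choices made for this sketch only.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical percept format for this sketch: (location, status).
Percept = Tuple[str, str]
Action = str


def make_table_driven_agent(
    table: Dict[Tuple[Percept, ...], Action]
) -> Callable[[Percept], Action]:
    """Return an agent program that remembers every percept it has seen
    and looks up an action for the whole percept sequence so far."""
    percepts: List[Percept] = []

    def program(percept: Percept) -> Action:
        percepts.append(percept)
        # Unknown sequences fall back to a do-nothing action (an assumption).
        return table.get(tuple(percepts), "NoOp")

    return program


if __name__ == "__main__":
    table = {
        (("A", "Dirty"),): "Suck",
        (("A", "Dirty"), ("A", "Clean")): "Right",
    }
    agent = make_table_driven_agent(table)
    print(agent(("A", "Dirty")))  # Suck
    print(agent(("A", "Clean")))  # Right
    print(agent(("B", "Clean")))  # NoOp (sequence not in the table)
```

A complete table needs an entry for every possible percept sequence, so its size grows exponentially with the length of the history; this is why more compact representations of the same mapping matter in practice.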

Our primary aim is to convey the ideas that have emerged over the past fifty years of AI research and the past two millennia of related work. We have tried to avoid excessive formality in the presentation of these ideas while retaining precision. Wherever appropriate, we have included pseudocode algorithms to make the ideas concrete; our pseudocode is described briefly in Appendix B. Implementations in several programming languages are available on the book's Web site, aima.cs.berkeley.edu.

This book is primarily intended for use in an undergraduate course or course sequence. It can also be used in a graduate-level course (perhaps with the addition of some of the primary sources suggested in the bibliographical notes). Because of its comprehensive coverage and large number of detailed algorithms, it is useful as a primary reference volume for AI graduate students and professionals wishing to branch out beyond their own subfield. The only prerequisite is familiarity with basic concepts of computer science (algorithms, data structures, complexity) at a sophomore level. Freshman calculus is useful for understanding neural networks and statistical learning in detail. Some of the required mathematical background is supplied in Appendix A.

Overview of the book

The book is divided into eight parts. Part I, Artificial Intelligence, offers a view of the AI enterprise based around the idea of intelligent agents: systems that can decide what to do and then do it. Part II, Problem Solving, concentrates on methods for deciding what to do when one needs to think ahead several steps, for example in navigating across a country or playing chess. Part III, Knowledge and Reasoning, discusses ways to represent knowledge about the world (how it works, what it is currently like, and what one's actions might do) and how to reason logically with that knowledge. Part IV, Planning, then discusses how to use these reasoning methods to decide what to do, particularly by constructing plans. Part V, Uncertain Knowledge and Reasoning, is analogous to Parts III and IV, but it concentrates on reasoning and decision making in the presence of uncertainty about the world, as might be faced, for example, by a system for medical diagnosis and treatment.

Together, Parts II-V describe that part of the intelligent agent responsible for reaching decisions. Part VI, Learning, describes methods for generating the knowledge required by these decision-making components. Part VII, Communicating, Perceiving, and Acting, describes ways in which an intelligent agent can perceive its environment so as to know what is going on, whether by vision, touch, hearing, or understanding language, and ways in which it can turn its plans into real actions, either as robot motion or as natural language utterances. Finally, Part VIII, Conclusions, analyzes the past and future of AI and the philosophical and ethical implications of artificial intelligence.

Changes from the first edition

Much has changed in AI since the publication of the first edition in 1995, and much has changed in this book. Every chapter has been significantly rewritten to reflect the latest work in the field, to reinterpret old work in a way that is more cohesive with new findings, and to improve the pedagogical flow of ideas. Followers of AI should be encouraged that current techniques are much more practical than those of 1995; for example, the planning algorithms in the first edition could generate plans of only dozens of steps, while the algorithms in this edition scale up to tens of thousands of steps. Similar orders-of-magnitude improvements are seen in probabilistic inference, language processing, and other subfields. The following are the most notable changes in the book:

• In Part I, we acknowledge the historical contributions of control theory, game theory, economics, and neuroscience. This helps set the tone for a more integrated coverage of these ideas in subsequent chapters.
• In Part II, online search algorithms are covered and a new chapter on constraint satisfaction has been added. The latter provides a natural connection to the material on logic.
• In Part III, propositional logic, which was presented as a stepping-stone to first-order logic in the first edition, is now presented as a useful representation language in its own right, with fast inference algorithms and circuit-based agent designs. The chapters on first-order logic have been reorganized to present the material more clearly, and we have added the Internet shopping domain as an example.
• In Part IV, we include newer planning methods such as GRAPHPLAN and satisfiability-based planning, and we increase coverage of scheduling, conditional planning, hierarchical planning, and multiagent planning.
• In Part V, we have augmented the material on Bayesian networks with new algorithms, such as variable elimination and Markov Chain Monte Carlo, and we have created a new chapter on uncertain temporal reasoning, covering hidden Markov models, Kalman filters, and dynamic Bayesian networks. The coverage of Markov decision processes is deepened, and we add sections on game theory and mechanism design.
• In Part VI, we tie together work in statistical, symbolic, and neural learning and add sections on boosting algorithms, the EM algorithm, instance-based learning, and kernel methods (support vector machines).
• In Part VII, coverage of language processing adds sections on discourse processing and grammar induction, as well as a chapter on probabilistic language models, with applications to information retrieval and machine translation. The coverage of robotics stresses the integration of uncertain sensor data, and the chapter on vision has updated material on object recognition.
• In Part VIII, we introduce a section on the ethical implications of AI.

Using this book

The book has 27 chapters, each requiring about a week's worth of lectures, so working through the whole book requires a two-semester sequence. Alternatively, a course can be tailored to suit the interests of the instructor and student. Through its broad coverage, the book can be used to support such courses, whether they are short, introductory undergraduate courses or specialized graduate courses on advanced topics. Sample syllabi from the more than 600 universities and colleges that have adopted the first edition are shown on the Web at aima.cs.berkeley.edu, along with suggestions to help you find a sequence appropriate to your needs.

The book includes 385 exercises. Exercises requiring significant programming are marked with a keyboard icon. These exercises can best be solved by taking advantage of the code repository at aima.cs.berkeley.edu. Some of them are large enough to be considered term projects. A number of exercises require some investigation of the literature; these are marked with a book icon.

Throughout the book, important points are marked with a pointing icon. We have included an extensive index of around 10,000 items to make it easy to find things in the book. Wherever a new term is first defined, it is also marked in the margin.

Using the Web site

At the aima.cs.berkeley.edu Web site you will find:

• implementations of the algorithms in the book in several programming languages,
• a list of over 600 schools that have used the book, many with links to online course materials,
• an annotated list of over 800 links to sites around the web with useful AI content,
• a chapter-by-chapter list of supplementary material and links,
• instructions on how to join a discussion group for the book,
• instructions on how to contact the authors with questions or comments,
• instructions on how to report errors in the book, in the likely event that some exist, and
• copies of the figures in the book, along with slides and other material for instructors.

Acknowledgments

Jitendra Malik wrote most of Chapter 24 (on vision). Most of Chapter 25 (on robotics) was written by Sebastian Thrun in this edition and by John Canny in the first edition. Doug Edwards researched the historical notes for the first edition. Tim Huang, Mark Paskin, and Cynthia Bruyns helped with formatting of the diagrams and algorithms. Alan Apt, Sondra Chavez, Toni Holm, Jake Warde, Irwin Zucker, and Camille Trentacoste at Prentice Hall tried their best to keep us on schedule and made many helpful suggestions on the book's design and content.

Stuart would like to thank his parents for their continued support and encouragement and his wife, Loy Sheflott, for her endless patience and boundless wisdom. He hopes that Gordon and Lucy will soon be reading this. RUGS (Russell's Unusual Group of Students) have been unusually helpful.

Peter would like to thank his parents (Torsten and Gerda) for getting him started, and his wife (Kris), children, and friends for encouraging and tolerating him through the long hours of writing and longer hours of rewriting.

We are indebted to the librarians at Berkeley, Stanford, MIT, and NASA, and to the developers of CiteSeer and Google, who have revolutionized the way we do research.

We can't thank all the people who have used the book and made suggestions, but we would like to acknowledge the especially helpful comments of Eyal Amir, Krzysztof Apt, Ellery Aziel, Jeff Van Baalen, Brian Baker, Don Barker, Tony Barrett, James Newton Bass, Don Beal, Howard Beck, Wolfgang Bibel, John Binder, Larry Bookman, David R. Boxall, Gerhard Brewka, Selmer Bringsjord, Carla Brodley, Chris Brown, Wilhelm Burger, Lauren Burka, Joao Cachopo, Murray Campbell, Norman Carver, Emmanuel Castro, Anil Chakravarthy, Dan Chisarick, Roberto Cipolla, David Cohen, James Coleman, Julie Ann Comparini, Gary Cottrell, Ernest Davis, Rina Dechter, Tom Dietterich, Chuck Dyer, Barbara Engelhardt, Doug Edwards, Kutluhan Erol, Oren Etzioni, Hana Filip, Douglas Fisher, Jeffrey Forbes, Ken Ford, John Fosler, Alex Franz, Bob Futrelle, Marek Galecki, Stefan Gerberding, Stuart Gill, Sabine Glesner, Seth Golub, Gosta Grahne, Russ Greiner, Eric Grimson, Barbara Grosz, Larry Hall, Steve Hanks, Othar Hansson, Ernst Heinz, Jim Hendler, Christoph Herrmann, Vasant Honavar, Tim Huang, Seth Hutchinson, Joost Jacob, Magnus Johansson, Dan Jurafsky, Leslie Kaelbling, Keiji Kanazawa, Surekha Kasibhatla, Simon Kasif, Henry Kautz, Gernot Kerschbaumer, Richard Kirby, Kevin Knight, Sven Koenig, Daphne Koller, Rich Korf, James Kurien, John Lafferty, Gus Larsson, John Lazzaro, Jon LeBlanc, Jason Leatherman, Frank Lee, Edward Lim, Pierre Louveaux, Don Loveland, Sridhar Mahadevan, Jim Martin, Andy Mayer, David McGrane, Jay Mendelsohn, Brian Milch, Steve Minton, Vibhu Mittal, Leora Morgenstern, Stephen Muggleton, Kevin Murphy, Ron Musick, Sung Myaeng, Lee Naish, Pandu Nayak, Bernhard Nebel, Stuart Nelson, XuanLong Nguyen, Illah Nourbakhsh, Steve Omohundro, David Page, David Palmer, David Parkes, Ron Parr, Mark Paskin, Tony Passera, Michael Pazzani, Wim Pijls, Ira Pohl, Martha Pollack, David Poole, Bruce Porter, Malcolm Pradhan, Bill Pringle, Lorraine Prior, Greg Provan, William Rapaport, Philip Resnik, Francesca Rossi, Jonathan Schaeffer, Richard Scherl, Lars Schuster, Soheil Shams, Stuart Shapiro, Jude Shavlik, Satinder Singh, Daniel Sleator, David Smith, Bryan So, Robert Sproull, Lynn Stein, Larry Stephens, Andreas Stolcke, Paul Stradling, Devika Subramanian, Rich Sutton, Jonathan Tash, Austin Tate, Michael Thielscher, William Thompson, Sebastian Thrun, Eric Tiedemann, Mark Torrance, Randall Upham, Paul Utgoff, Peter van Beek, Hal Varian, Sunil Vemuri, Jim Waldo, Bonnie Webber, Dan Weld, Michael Wellman, Michael Dean White, Kamin Whitehouse, Brian Williams, David Wolfe, Bill Woods, Alden Wright, Richard Yen, Weixiong Zhang, Shlomo Zilberstein, and the anonymous reviewers provided by Prentice Hall.

About the Cover

The cover image was designed by the authors and executed by Lisa Marie Sardegna and Maryann Simmons using SGI Inventor™ and Adobe Photoshop™. The cover depicts the following items from the history of AI:

1. Aristotle's planning algorithm from De Motu Animalium (c. 400 B.C.).
2. Ramon Lull's concept generator from Ars Magna (c. 1300 A.D.).
3. Charles Babbage's Difference Engine, a prototype for the first universal computer (1848).
4. Gottlob Frege's notation for first-order logic (1879).
5. Lewis Carroll's diagrams for logical reasoning (1886).
6. Sewall Wright's probabilistic network notation (1921).
7. Alan Turing (1912-1954).
8. Shakey the Robot (1969-1973).
9. A modern diagnostic expert system (1993).


About the Authors

Stuart Russell was born in 1962 in Portsmouth, England. He received his B.A. with first-class honours in physics from Oxford University in 1982, and his Ph.D. in computer science from Stanford in 1986. He then joined the faculty of the University of California at Berkeley, where he is a professor of computer science, director of the Center for Intelligent Systems, and holder of the Smith-Zadeh Chair in Engineering. In 1990, he received the Presidential Young Investigator Award of the National Science Foundation, and in 1995 he was cowinner of the Computers and Thought Award. He was a 1996 Miller Professor of the University of California and was appointed to a Chancellor's Professorship in 2000. In 1998, he gave the Forsythe Memorial Lectures at Stanford University. He is a Fellow and former Executive Council member of the American Association for Artificial Intelligence. He has published over 100 papers on a wide range of topics in artificial intelligence. His other books include The Use of Knowledge in Analogy and Induction and (with Eric Wefald) Do the Right Thing: Studies in Limited Rationality.

Peter Norvig is director of Search Quality at Google, Inc. He is a Fellow and Executive Council member of the American Association for Artificial Intelligence. Previously, he was head of the Computational Sciences Division at NASA Ames Research Center, where he oversaw NASA's research and development in artificial intelligence and robotics. Before that, he served as chief scientist at Junglee, where he helped develop one of the first Internet information extraction services, and as a senior scientist at Sun Microsystems Laboratories working on intelligent information retrieval. He received a B.S. in applied mathematics from Brown University and a Ph.D. in computer science from the University of California at Berkeley. He has been a professor at the University of Southern California and a research faculty member at Berkeley. He has over 50 publications in computer science including the books Paradigms of AI Programming: Case Studies in Common Lisp, Verbmobil: A Translation System for Face-to-Face Dialog, and Intelligent Help Systems for UNIX.


Summary of Contents

I Artificial Intelligence
1 Introduction . . . 1
2 Intelligent Agents . . . 32

II Problem-solving
3 Solving Problems by Searching . . . 59
4 Informed Search and Exploration . . . 94
5 Constraint Satisfaction Problems . . . 137
6 Adversarial Search . . . 161

III Knowledge and reasoning
7 Logical Agents . . . 194
8 First-Order Logic . . . 240
9 Inference in First-Order Logic . . . 272
10 Knowledge Representation . . . 320

IV Planning
11 Planning . . . 375
12 Planning and Acting in the Real World . . . 417

V Uncertain knowledge and reasoning
13 Uncertainty . . . 462
14 Probabilistic Reasoning . . . 492
15 Probabilistic Reasoning over Time . . . 537
16 Making Simple Decisions . . . 584
17 Making Complex Decisions . . . 613

VI Learning
18 Learning from Observations . . . 649
19 Knowledge in Learning . . . 678
20 Statistical Learning Methods . . . 712
21 Reinforcement Learning . . . 763

VII Communicating, perceiving, and acting
22 Communication . . . 790
23 Probabilistic Language Processing . . . 834
24 Perception . . . 863
25 Robotics . . . 901

VIII Conclusions
26 Philosophical Foundations . . . 947
27 AI: Present and Future . . . 968

A Mathematical background . . . 977
B Notes on Languages and Algorithms . . . 984
Bibliography . . . 987
Index . . . 1045


Contents

I Artificial Intelligence

1 Introduction . . . 1
1.1 What is AI?
Acting humanly: The Turing Test approach
Thinking humanly: The cognitive modeling approach
Thinking rationally: The "laws of thought" approach
Acting rationally: The rational agent approach
1.2 The Foundations of Artificial Intelligence
Philosophy (428 B.C.-present)
Mathematics (c. 800-present)
Economics (1776-present)
Neuroscience (1861-present)
Psychology (1879-present)
Computer engineering (1940-present)
Control theory and Cybernetics (1948-present)
Linguistics (1957-present)
1.3 The History of Artificial Intelligence
The gestation of artificial intelligence (1943-1955)
The birth of artificial intelligence (1956)
Early enthusiasm, great expectations (1952-1969)
A dose of reality (1966-1973)
Knowledge-based systems: The key to power? (1969-1979)
AI becomes an industry (1980-present)
The return of neural networks (1986-present)
AI becomes a science (1987-present)
The emergence of intelligent agents (1995-present)
1.4 The State of the Art
1.5 Summary
Bibliographical and Historical Notes
Exercises

2 Intelligent Agents . . . 32
2.1 Agents and Environments
2.2 Good Behavior: The Concept of Rationality
Performance measures
Rationality
Omniscience, learning, and autonomy
2.3 The Nature of Environments
Specifying the task environment
Properties of task environments
2.4 The Structure of Agents
Agent programs
Simple reflex agents
Model-based reflex agents


Goal-based agents . . . 49
Utility-based agents . . . 51
Learning agents . . . 51
2.5 Summary . . . 54
Bibliographical and Historical Notes . . . 55
Exercises . . . 56

II Problem-solving

3 Solving Problems by Searching . . . 59
3.1 Problem-Solving Agents . . . 59
Well-defined problems and solutions . . . 62
Formulating problems . . . 62
3.2 Example Problems . . . 64
Toy problems . . . 64
Real-world problems . . . 67
3.3 Searching for Solutions . . . 69
Measuring problem-solving performance . . . 71
3.4 Uninformed Search Strategies . . . 73
Breadth-first search . . . 73
Depth-first search . . . 75
Depth-limited search . . . 77
Iterative deepening depth-first search . . . 78
Bidirectional search . . . 79
Comparing uninformed search strategies . . . 81
3.5 Avoiding Repeated States . . . 81
3.6 Searching with Partial Information . . . 83
Sensorless problems . . . 84
Contingency problems . . . 86
3.7 Summary . . . 87
Bibliographical and Historical Notes . . . 88
Exercises . . . 89

4 Informed Search and Exploration . . . 94
4.1 Informed (Heuristic) Search Strategies . . . 94
Greedy best-first search . . . 95
A* search: Minimizing the total estimated solution cost . . . 97
Memory-bounded heuristic search . . . 101
Learning to search better . . . 104
4.2 Heuristic Functions . . . 105
The effect of heuristic accuracy on performance . . . 106
Inventing admissible heuristic functions . . . 107
Learning heuristics from experience . . . 109
4.3 Local Search Algorithms and Optimization Problems . . . 110
Hill-climbing search . . . 111
Simulated annealing search . . . 115
Local beam search . . . 115
Genetic algorithms . . . 116
4.4 Local Search in Continuous Spaces . . . 119


4.5 Online Search Agents and Unknown Environments . . . 122
Online search problems . . . 123
Online search agents . . . 125
Online local search . . . 126
Learning in online search . . . 127
4.6 Summary . . . 129
Bibliographical and Historical Notes . . . 130
Exercises . . . 134

5 Constraint Satisfaction Problems . . . 137
5.1 Constraint Satisfaction Problems . . . 137
5.2 Backtracking Search for CSPs . . . 141
Variable and value ordering . . . 143
Propagating information through constraints . . . 144
Intelligent backtracking: looking backward . . . 148
5.3 Local Search for Constraint Satisfaction Problems . . . 150
5.4 The Structure of Problems . . . 151
5.5 Summary . . . 155
Bibliographical and Historical Notes . . . 156
Exercises . . . 158

6 Adversarial Search . . . 161
6.1 Games . . . 161
6.2 Optimal Decisions in Games . . . 162
Optimal strategies . . . 163
The minimax algorithm . . . 165
Optimal decisions in multiplayer games . . . 165
6.3 Alpha-Beta Pruning . . . 167
6.4 Imperfect, Real-Time Decisions . . . 171
Evaluation functions . . . 171
Cutting off search . . . 173
6.5 Games That Include an Element of Chance . . . 175
Position evaluation in games with chance nodes . . . 177
Complexity of expectiminimax . . . 177
Card games . . . 179
6.6 State-of-the-Art Game Programs . . . 180
6.7 Discussion . . . 183
6.8 Summary . . . 185
Bibliographical and Historical Notes . . . 186
Exercises . . . 189

III Knowledge and reasoning

7 Logical Agents . . . 194
7.1 Knowledge-Based Agents . . . 195
7.2 The Wumpus World . . . 197
7.3 Logic . . . 200
7.4 Propositional Logic: A Very Simple Logic . . . 204
Syntax . . . 204


Semantics . . . 206
A simple knowledge base . . . 208
Inference . . . 208
Equivalence, validity, and satisfiability . . . 210
7.5 Reasoning Patterns in Propositional Logic . . . 211
Resolution . . . 213
Forward and backward chaining . . . 217
7.6 Effective propositional inference . . . 220
A complete backtracking algorithm . . . 221
Local-search algorithms . . . 222
Hard satisfiability problems . . . 224
7.7 Agents Based on Propositional Logic . . . 225
Finding pits and wumpuses using logical inference . . . 225
Keeping track of location and orientation . . . 227
Circuit-based agents . . . 227
A comparison . . . 231
7.8 Summary . . . 232
Bibliographical and Historical Notes . . . 233
Exercises . . . 236

8 First-Order Logic . . . 240
8.1 Representation Revisited . . . 240
8.2 Syntax and Semantics of First-Order Logic . . . 245
Models for first-order logic . . . 245
Symbols and interpretations . . . 246
Terms . . . 248
Atomic sentences . . . 248
Complex sentences . . . 249
Quantifiers . . . 249
Equality . . . 253
8.3 Using First-Order Logic . . . 253
Assertions and queries in first-order logic . . . 253
The kinship domain . . . 254
Numbers, sets, and lists . . . 256
The wumpus world . . . 258
8.4 Knowledge Engineering in First-Order Logic . . . 260
The knowledge engineering process . . . 261
The electronic circuits domain . . . 262
8.5 Summary . . . 266
Bibliographical and Historical Notes . . . 267
Exercises . . . 268

9 Inference in First-Order Logic . . . 272
9.1 Propositional vs. First-Order Inference . . . 272
Inference rules for quantifiers . . . 273
Reduction to propositional inference . . . 274
9.2 Unification and Lifting . . . 275
A first-order inference rule . . . 275
Unification . . . 276


Contents xix

Storage and retrieval 278
9.3 Forward Chaining 280
First-order definite clauses 280
A simple forward-chaining algorithm 281
Efficient forward chaining 283
9.4 Backward Chaining 287
A backward chaining algorithm 287
Logic programming 289
Efficient implementation of logic programs 290
Redundant inference and infinite loops 292
Constraint logic programming 294
9.5 Resolution 295
Conjunctive normal form for first-order logic 295
The resolution inference rule 297
Example proofs 297
Completeness of resolution 300
Dealing with equality 303
Resolution strategies 304
Theorem provers 306
9.6 Summary 310
Bibliographical and Historical Notes 310
Exercises 315

10 Knowledge Representation 320
10.1 Ontological Engineering 320
10.2 Categories and Objects 322
Physical composition 324
Measurements 325
Substances and objects 327
10.3 Actions, Situations, and Events 328
The ontology of situation calculus 329
Describing actions in situation calculus 330
Solving the representational frame problem 332
Solving the inferential frame problem 333
Time and event calculus 334
Generalized events 335
Processes 337
Intervals 338
Fluents and objects 339
10.4 Mental Events and Mental Objects 341
A formal theory of beliefs 341
Knowledge and belief 343
Knowledge, time, and action 344
10.5 The Internet Shopping World 344
Comparing offers 348
10.6 Reasoning Systems for Categories 349
Semantic networks 350
Description logics 353
10.7 Reasoning with Default Information 354


Open and closed worlds 354
Negation as failure and stable model semantics 356
Circumscription and default logic 358
10.8 Truth Maintenance Systems 360
10.9 Summary 362
Bibliographical and Historical Notes 363
Exercises 369

IV Planning

11 Planning 375
11.1 The Planning Problem 375
The language of planning problems 377
Expressiveness and extensions 378
Example: Air cargo transport 380
Example: The spare tire problem 381
Example: The blocks world 381
11.2 Planning with State-Space Search 382
Forward state-space search 382
Backward state-space search 384
Heuristics for state-space search 386
11.3 Partial-Order Planning 387
A partial-order planning example 391
Partial-order planning with unbound variables 393
Heuristics for partial-order planning 394
11.4 Planning Graphs 395
Planning graphs for heuristic estimation 397
The GRAPHPLAN algorithm 398
Termination of GRAPHPLAN 401
11.5 Planning with Propositional Logic 402
Describing planning problems in propositional logic 402
Complexity of propositional encodings 405
11.6 Analysis of Planning Approaches 407
11.7 Summary 408
Bibliographical and Historical Notes 409
Exercises 412

12 Planning and Acting in the Real World 417
12.1 Time, Schedules, and Resources 417
Scheduling with resource constraints 420
12.2 Hierarchical Task Network Planning 422
Representing action decompositions 423
Modifying the planner for decompositions 425
Discussion 427
12.3 Planning and Acting in Nondeterministic Domains 430
12.4 Conditional Planning 433
Conditional planning in fully observable environments 433
Conditional planning in partially observable environments 437
12.5 Execution Monitoring and Replanning 441


12.6 Continuous Planning 445
12.7 MultiAgent Planning 449
Cooperation: Joint goals and plans 450
Multibody planning 451
Coordination mechanisms 452
Competition 454
12.8 Summary 454
Bibliographical and Historical Notes 455
Exercises 459

V Uncertain knowledge and reasoning

13 Uncertainty 462
13.1 Acting under Uncertainty 462
Handling uncertain knowledge 463
Uncertainty and rational decisions 465
Design for a decision-theoretic agent 466
13.2 Basic Probability Notation 466
Propositions 467
Atomic events 468
Prior probability 468
Conditional probability 470
13.3 The Axioms of Probability 471
Using the axioms of probability 473
Why the axioms of probability are reasonable 473
13.4 Inference Using Full Joint Distributions 475
13.5 Independence 477
13.6 Bayes' Rule and Its Use 479
Applying Bayes' rule: The simple case 480
Using Bayes' rule: Combining evidence 481
13.7 The Wumpus World Revisited 483
13.8 Summary 486
Bibliographical and Historical Notes 487
Exercises 489

14 Probabilistic Reasoning 492
14.1 Representing Knowledge in an Uncertain Domain 492
14.2 The Semantics of Bayesian Networks 495
Representing the full joint distribution 495
Conditional independence relations in Bayesian networks 499
14.3 Efficient Representation of Conditional Distributions 500
14.4 Exact Inference in Bayesian Networks 504
Inference by enumeration 504
The variable elimination algorithm 507
The complexity of exact inference 509
Clustering algorithms 510
14.5 Approximate Inference in Bayesian Networks 511
Direct sampling methods 511
Inference by Markov chain simulation 516


14.6 Extending Probability to First-Order Representations 519
14.7 Other Approaches to Uncertain Reasoning 523
Rule-based methods for uncertain reasoning 524
Representing ignorance: Dempster-Shafer theory 525
Representing vagueness: Fuzzy sets and fuzzy logic 526
14.8 Summary 528
Bibliographical and Historical Notes 528
Exercises 533

15 Probabilistic Reasoning over Time 537
15.1 Time and Uncertainty 537
States and observations 538
Stationary processes and the Markov assumption 538
15.2 Inference in Temporal Models 541
Filtering and prediction 542
Smoothing 544
Finding the most likely sequence 547
15.3 Hidden Markov Models 549
Simplified matrix algorithms 549
15.4 Kalman Filters 551
Updating Gaussian distributions 553
A simple one-dimensional example 554
The general case 556
Applicability of Kalman filtering 557
15.5 Dynamic Bayesian Networks 559
Constructing DBNs 560
Exact inference in DBNs 563
Approximate inference in DBNs 565
15.6 Speech Recognition 568
Speech sounds 570
Words 572
Sentences 574
Building a speech recognizer 576
15.7 Summary 578
Bibliographical and Historical Notes 578
Exercises 581

16 Making Simple Decisions 584
16.1 Combining Beliefs and Desires under Uncertainty 584
16.2 The Basis of Utility Theory 586
Constraints on rational preferences 586
And then there was Utility 588
16.3 Utility Functions 589
The utility of money 589
Utility scales and utility assessment 591
16.4 Multiattribute Utility Functions 593
Dominance 594
Preference structure and multiattribute utility 596
16.5 Decision Networks 597


Representing a decision problem with a decision network 598
Evaluating decision networks 599
16.6 The Value of Information 600
A simple example 600
A general formula 601
Properties of the value of information 602
Implementing an information-gathering agent 603
16.7 Decision-Theoretic Expert Systems 604
16.8 Summary 607
Bibliographical and Historical Notes 607
Exercises 609

17 Making Complex Decisions 613
17.1 Sequential Decision Problems 613
An example 613
Optimality in sequential decision problems 616
17.2 Value Iteration 618
Utilities of states 619
The value iteration algorithm 620
Convergence of value iteration 620
17.3 Policy Iteration 624
17.4 Partially observable MDPs 625
17.5 Decision-Theoretic Agents 629
17.6 Decisions with Multiple Agents: Game Theory 631
17.7 Mechanism Design 640
17.8 Summary 643
Bibliographical and Historical Notes 644
Exercises 646

VI Learning

18 Learning from Observations 649
18.1 Forms of Learning 649
18.2 Inductive Learning 651
18.3 Learning Decision Trees 653
Decision trees as performance elements 653
Expressiveness of decision trees 655
Inducing decision trees from examples 655
Choosing attribute tests 659
Assessing the performance of the learning algorithm 660
Noise and overfitting 661
Broadening the applicability of decision trees 663
18.4 Ensemble Learning 664
18.5 Why Learning Works: Computational Learning Theory 668
How many examples are needed? 669
Learning decision lists 670
Discussion 672
18.6 Summary 673
Bibliographical and Historical Notes 674


Exercises 676

19 Knowledge in Learning 678
19.1 A Logical Formulation of Learning 678
Examples and hypotheses 678
Current-best-hypothesis search 680
Least-commitment search 683
19.2 Knowledge in Learning 686
Some simple examples 687
Some general schemes 688
19.3 Explanation-Based Learning 690
Extracting general rules from examples 691
Improving efficiency 693
19.4 Learning Using Relevance Information 694
Determining the hypothesis space 695
Learning and using relevance information 695
19.5 Inductive Logic Programming 697
An example 699
Top-down inductive learning methods 701
Inductive learning with inverse deduction 703
Making discoveries with inductive logic programming 705
19.6 Summary 707
Bibliographical and Historical Notes 708
Exercises 710

20 Statistical Learning Methods 712
20.1 Statistical Learning 712
20.2 Learning with Complete Data 716
Maximum-likelihood parameter learning: Discrete models 716
Naive Bayes models 718
Maximum-likelihood parameter learning: Continuous models 719
Bayesian parameter learning 720
Learning Bayes net structures 722
20.3 Learning with Hidden Variables: The EM Algorithm 724
Unsupervised clustering: Learning mixtures of Gaussians 725
Learning Bayesian networks with hidden variables 727
Learning hidden Markov models 731
The general form of the EM algorithm 731
Learning Bayes net structures with hidden variables 732
20.4 Instance-Based Learning 733
Nearest-neighbor models 733
Kernel models 735
20.5 Neural Networks 736
Units in neural networks 737
Network structures 738
Single layer feed-forward neural networks (perceptrons) 740
Multilayer feed-forward neural networks 744
Learning neural network structures 748
20.6 Kernel Machines 749


20.7 Case Study: Handwritten Digit Recognition 752
20.8 Summary 754
Bibliographical and Historical Notes 755
Exercises 759

21 Reinforcement Learning 763
21.1 Introduction 763
21.2 Passive Reinforcement Learning 765
Direct utility estimation 766
Adaptive dynamic programming 767
Temporal difference learning 767
21.3 Active Reinforcement Learning 771
Exploration 771
Learning an Action-Value Function 775
21.4 Generalization in Reinforcement Learning 777
Applications to game-playing 780
Application to robot control 780
21.5 Policy Search 781
21.6 Summary 784
Bibliographical and Historical Notes 785
Exercises 788

VII Communicating, perceiving, and acting

22 Communication 790
22.1 Communication as Action 790
Fundamentals of language 791
The component steps of communication 792
22.2 A Formal Grammar for a Fragment of English 795
The Lexicon of ℰ0 795
The Grammar of ℰ0 796
22.3 Syntactic Analysis (Parsing) 798
Efficient parsing 800
22.4 Augmented Grammars 806
Verb subcategorization 808
Generative capacity of augmented grammars 809
22.5 Semantic Interpretation 810
The semantics of an English fragment 811
Time and tense 812
Quantification 813
Pragmatic Interpretation 815
Language generation with DCGs 817
22.6 Ambiguity and Disambiguation 818
Disambiguation 820
22.7 Discourse Understanding 821
Reference resolution 821
The structure of coherent discourse 823
22.8 Grammar Induction 824
22.9 Summary 826


Bibliographical and Historical Notes 827
Exercises 831

23 Probabilistic Language Processing 834
23.1 Probabilistic Language Models 834
Probabilistic context-free grammars 836
Learning probabilities for PCFGs 839
Learning rule structure for PCFGs 840
23.2 Information Retrieval 840
Evaluating IR systems 842
IR refinements 844
Presentation of result sets 845
Implementing IR systems 846
23.3 Information Extraction 848
23.4 Machine Translation 850
Machine translation systems 852
Statistical machine translation 853
Learning probabilities for machine translation 856
23.5 Summary 857
Bibliographical and Historical Notes 858
Exercises 861

24 Perception 863
24.1 Introduction 863
24.2 Image Formation 865
Images without lenses: the pinhole camera 865
Lens systems 866
Light: the photometry of image formation 867
Color: the spectrophotometry of image formation 868
24.3 Early Image Processing Operations 869
Edge detection 870
Image segmentation 872
24.4 Extracting Three-Dimensional Information 873
Motion 875
Binocular stereopsis 876
Texture gradients 879
Shading 880
Contour 881
24.5 Object Recognition 885
Brightness-based recognition 887
Feature-based recognition 888
Pose Estimation 890
24.6 Using Vision for Manipulation and Navigation 892
24.7 Summary 894
Bibliographical and Historical Notes 895
Exercises 898

25 Robotics 901
25.1 Introduction 901


25.2 Robot Hardware 903
Sensors 903
Effectors 904
25.3 Robotic Perception 907
Localization 908
Mapping 913
Other types of perception 915
25.4 Planning to Move 916
Configuration space 916
Cell decomposition methods 919
Skeletonization methods 922
25.5 Planning uncertain movements 923
Robust methods 924
25.6 Moving 926
Dynamics and control 927
Potential field control 929
Reactive control 930
25.7 Robotic Software Architectures 932
Subsumption architecture 932
Three-layer architecture 933
Robotic programming languages 934
25.8 Application Domains 935
25.9 Summary 938
Bibliographical and Historical Notes 939
Exercises 942

VIII Conclusions

26 Philosophical Foundations 947
26.1 Weak AI: Can Machines Act Intelligently? 947
The argument from disability 948
The mathematical objection 949
The argument from informality 950
26.2 Strong AI: Can Machines Really Think? 952
The mind-body problem 954
The "brain in a vat" experiment 955
The brain prosthesis experiment 956
The Chinese room 958
26.3 The Ethics and Risks of Developing Artificial Intelligence 960
26.4 Summary 964
Bibliographical and Historical Notes 964
Exercises 967

27 AI: Present and Future 968
27.1 Agent Components 968
27.2 Agent Architectures 970
27.3 Are We Going in the Right Direction? 972


27.4 What if AI Does Succeed? 974

A Mathematical background 977
A.1 Complexity Analysis and O() Notation 977
Asymptotic analysis 977
NP and inherently hard problems 978
A.2 Vectors, Matrices, and Linear Algebra 979
A.3 Probability Distributions 981
Bibliographical and Historical Notes 983

B Notes on Languages and Algorithms 984
B.1 Defining Languages with Backus-Naur Form (BNF) 984
B.2 Describing Algorithms with Pseudocode 985
B.3 Online Help 985

Bibliography 987
Index 1045


9 INFERENCE IN FIRST-ORDER LOGIC

In which we define effective procedures for answering questions posed in first-order logic.

Chapter 7 defined the notion of inference and showed how sound and complete inference can be achieved for propositional logic. In this chapter, we extend those results to obtain algorithms that can answer any answerable question stated in first-order logic. This is significant, because more or less anything can be stated in first-order logic if you work hard enough at it.

Section 9.1 introduces inference rules for quantifiers and shows how to reduce first-order inference to propositional inference, albeit at great expense. Section 9.2 describes the idea of unification, showing how it can be used to construct inference rules that work directly with first-order sentences. We then discuss three major families of first-order inference algorithms: forward chaining and its applications to deductive databases and production systems are covered in Section 9.3; backward chaining and logic programming systems are developed in Section 9.4; and resolution-based theorem-proving systems are described in Section 9.5. In general, one tries to use the most efficient method that can accommodate the facts and axioms that need to be expressed. Reasoning with fully general first-order sentences using resolution is usually less efficient than reasoning with definite clauses using forward or backward chaining.

9.1 PROPOSITIONAL VS. FIRST-ORDER INFERENCE

This section and the next introduce the ideas underlying modern logical inference systems. We begin with some simple inference rules that can be applied to sentences with quantifiers to obtain sentences without quantifiers. These rules lead naturally to the idea that first-order inference can be done by converting the knowledge base to propositional logic and using propositional inference, which we already know how to do. The next section points out an obvious shortcut, leading to inference methods that manipulate first-order sentences directly.


Section 9.1. Propositional vs. First-Order Inference 273

Inference rules for quantifiers

Let us begin with universal quantifiers. Suppose our knowledge base contains the standard folkloric axiom stating that all greedy kings are evil:

$$\forall x\ King(x) \land Greedy(x) \Rightarrow Evil(x).$$

Then it seems quite permissible to infer any of the following sentences:

$$\begin{aligned}
&King(John) \land Greedy(John) \Rightarrow Evil(John). \\
&King(Richard) \land Greedy(Richard) \Rightarrow Evil(Richard). \\
&King(Father(John)) \land Greedy(Father(John)) \Rightarrow Evil(Father(John)).
\end{aligned}$$

The rule of Universal Instantiation (UI for short) says that we can infer any sentence obtained by substituting a ground term (a term without variables) for the variable. To write out the inference rule formally, we use the notion of substitutions introduced in Section 8.3.¹ Let SUBST(θ, α) denote the result of applying the substitution θ to the sentence α. Then the rule is written

$$\frac{\forall v\ \alpha}{\mathrm{SUBST}(\{v/g\},\ \alpha)}$$

for any variable v and ground term g. For example, the three sentences given earlier are obtained with the substitutions {x/John}, {x/Richard}, and {x/Father(John)}.

The corresponding Existential Instantiation rule for the existential quantifier is slightly more complicated. For any sentence α, variable v, and constant symbol k that does not appear elsewhere in the knowledge base,

$$\frac{\exists v\ \alpha}{\mathrm{SUBST}(\{v/k\},\ \alpha)}$$

For example, from the sentence

$$\exists x\ Crown(x) \land OnHead(x, John)$$

we can infer the sentence

$$Crown(C_1) \land OnHead(C_1, John)$$

as long as $C_1$ does not appear elsewhere in the knowledge base. Basically, the existential sentence says there is some object satisfying a condition, and the instantiation process is just giving a name to that object. Naturally, that name must not already belong to another object. Mathematics provides a nice example: suppose we discover that there is a number that is a little bigger than 2.71828 and that satisfies the equation $d(x^y)/dy = x^y$ for x. We can give this number a name, such as e, but it would be a mistake to give it the name of an existing object, such as π. In logic, the new name is called a Skolem constant. Existential Instantiation is a special case of a more general process called skolemization, which we cover in Section 9.5.

¹ Do not confuse these substitutions with the extended interpretations used to define the semantics of quantifiers. The substitution replaces a variable with a term (a piece of syntax) to produce a new sentence, whereas an interpretation maps a variable to an object in the domain.

274 Chapter 9. Inference in First-Order Logic

As well as being more complicated than Universal Instantiation, Existential Instantiation plays a slightly different role in inference. Whereas Universal Instantiation can be applied many times to produce many different consequences, Existential Instantiation can be applied once, and then the existentially quantified sentence can be discarded. For example, once we have added the sentence Kill(Murderer, Victim), we no longer need the sentence $\exists x\ Kill(x, Victim)$. Strictly speaking, the new knowledge base is not logically equivalent to the old, but it can be shown to be inferentially equivalent in the sense that it is satisfiable exactly when the original knowledge base is satisfiable.

Reduction to propositional inference

Once we have rules for inferring nonquantified sentences from quantified sentences, it becomes possible to reduce first-order inference to propositional inference. In this section we will give the main ideas; the details are given in Section 9.5.

The first idea is that, just as an existentially quantified sentence can be replaced by one instantiation, a universally quantified sentence can be replaced by the set of all possible instantiations. For example, suppose our knowledge base contains just the sentences

$$\begin{aligned}
&\forall x\ King(x) \land Greedy(x) \Rightarrow Evil(x) \\
&King(John) \\
&Greedy(John) \\
&Brother(Richard, John).
\end{aligned} \tag{9.1}$$

Then we apply UI to the first sentence using all possible ground-term substitutions from the vocabulary of the knowledge base (in this case, {x/John} and {x/Richard}). We obtain

$$\begin{aligned}
&King(John) \land Greedy(John) \Rightarrow Evil(John), \\
&King(Richard) \land Greedy(Richard) \Rightarrow Evil(Richard),
\end{aligned}$$

and we discard the universally quantified sentence. Now, the knowledge base is essentially propositional if we view the ground atomic sentences (King(John), Greedy(John), and so on) as proposition symbols. Therefore, we can apply any of the complete propositional algorithms in Chapter 7 to obtain conclusions such as Evil(John).

This technique of propositionalization can be made completely general, as we show in Section 9.5; that is, every first-order knowledge base and query can be propositionalized in such a way that entailment is preserved. Thus, we have a complete decision procedure for entailment . . . or perhaps not. There is a problem: when the knowledge base includes a function symbol, the set of possible ground term substitutions is infinite! For example, if the knowledge base mentions the Father symbol, then infinitely many nested terms such as Father(Father(Father(John))) can be constructed. Our propositional algorithms will have difficulty with an infinitely large set of sentences.

Fortunately, there is a famous theorem due to Jacques Herbrand (1930) to the effect that if a sentence is entailed by the original, first-order knowledge base, then there is a proof involving just a finite subset of the propositionalized knowledge base. Since any such subset has a maximum depth of nesting among its ground terms, we can find the subset by first generating all the instantiations with constant symbols (Richard and John), then all terms of depth 1 (Father(Richard) and Father(John)), then all terms of depth 2, and so on, until we are able to construct a propositional proof of the entailed sentence.

We have sketched an approach to first-order inference via propositionalization that is complete; that is, any entailed sentence can be proved. This is a major achievement, given that the space of possible models is infinite. On the other hand, we do not know until the proof is done that the sentence is entailed! What happens when the sentence is not entailed? Can we tell? Well, for first-order logic, it turns out that we cannot. Our proof procedure can go on and on, generating more and more deeply nested terms, but we will not know whether it is stuck in a hopeless loop or whether the proof is just about to pop out. This is very much like the halting problem for Turing machines. Alan Turing (1936) and Alonzo Church (1936) both proved, in rather different ways, the inevitability of this state of affairs. The question of entailment for first-order logic is semidecidable; that is, algorithms exist that say yes to every entailed sentence, but no algorithm exists that also says no to every nonentailed sentence.
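The depth-by-depth generation of ground terms that Herbrand's theorem licenses can be sketched as follows. This is an illustration under assumptions (nested tuples for terms, a dictionary of function arities, hypothetical function names), not the procedure of Section 9.5; it only shows how the instances of the rule in (9.1) multiply once a function symbol such as Father is available.

```python
from itertools import product

def terms_by_depth(constants, functions, max_depth):
    """levels[d] = set of ground terms whose function-nesting depth is exactly d.

    constants: e.g. ["John", "Richard"]; functions: symbol -> arity, e.g. {"Father": 1}.
    A compound term f(t1, ..., tn) is represented as the tuple (f, t1, ..., tn).
    """
    levels = [set(constants)]
    seen = set(constants)                  # all terms of depth < d when building level d
    for _ in range(max_depth):
        new = set()
        for f, arity in functions.items():
            for args in product(list(seen), repeat=arity):
                term = (f, *args)
                if term not in seen:       # built from shallower terms, not seen: depth d
                    new.add(term)
        levels.append(new)
        seen |= new
    return levels

def replace_var(expr, var, term):
    """Substitute term for every occurrence of var in a nested-tuple sentence."""
    if expr == var:
        return term
    if isinstance(expr, tuple):
        return tuple(replace_var(e, var, term) for e in expr)
    return expr

def propositionalize(var, rule, constants, functions, max_depth):
    """All ground instances of 'forall var: rule' using terms up to max_depth."""
    terms = set().union(*terms_by_depth(constants, functions, max_depth))
    return {replace_var(rule, var, t) for t in terms}

rule = ("=>", ("&", ("King", "x"), ("Greedy", "x")), ("Evil", "x"))
for depth in range(4):
    print(depth, len(propositionalize("x", rule, ["John", "Richard"], {"Father": 1}, depth)))
```

With the constants John and Richard and the unary symbol Father, the number of ground instances grows as 2, 4, 6, 8, ... with the depth bound, so the propositionalized knowledge base never stops growing; that is why the procedure described above works one depth at a time.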

9.2 UNIFICATION AND LIFTING

The preceding section described the understanding of first-order inference that existed up to the early 1960s. The sharp-eyed reader (and certainly the computational logicians of the early 1960s) will have noticed that the propositionalization approach is rather inefficient. For example, given the query Evil(x) and the knowledge base in Equation (9.1), it seems perverse to generate sentences such as King(Richard) ∧ Greedy(Richard) ⇒ Evil(Richard). Indeed, the inference of Evil(John) from the sentences

$$\begin{aligned}
&\forall x\ King(x) \land Greedy(x) \Rightarrow Evil(x) \\
&King(John) \\
&Greedy(John)
\end{aligned}$$

seems completely obvious to a human being. We now show how to make it completely obvious to a computer.

A first-order inference rule

The inference that John is evil works like this: find some x such that x is a king and x is greedy, and then infer that this x is evil. More generally, if there is some substitution θ that makes the premise of the implication identical to sentences already in the knowledge base, then we can assert the conclusion of the implication, after applying θ. In this case, the substitution {x/John} achieves that aim.

We can actually make the inference step do even more work. Suppose that instead of knowing Greedy(John), we know that everyone is greedy:

$$\forall y\ Greedy(y). \tag{9.2}$$

Then we would still like to be able to conclude that Evil(John), because we know that John is a king (given) and John is greedy (because everyone is greedy). What we need for this to work is to find a substitution both for the variables in the implication sentence and for the variables in the sentences to be matched. In this case, applying the substitution {x/John, y/John} to the implication premises King(x) and Greedy(x) and the knowledge base sentences King(John) and Greedy(y) will make them identical. Thus, we can infer the conclusion of the implication.

This inference process can be captured as a single inference rule that we call Generalized Modus Ponens: for atomic sentences $p_i$, $p_i'$, and $q$, where there is a substitution θ such that $\mathrm{SUBST}(\theta, p_i') = \mathrm{SUBST}(\theta, p_i)$ for all i,

$$\frac{p_1',\ p_2',\ \ldots,\ p_n',\quad (p_1 \land p_2 \land \cdots \land p_n \Rightarrow q)}{\mathrm{SUBST}(\theta, q)}$$

There are n + 1 premises to this rule: the n atomic sentences $p_i'$ and the one implication. The conclusion is the result of applying the substitution θ to the consequent q. For our example:

$$\begin{array}{ll}
p_1' \text{ is } King(John) & p_1 \text{ is } King(x) \\
p_2' \text{ is } Greedy(y) & p_2 \text{ is } Greedy(x) \\
\theta \text{ is } \{x/John,\ y/John\} & q \text{ is } Evil(x) \\
\mathrm{SUBST}(\theta, q) \text{ is } Evil(John). &
\end{array}$$

It is easy to show that Generalized Modus Ponens is a sound inference rule. First, we observe that, for any sentence p (whose variables are assumed to be universally quantified) and for any substitution θ,

$$p \models \mathrm{SUBST}(\theta, p).$$

This holds for the same reasons that the Universal Instantiation rule holds. It holds in particular for a θ that satisfies the conditions of the Generalized Modus Ponens rule. Thus, from $p_1', \ldots, p_n'$ we can infer

$$\mathrm{SUBST}(\theta, p_1') \land \cdots \land \mathrm{SUBST}(\theta, p_n')$$

and from the implication $p_1 \land \cdots \land p_n \Rightarrow q$ we can infer

$$\mathrm{SUBST}(\theta, p_1) \land \cdots \land \mathrm{SUBST}(\theta, p_n) \Rightarrow \mathrm{SUBST}(\theta, q).$$

Now, θ in Generalized Modus Ponens is defined so that $\mathrm{SUBST}(\theta, p_i') = \mathrm{SUBST}(\theta, p_i)$ for all i; therefore the first of these two sentences matches the premise of the second exactly. Hence, SUBST(θ, q) follows by Modus Ponens.

Generalized Modus Ponens is a lifted version of Modus Ponens: it raises Modus Ponens from propositional to first-order logic. We will see in the rest of the chapter that we can develop lifted versions of the forward chaining, backward chaining, and resolution algorithms introduced in Chapter 7. The key advantage of lifted inference rules over propositionalization is that they make only those substitutions which are required to allow particular inferences to proceed. One potentially confusing point is that in one sense Generalized Modus Ponens is less general than Modus Ponens (page 211): Modus Ponens allows any single α on the left-hand side of the implication, while Generalized Modus Ponens requires a special format for this sentence. It is generalized in the sense that it allows any number of $p_i'$.
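A one-step sketch of Generalized Modus Ponens, under the same assumed tuple representation (lowercase strings are variables), shows exactly what the rule checks. Here θ is supplied by the caller and the code only verifies the side condition SUBST(θ, p_i') = SUBST(θ, p_i) before returning SUBST(θ, q); finding θ automatically is the job of unification, discussed next. Function names are hypothetical.

```python
def is_variable(t):
    # Assumed convention for this sketch: lowercase strings are variables.
    return isinstance(t, str) and t[:1].islower()

def subst(theta, expr):
    """SUBST(theta, expr) on nested-tuple expressions."""
    if is_variable(expr):
        return theta.get(expr, expr)
    if isinstance(expr, tuple):
        return tuple(subst(theta, part) for part in expr)
    return expr

def generalized_modus_ponens(facts, premises, conclusion, theta):
    """Infer SUBST(theta, q) from p1', ..., pn' and (p1 & ... & pn => q).

    facts are the atomic sentences p1', ..., pn'; premises are p1, ..., pn.
    Returns None when theta fails SUBST(theta, pi') == SUBST(theta, pi) for some i.
    """
    if len(facts) != len(premises):
        return None
    for p_prime, p in zip(facts, premises):
        if subst(theta, p_prime) != subst(theta, p):
            return None
    return subst(theta, conclusion)

facts = [("King", "John"), ("Greedy", "y")]      # King(John), and Greedy(y) for every y
premises = [("King", "x"), ("Greedy", "x")]
print(generalized_modus_ponens(facts, premises, ("Evil", "x"), {"x": "John", "y": "John"}))
# ('Evil', 'John')
```

Note that θ binds variables on both sides: x comes from the implication and y from the fact Greedy(y), which is precisely what distinguishes this rule from plain Modus Ponens.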

Unification

Lifted inference rules require finding substitutions that make different logical expressions look identical. This process is called unification and is a key component of all first-order inference algorithms. The UNIFY algorithm takes two sentences and returns a unifier for them if one exists:

$$\mathrm{UNIFY}(p, q) = \theta \quad \text{where} \quad \mathrm{SUBST}(\theta, p) = \mathrm{SUBST}(\theta, q).$$

Let us look at some examples of how UNIFY should behave. Suppose we have a query Knows(John, x): whom does John know? Some answers to this query can be found by finding all sentences in the knowledge base that unify with Knows(John, x). Here are the results of unification with four different sentences that might be in the knowledge base.

$$\begin{aligned}
&\mathrm{UNIFY}(Knows(John, x),\ Knows(John, Jane)) = \{x/Jane\} \\
&\mathrm{UNIFY}(Knows(John, x),\ Knows(y, Bill)) = \{x/Bill,\ y/John\} \\
&\mathrm{UNIFY}(Knows(John, x),\ Knows(y, Mother(y))) = \{y/John,\ x/Mother(John)\} \\
&\mathrm{UNIFY}(Knows(John, x),\ Knows(x, Elizabeth)) = fail.
\end{aligned}$$

The last unification fails because x cannot take on the values John and Elizabeth at the same time. Now, remember that Knows(x, Elizabeth) means "Everyone knows Elizabeth," so we should be able to infer that John knows Elizabeth. The problem arises only because the two sentences happen to use the same variable name, x. The problem can be avoided by standardizing apart one of the two sentences being unified, which means renaming its variables to avoid name clashes. For example, we can rename x in Knows(x, Elizabeth) to $z_{17}$ (a new variable name) without changing its meaning. Now the unification will work:

$$\mathrm{UNIFY}(Knows(John, x),\ Knows(z_{17}, Elizabeth)) = \{x/Elizabeth,\ z_{17}/John\}.$$

Exercise 9.7 delves further into the need for standardizing apart.

There is one more complication: we said that UNIFY should return a substitution that makes the two arguments look the same. But there could be more than one such unifier. For example, UNIFY(Knows(John, x), Knows(y, z)) could return {y/John, x/z} or {y/John, x/John, z/John}. The first unifier gives Knows(John, z) as the result of unification, whereas the second gives Knows(John, John). The second result could be obtained from the first by an additional substitution {z/John}; we say that the first unifier is more general than the second, because it places fewer restrictions on the values of the variables. It turns out that, for every unifiable pair of expressions, there is a single most general unifier (or MGU) that is unique up to renaming of variables. In this case it is {y/John, x/z}.

An algorithm for computing most general unifiers is shown in Figure 9.1. The process is very simple: recursively explore the two expressions simultaneously "side by side," building up a unifier along the way, but failing if two corresponding points in the structures do not match. There is one expensive step: when matching a variable against a complex term, one must check whether the variable itself occurs inside the term; if it does, the match fails because no consistent unifier can be constructed. This so-called occur check makes the complexity of the entire algorithm quadratic in the size of the expressions being unified. Some systems, including all logic programming systems, simply omit the occur check and sometimes make unsound inferences as a result; other systems use more complex algorithms with linear-time complexity.
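The unification algorithm of Figure 9.1 can be rendered in Python roughly as follows. This is a sketch under assumptions: compound expressions and argument lists are both represented as tuples (so the OP/ARGS and FIRST/REST cases collapse into one element-by-element loop), lowercase strings are variables, and None stands for failure. The helper resolve, which expands chained bindings such as {y/John, x/Mother(y)}, is our addition, not part of the figure.

```python
def is_variable(x):
    # Assumed convention for this sketch: lowercase strings are variables.
    return isinstance(x, str) and x[:1].islower()

def unify(x, y, theta):
    """UNIFY(x, y, theta) after Figure 9.1; returns a dict, or None for failure."""
    if theta is None:
        return None
    if x == y:
        return theta
    if is_variable(x):
        return unify_var(x, y, theta)
    if is_variable(y):
        return unify_var(y, x, theta)
    if isinstance(x, tuple) and isinstance(y, tuple) and len(x) == len(y):
        # Operator first, then each argument, threading theta through.
        for xi, yi in zip(x, y):
            theta = unify(xi, yi, theta)
        return theta
    return None

def unify_var(var, x, theta):
    if var in theta:                         # {var/val} in theta
        return unify(theta[var], x, theta)
    if is_variable(x) and x in theta:        # {x/val} in theta
        return unify(var, theta[x], theta)
    if occurs(var, x, theta):                # the occur check
        return None
    return {**theta, var: x}                 # add {var/x} without mutating theta

def occurs(var, x, theta):
    """Does var occur anywhere inside x, following bindings already in theta?"""
    if var == x:
        return True
    if is_variable(x) and x in theta:
        return occurs(var, theta[x], theta)
    if isinstance(x, tuple):
        return any(occurs(var, xi, theta) for xi in x)
    return False

def resolve(theta, expr):
    """Apply theta repeatedly so that chained bindings are fully expanded."""
    if is_variable(expr) and expr in theta:
        return resolve(theta, theta[expr])
    if isinstance(expr, tuple):
        return tuple(resolve(theta, e) for e in expr)
    return expr

query = ("Knows", "John", "x")
print(unify(query, ("Knows", "John", "Jane"), {}))        # {'x': 'Jane'}
print(unify(query, ("Knows", "y", "Bill"), {}))
print(unify(query, ("Knows", "x", "Elizabeth"), {}))      # None
print(unify(query, ("Knows", "z17", "Elizabeth"), {}))    # after standardizing apart
```

Binding x to Mother(y) rather than Mother(John) keeps each step cheap; resolve recovers the fully substituted form shown in the text when it is needed. Dropping the call to occurs would mimic the logic programming systems mentioned above, at the price of accepting cyclic bindings such as {x/Father(x)}.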


function UNIFY(x, y, θ) returns a substitution to make x and y identical
  inputs: x, a variable, constant, list, or compound
          y, a variable, constant, list, or compound
          θ, the substitution built up so far (optional, defaults to empty)

  if θ = failure then return failure
  else if x = y then return θ
  else if VARIABLE?(x) then return UNIFY-VAR(x, y, θ)
  else if VARIABLE?(y) then return UNIFY-VAR(y, x, θ)
  else if COMPOUND?(x) and COMPOUND?(y) then
      return UNIFY(ARGS[x], ARGS[y], UNIFY(OP[x], OP[y], θ))
  else if LIST?(x) and LIST?(y) then
      return UNIFY(REST[x], REST[y], UNIFY(FIRST[x], FIRST[y], θ))
  else return failure

function UNIFY-VAR(var, x, θ) returns a substitution
  inputs: var, a variable
          x, any expression
          θ, the substitution built up so far

  if {var/val} ∈ θ then return UNIFY(val, x, θ)
  else if {x/val} ∈ θ then return UNIFY(var, val, θ)
  else if OCCUR-CHECK?(var, x) then return failure
  else return add {var/x} to θ

Figure 9.1 The unification algorithm. The algorithm works by comparing the structures of the inputs, element by element. The substitution θ that is the argument to UNIFY is built up along the way and is used to make sure that later comparisons are consistent with bindings that were established earlier. In a compound expression, such as F(A, B), the function OP picks out the function symbol F and the function ARGS picks out the argument list (A, B).

Storage and retrieval

Underlying the TELL and ASK functions used to inform and interrogate a knowledge base are the more primitive STORE and FETCH functions. STORE(s) stores a sentence s into the knowledge base and FETCH(q) returns all unifiers such that the query q unifies with some sentence in the knowledge base. The problem we used to illustrate unification (finding all facts that unify with Knows(John, x)) is an instance of FETCHing.

The simplest way to implement STORE and FETCH is to keep all the facts in the knowledge base in one long list; then, given a query q, call UNIFY(q, s) for every sentence s in the list. Such a process is inefficient, but it works, and it's all you need to understand the rest of the chapter. The remainder of this section outlines ways to make retrieval more efficient, and can be skipped on first reading.

We can make FETCH more efficient by ensuring that unifications are attempted only with sentences that have some chance of unifying. For example, there is no point in trying to unify Knows(John, x) with Brother(Richard, John). We can avoid such unifications by indexing the facts in the knowledge base. A simple scheme called predicate indexing puts all the Knows facts in one bucket and all the Brother facts in another. The buckets can be stored in a hash table² for efficient access.

Predicate indexing is useful when there are many predicate symbols but only a few clauses for each symbol. In some applications, however, there are many clauses for a given predicate symbol. For example, suppose that the tax authorities want to keep track of who employs whom, using a predicate Employs(x, y). This would be a very large bucket with perhaps millions of employers and tens of millions of employees. Answering a query such as Employs(x, Richard) with predicate indexing would require scanning the entire bucket.

For this particular query, it would help if facts were indexed both by predicate and by second argument, perhaps using a combined hash table key. Then we could simply construct the key from the query and retrieve exactly those facts that unify with the query. For other queries, such as Employs(AIMA.org, y), we would need to have indexed the facts by combining the predicate with the first argument. Therefore, facts can be stored under multiple index keys, rendering them instantly accessible to various queries that they might unify with.

Given a sentence to be stored, it is possible to construct indices for all possible queries that unify with it. For the fact Employs(AIMA.org, Richard), the queries are

Employs(AIMA.org, Richard)    Does AIMA.org employ Richard?
Employs(x, Richard)           Who employs Richard?
Employs(AIMA.org, y)          Whom does AIMA.org employ?
Employs(x, y)                 Who employs whom?

These queries form a subsumption lattice, as shown in Figure 9.2(a). The lattice has some interesting properties. For example, the child of any node in the lattice is obtained from its parent by a single substitution; and the "highest" common descendant of any two nodes is the result of applying their most general unifier. The portion of the lattice above any ground fact can be constructed systematically (Exercise 9.5). A sentence with repeated constants has a slightly different lattice, as shown in Figure 9.2(b). Function symbols and variables in the sentences to be stored introduce still more interesting lattice structures.

The scheme we have described works very well whenever the lattice contains a small number of nodes. For a predicate with n arguments, the lattice contains $2^n$ nodes. If function symbols are allowed, the number of nodes is also exponential in the size of the terms in the sentence to be stored. This can lead to a huge number of indices. At some point, the benefits of indexing are outweighed by the costs of storing and maintaining all the indices. We can respond by adopting a fixed policy, such as maintaining indices only on keys composed of a predicate plus each argument, or by using an adaptive policy that creates indices to meet the demands of the kinds of queries being asked. For most AI systems, the number of facts to be stored is small enough that efficient indexing is considered a solved problem. For industrial and commercial databases, the problem has received substantial technology development.

² A hash table is a data structure for storing and retrieving information indexed by fixed keys. For practical purposes, a hash table can be considered to have constant storage and retrieval times, even when the table contains a very large number of items.
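The multi-key indexing idea can be sketched with an ordinary dictionary. Assumptions: facts are ground tuples whose arguments are constants only, the wildcard "?" marks an argument position left as a variable, and the class and method names are hypothetical. Each fact is stored under all 2^n keys of its subsumption lattice, so a fetch needs one hash lookup per query.

```python
from itertools import product

WILD = "?"  # placeholder meaning "some variable" inside an index key (sketch assumption)

def is_variable(t):
    # Assumed convention: lowercase strings are variables, e.g. "x" in Employs(x, Richard).
    return isinstance(t, str) and t[:1].islower()

def index_keys(fact):
    """The 2**n keys of a ground fact (pred, a1, ..., an): each argument kept or wildcarded.

    Queries with a repeated variable, such as Employs(x, x) in Figure 9.2(b),
    do not get keys of their own in this simplified version.
    """
    pred, *args = fact
    return [(pred, *(a if keep else WILD for a, keep in zip(args, mask)))
            for mask in product((True, False), repeat=len(args))]

class IndexedKB:
    """Hypothetical STORE/FETCH pair that files every fact under all of its keys."""

    def __init__(self):
        self.buckets = {}  # index key -> list of facts stored under it

    def store(self, fact):
        for key in index_keys(fact):
            self.buckets.setdefault(key, []).append(fact)

    def fetch(self, query):
        """Candidate facts for a query whose arguments are constants or variables.

        The caller still runs UNIFY on each candidate; the index only skips
        facts that cannot possibly match.
        """
        pred, *args = query
        key = (pred, *(WILD if is_variable(a) else a for a in args))
        return list(self.buckets.get(key, []))

kb = IndexedKB()
kb.store(("Employs", "AIMA.org", "Richard"))
kb.store(("Employs", "John", "Richard"))      # hypothetical extra fact
kb.store(("Employs", "AIMA.org", "John"))     # hypothetical extra fact
print(kb.fetch(("Employs", "x", "Richard")))   # Who employs Richard?
print(kb.fetch(("Employs", "AIMA.org", "y")))  # Whom does AIMA.org employ?
```

The cost is visible in index_keys: a fact with n arguments occupies 2^n buckets, which is the blow-up that motivates the fixed or adaptive indexing policies described above.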



[Figure 9.2 lattices. (a) Top: Employs(x, y); middle: Employs(x, Richard) and Employs(AIMA.org, y); bottom: Employs(AIMA.org, Richard). (b) Top: Employs(x, y); middle: Employs(x, x), Employs(x, John), and Employs(John, y); bottom: Employs(John, John).]

Figure 9.2 (a) The subsumption lattice whose lowest node is the sentence Employs(AIMA.org, Richard). (b) The subsumption lattice for the sentence Employs(John, John).

9.3 FORWARD CHAINING

A forward-chaining algorithm for propositional definite clauses was given in Section 7.5. The idea is simple: start with the atomic sentences in the knowledge base and apply Modus Ponens in the forward direction, adding new atomic sentences, until no further inferences can be made. Here, we explain how the algorithm is applied to first-order definite clauses and how it can be implemented efficiently. Definite clauses such as Situation ⇒ Response are especially useful for systems that make inferences in response to newly arrived information. Many systems can be defined this way, and reasoning with forward chaining can be much more efficient than resolution theorem proving. Therefore it is often worthwhile to try to build a knowledge base using only definite clauses so that the cost of resolution can be avoided.

First-order definite clauses

First-order definite clauses closely resemble propositional definite clauses (page 217): they are disjunctions of literals of which exactly one is positive. A definite clause either is atomic or is an implication whose antecedent is a conjunction of positive literals and whose consequent is a single positive literal. The following are first-order definite clauses:

$$\begin{aligned}
&King(x) \land Greedy(x) \Rightarrow Evil(x). \\
&King(John). \\
&Greedy(y).
\end{aligned}$$

Unlike propositional literals, first-order literals can include variables, in which case those variables are assumed to be universally quantified. (Typically, we omit universal quantifiers when writing definite clauses.) Definite clauses are a suitable normal form for use with Generalized Modus Ponens.

Not every knowledge base can be converted into a set of definite clauses, because of the single-positive-literal restriction, but many can. Consider the following problem:

The law says that it is a crime for an American to sell weapons to hostile nations. The country Nono, an enemy of America, has some missiles, and all of its missiles were sold to it by Colonel West, who is American.
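The single-positive-literal restriction is easy to check mechanically. The sketch below assumes a representation of our own (a clause is a list of (is_positive, atom) pairs, atoms are tuples; function names are hypothetical). It rewrites an implication p1 ∧ ... ∧ pn ⇒ q as the disjunction ¬p1 ∨ ... ∨ ¬pn ∨ q and tests whether exactly one literal is positive.

```python
def implication_to_clause(premises, conclusion):
    """p1 & ... & pn => q is equivalent to the disjunction ~p1 | ... | ~pn | q."""
    return [(False, p) for p in premises] + [(True, conclusion)]

def is_definite(clause):
    """A definite clause has exactly one positive literal."""
    return sum(1 for positive, _ in clause if positive) == 1

def as_implication(clause):
    """Read a definite clause back as (premises, conclusion); atomic clauses have no premises."""
    if not is_definite(clause):
        raise ValueError("not a definite clause")
    premises = [atom for positive, atom in clause if not positive]
    conclusion = next(atom for positive, atom in clause if positive)
    return premises, conclusion

evil_rule = implication_to_clause([("King", "x"), ("Greedy", "x")], ("Evil", "x"))
print(is_definite(evil_rule))                                     # True
print(is_definite([(True, ("King", "John"))]))                    # True: an atomic sentence
print(is_definite([(True, ("King", "x")), (True, ("Evil", "x"))]))  # False: two positive literals
```

A disjunction such as King(x) ∨ Evil(x) has two positive literals and so cannot be written as a single implication with one atomic conclusion, which is the kind of sentence the definite-clause form excludes.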


We will prove that West is a criminal. First, we will represent these facts as first-order definite clauses. The next section shows how the forward-chaining algorithm solves the problem.

". . . it is a crime for an American to sell weapons to hostile nations":

$$American(x) \land Weapon(y) \land Sells(x, y, z) \land Hostile(z) \Rightarrow Criminal(x). \tag{9.3}$$

"Nono . . . has some missiles." The sentence $\exists x\ Owns(Nono, x) \land Missile(x)$ is transformed into two definite clauses by Existential Instantiation, introducing a new constant $M_1$:

$$Owns(Nono, M_1) \tag{9.4}$$

$$Missile(M_1) \tag{9.5}$$

"All of its missiles were sold to it by Colonel West":

$$Missile(x) \land Owns(Nono, x) \Rightarrow Sells(West, x, Nono). \tag{9.6}$$

We will also need to know that missiles are weapons:

$$Missile(x) \Rightarrow Weapon(x) \tag{9.7}$$

and we must know that an enemy of America counts as "hostile":

$$Enemy(x, America) \Rightarrow Hostile(x). \tag{9.8}$$

"West, who is American . . .":

$$American(West). \tag{9.9}$$

"The country Nono, an enemy of America . . .":

$$Enemy(Nono, America). \tag{9.10}$$

This knowledge base contains no function symbols and is therefore an instance of the class of Datalog knowledge bases, that is, sets of first-order definite clauses with no function symbols. We will see that the absence of function symbols makes inference much easier.

A simple forward-chaining algorithm

The first forward chaining algorithm we will consider is a very simple one, as shown in Figure 9.3. Starting from the known facts, it triggers all the rules whose premises are satisfied, adding their conclusions to the known facts. The process repeats until the query is answered (assuming that just one answer is required) or no new facts are added. Notice that a fact is not "new" if it is just a renaming of a known fact. One sentence is a renaming of another if they are identical except for the names of the variables. For example, Likes(x, IceCream) and Likes(y, IceCream) are renamings of each other because they differ only in the choice of x or y; their meanings are identical: everyone likes ice cream.

function FOL-FC-ASK(KB, α) returns a substitution or false
  inputs: KB, the knowledge base, a set of first-order definite clauses
          α, the query, an atomic sentence
  local variables: new, the new sentences inferred on each iteration

  repeat until new is empty
      new ← { }
      for each sentence r in KB do
          (p1 ∧ ... ∧ pn ⇒ q) ← STANDARDIZE-APART(r)
          for each θ such that SUBST(θ, p1 ∧ ... ∧ pn) = SUBST(θ, p1' ∧ ... ∧ pn')
                      for some p1', ..., pn' in KB
              q' ← SUBST(θ, q)
              if q' is not a renaming of some sentence already in KB or new then do
                  add q' to new
                  φ ← UNIFY(q', α)
                  if φ is not fail then return φ
      add new to KB
  return false

Figure 9.3 A conceptually straightforward, but very inefficient, forward-chaining algorithm. On each iteration, it adds to KB all the atomic sentences that can be inferred in one step from the implication sentences and the atomic sentences already in KB.

We will use our crime problem to illustrate how FOL-FC-ASK works. The implication sentences are (9.3), (9.6), (9.7), and (9.8). Two iterations are required:

• On the first iteration, rule (9.3) has unsatisfied premises.
  Rule (9.6) is satisfied with {x/M1}, and Sells(West, M1, Nono) is added.
  Rule (9.7) is satisfied with {x/M1}, and Weapon(M1) is added.
  Rule (9.8) is satisfied with {x/Nono}, and Hostile(Nono) is added.
• On the second iteration, rule (9.3) is satisfied with {x/West, y/M1, z/Nono}, and Criminal(West) is added.

[Proof tree. Top: Criminal(West); middle: Weapon(M1), Sells(West, M1, Nono), Hostile(Nono); bottom: American(West), Missile(M1), Owns(Nono, M1), Enemy(Nono, America).]

Figure 9.4 The proof tree generated by forward chaining on the crime example. The initial facts appear at the bottom level, facts inferred on the first iteration in the middle level, and facts inferred on the second iteration at the top level.

Figure 9.4 shows the proof tree that is generated. Notice that no new inferences are possible at this point because every sentence that could be concluded by forward chaining is already contained explicitly in the KB. Such a knowledge base is called a fixed point of the inference process. Fixed points reached by forward chaining with first-order definite clauses are similar to those for propositional forward chaining (page 219); the principal difference is that a first-order fixed point can include universally quantified atomic sentences.

FOL-FC-ASK is easy to analyze. First, it is sound, because every inference is just an application of Generalized Modus Ponens, which is sound. Second, it is complete for definite clause knowledge bases; that is, it answers every query whose answers are entailed by any knowledge base of definite clauses. For Datalog knowledge bases, which contain no function symbols, the proof of completeness is fairly easy. We begin by counting the number of possible facts that can be added, which determines the maximum number of iterations. Let k be the maximum arity (number of arguments) of any predicate, p be the number of predicates, and n be the number of constant symbols. Clearly, there can be no more than $pn^k$ distinct ground facts, so after this many iterations the algorithm must have reached a fixed point. Then we can make an argument very similar to the proof of completeness for propositional forward chaining. (See page 219.) The details of how to make the transition from propositional to first-order completeness are given for the resolution algorithm in Section 9.5.
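A compact Python version of FOL-FC-ASK (Figure 9.3), run on the crime knowledge base, might look like this. It is a sketch under stated assumptions: all facts are ground, as in this Datalog example, so one-way matching stands in for UNIFY, standardizing apart is unnecessary, and the renaming check reduces to an equality check. Premises are matched naively, left to right, and the helper names are hypothetical.

```python
def is_var(t):
    # Assumed convention: lowercase strings are variables; facts are ground (Datalog).
    return isinstance(t, str) and t[:1].islower()

def match(pattern, fact, theta):
    """One-way match of a pattern against a ground fact; returns an extended theta or None."""
    if theta is None:
        return None
    if is_var(pattern):
        if pattern in theta:
            return theta if theta[pattern] == fact else None
        return {**theta, pattern: fact}
    if isinstance(pattern, tuple) and isinstance(fact, tuple):
        if len(pattern) != len(fact):
            return None
        for p, f in zip(pattern, fact):
            theta = match(p, f, theta)
        return theta
    return theta if pattern == fact else None

def subst(theta, expr):
    if is_var(expr):
        return theta.get(expr, expr)
    if isinstance(expr, tuple):
        return tuple(subst(theta, e) for e in expr)
    return expr

def premise_matches(premises, facts, theta):
    """Yield every theta under which all premises appear among the facts."""
    if not premises:
        yield theta
        return
    for fact in facts:
        t = match(premises[0], fact, theta)
        if t is not None:
            yield from premise_matches(premises[1:], facts, t)

def fol_fc_ask(facts, rules, query):
    """Naive forward chaining in the spirit of Figure 9.3.

    rules: list of (premises, conclusion) pairs. Returns a substitution for the
    first newly inferred fact that matches the query, or None at a fixed point.
    """
    kb = set(facts)
    while True:
        new = set()
        for premises, conclusion in rules:
            for theta in premise_matches(premises, list(kb), {}):
                q = subst(theta, conclusion)
                if q not in kb and q not in new:   # ground facts: renaming == equality
                    new.add(q)
                    phi = match(query, q, {})
                    if phi is not None:
                        return phi
        if not new:
            return None
        kb |= new

facts = [("Owns", "Nono", "M1"), ("Missile", "M1"),                  # (9.4), (9.5)
         ("American", "West"), ("Enemy", "Nono", "America")]          # (9.9), (9.10)
rules = [
    ([("American", "x"), ("Weapon", "y"), ("Sells", "x", "y", "z"), ("Hostile", "z")],
     ("Criminal", "x")),                                               # (9.3)
    ([("Missile", "x"), ("Owns", "Nono", "x")], ("Sells", "West", "x", "Nono")),  # (9.6)
    ([("Missile", "x")], ("Weapon", "x")),                             # (9.7)
    ([("Enemy", "x", "America")], ("Hostile", "x")),                   # (9.8)
]
print(fol_fc_ask(facts, rules, ("Criminal", "x")))   # {'x': 'West'}
```

Tracing the loop reproduces the two iterations listed earlier. The Datalog bound from the paragraph above is also easy to evaluate here: the knowledge base has p = 8 predicates, n = 4 constants (West, Nono, M1, America), and maximum arity k = 3 (Sells), so at most 8 · 4³ = 512 ground facts could ever be added, while the actual fixed point contains only eight facts.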

For general definite clauses with function symbols, FOL-FC-ASK can generate infinitely many new facts, so we need to be more careful. For the case in which an answer to the query sentence q is entailed, we must appeal to Herbrand's theorem to establish that the algorithm will find a proof. (See Section 9.5 for the resolution case.) If the query has no answer, the algorithm could fail to terminate in some cases. For example, if the knowledge base includes the Peano axioms

$$\begin{aligned}
&NatNum(0) \\
&\forall n\ NatNum(n) \Rightarrow NatNum(S(n))
\end{aligned}$$

then forward chaining adds NatNum(S(0)), NatNum(S(S(0))), NatNum(S(S(S(0)))), and so on. This problem is unavoidable in general. As with general first-order logic, entailment with definite clauses is semidecidable.

Efficient forward chaining

The forward chaining algorithm in Figure 9.3 is designed for ease of understanding rather than for efficiency of operation. There are three possible sources of complexity. First, the "inner loop" of the algorithm involves finding all possible unifiers such that the premise of a rule unifies with a suitable set of facts in the knowledge base. This is often called pattern matching and can be very expensive. Second, the algorithm rechecks every rule on every iteration to see whether its premises are satisfied, even if very few additions are made to the knowledge base on each iteration. Finally, the algorithm might generate many facts that are irrelevant to the goal. We will address each of these sources in turn.

Matching rules against facts

The problem of matching the premise of a rule against the facts in the knowledge base might seem simple enough. For example, suppose we want to apply the rule

$$Missile(x) \Rightarrow Weapon(x).$$



n t ) sa) A

q) A sa) A

nsw) sa) A

v ) A sa) A

sa) Colorable()

(Red,Blue) (Red,Green)(Green,Red) Green, Blue)(Blue,Red) Green)

Figure 9.5 (a) Constraint graph for coloring the map of Australia (from (b)The map-coloring CSP expressed as a single definite clause. Note that the domains of thevariables are defined implicitly by the constants given in the ground facts for Dzf f .

Then we need to find all the facts that unify with in a suitably indexed knowledge base, this can be done in constant time per fact. Now consider a rule such as

x ) Sells(West , x , .

Again, we can find all the objects owned by in constant time per object; then, for each object, we could check whether it is a missile. If the knowledge base contains many objectsowned by and very few missiles, however, it would be better to find all the missiles first

CONJUNCTORDERING and then check whether they are owned by This is the conjunct ordering problem:

find an ordering to solve the conjuncts of the rule premise so that the total cost is minimized. It turns out that finding the optimal ordering is NP-hard, but good heuristics are available. For example, the most constrained variable heuristic used for CSPs in Chapter 5 would suggest ordering the conjuncts to look for missiles first if there are fewer missiles than objects that are owned by

The connection between pattern matching and constraint satisfaction is actually veryclose. We can view each conjunct as a constraint on the variables that it contains-for ex-ample, is a unary constraint on x . Extending this idea, we can express every finite-domain CSP as a single dejnite clause together with some associated ground facts.Consider the map-coloring problem from Figure 5.1, shown again in Figure An equiv-alent formulation as a single definite clause is given in Figure Clearly, the conclusionColorable()can be inferred only if the CSP has a solution. Because CSPs in general include 3SAT problems as special cases, we can conclude that matching a clause against aset of facts is NP-hard.

It might seem rather depressing that forward chaining has an NP-hard matching problem in its inner loop. There are three ways to cheer ourselves up:



Section 9.3. Forward Chaining 285

• We can remind ourselves that most rules in real-world knowledge bases are small and simple (like the rules in our crime example) rather than large and complex (like the CSP formulation in Figure 9.5). It is common in the database world to assume that both the sizes of rules and the arities of predicates are bounded by a constant and to worry only about data complexity, that is, the complexity of inference as a function of the number of ground facts in the database. It is easy to show that the data complexity of forward chaining is polynomial.

• We can consider subclasses of rules for which matching is efficient. Essentially every Datalog clause can be viewed as defining a CSP, so matching will be tractable just when the corresponding CSP is tractable. Chapter 5 describes several tractable families of CSPs. For example, if the constraint graph (the graph whose nodes are variables and whose links are constraints) forms a tree, then the CSP can be solved in linear time. Exactly the same result holds for rule matching. For instance, if we remove South Australia from the map in Figure 9.5, the resulting clause is

  Diff(wa, nt) ∧ Diff(nt, q) ∧ Diff(q, nsw) ∧ Diff(nsw, v) ⇒ Colorable()

  which corresponds to the reduced CSP shown in Figure 5.11. Algorithms for solving tree-structured CSPs can be applied directly to the problem of rule matching.

• We can work hard to eliminate redundant rule matching attempts in the forward chaining algorithm, which is the subject of the next section.

Incremental forward chaining

When we showed how forward chaining works on the crime example, we cheated; in particular, we omitted some of the rule matching done by the algorithm shown in Figure 9.3. For example, on the second iteration, the rule

Missile(x) ⇒ Weapon(x)

matches against Missile(M1) (again), and of course the conclusion Weapon(M1) is already known so nothing happens. Such redundant rule matching can be avoided if we make the following observation: Every new fact inferred on iteration t must be derived from at least one new fact inferred on iteration t - 1. This is true because any inference that does not require a new fact from iteration t - 1 could have been done at iteration t - 1 already.

This observation leads naturally to an incremental forward chaining algorithm where, at iteration t, we check a rule only if its premise includes a conjunct pi that unifies with a fact pi′ newly inferred at iteration t - 1. The rule matching step then fixes pi to match with pi′, but allows the other conjuncts of the rule to match with facts from any previous iteration. This algorithm generates exactly the same facts at each iteration as the algorithm in Figure 9.3, but is much more efficient.
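A minimal sketch of the incremental idea, assuming Python and using as the rule set the two path clauses that appear in Section 9.4 (path(X,Z) :- link(X,Z) and path(X,Z) :- path(X,Y), link(Y,Z)). The function name and the counter of attempted rule instantiations are our own. On each iteration only the path facts that were new on the previous iteration are joined with link, which is the restriction described above (the database literature calls this scheme semi-naive evaluation).

```python
def incremental_path_closure(links):
    """Forward-chain path(X,Z) :- link(X,Z) and path(X,Z) :- path(X,Y), link(Y,Z).

    Each iteration joins only the path facts that were new on the previous
    iteration (delta) with link. Returns the set of path facts and the number
    of rule instantiations attempted.
    """
    # Index link facts by their first argument so matching is a dictionary lookup.
    succ = {}
    for x, y in links:
        succ.setdefault(x, set()).add(y)

    paths = set(links)        # first iteration: path(X,Z) :- link(X,Z) fires per link
    delta = set(links)        # facts newly inferred on the previous iteration
    attempts = len(links)
    while delta:
        new = set()
        for x, y in delta:    # the path(X,Y) conjunct is fixed to a new fact
            for z in succ.get(y, ()):
                attempts += 1
                if (x, z) not in paths:
                    new.add((x, z))
        paths |= new
        delta = new           # only genuinely new facts trigger the next round
    return paths, attempts
```

On a graph with n nodes the loop can add at most n² path facts, so it always terminates, even on cyclic graphs.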

With suitable indexing, it is easy to identify all the rules that can be triggered by any given fact, and indeed many real systems operate in an "update" mode wherein forward chaining occurs in response to each new fact that is TELLed to the system. Inferences cascade through the set of rules until the fixed point is reached, and then the process begins again for the next new fact.




Typically, only a small fraction of the rules in the knowledge base are actually triggered by the addition of a given fact. This means that a great deal of redundant work is done in repeatedly constructing partial matches that have some unsatisfied premises. Our crime example is rather too small to show this effectively, but notice that a partial match is constructed on the first iteration between the rule

American(x) ∧ Weapon(y) ∧ Sells(x, y, z) ∧ Hostile(z) ⇒ Criminal(x)

and the fact American(West). This partial match is then discarded and rebuilt on the second iteration (when the rule succeeds). It would be better to retain and gradually complete the partial matches as new facts arrive, rather than discarding them.

The rete algorithm³ was the first to address this problem seriously. The algorithm preprocesses the set of rules in the knowledge base to construct a sort of dataflow network in which each node is a literal from a rule premise. Variable bindings flow through the network and are filtered out when they fail to match a literal. If two literals in a rule share a variable (for example, Sells(x, y, z) ∧ Hostile(z) in the crime example), then the bindings from each literal are filtered through an equality node. A variable binding reaching a node for an n-ary literal such as Sells(x, y, z) might have to wait for bindings for the other variables to be established before the process can continue. At any given point, the state of a rete network captures all the partial matches of the rules, avoiding a great deal of recomputation.
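The core idea, keeping partial matches instead of rebuilding them, can be sketched with a single join node. This is a heavily simplified Python illustration, not the rete algorithm itself: there is no network compilation, no alpha memories, and no fact deletion, and the class and method names are hypothetical. The shared variable z plays the role of the equality node for Sells(x, y, z) ∧ Hostile(z).

```python
class JoinNode:
    """One two-input join node in the spirit of a rete network (simplified).

    The left memory stores partial matches (bindings) arriving from earlier
    literals; the right memory stores bindings from a single literal. Both
    memories persist, so a new arrival is joined only against what is already
    stored instead of every match being rebuilt from scratch.
    """

    def __init__(self, shared_vars):
        self.shared_vars = shared_vars      # variables both inputs must agree on
        self.left_memory = []
        self.right_memory = []
        self.output = []                    # completed (joined) bindings

    def _consistent(self, left, right):
        return all(left[v] == right[v] for v in self.shared_vars)

    def add_left(self, binding):
        self.left_memory.append(binding)
        for right in self.right_memory:
            if self._consistent(binding, right):
                self.output.append({**binding, **right})

    def add_right(self, binding):
        self.right_memory.append(binding)
        for left in self.left_memory:
            if self._consistent(left, binding):
                self.output.append({**left, **binding})


# Join Sells(x, y, z) with Hostile(z); the shared variable z is the equality test.
node = JoinNode(shared_vars=["z"])
node.add_left({"x": "West", "y": "M1", "z": "Nono"})   # partial match is retained
node.add_right({"z": "Nono"})                           # a later fact completes it
```

The partial match for Sells(West, M1, Nono) waits in the left memory until Hostile(Nono) arrives, and is never recomputed.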

Rete networks, and various improvements thereon, have been a key component of so-called production systems, which were among the earliest forward chaining systems in widespread use.⁴ The XCON system (originally called R1; McDermott, 1982) was built using a production system architecture. XCON contained several thousand rules for designing configurations of computer components for customers of the Digital Equipment Corporation. It was one of the first clear commercial successes in the emerging field of expert systems. Many other similar systems have been built using the same underlying technology, which has been implemented in the general-purpose language OPS-5.

Production systems are also popular in cognitive architectures, that is, models of human reasoning, such as ACT (Anderson, 1983) and SOAR (Laird et al., 1987). In such systems, the "working memory" of the system models human short-term memory, and the productions are part of long-term memory. On each cycle of operation, productions are matched against the working memory of facts. A production whose conditions are satisfied can add or delete facts in working memory. In contrast to the typical situation in databases, production systems often have many rules and relatively few facts. With suitably optimized matching technology, some modern systems can operate in real time with over a million rules.

Irrelevant facts

The final source of inefficiency in forward chaining appears to be intrinsic to the approach and also arises in the propositional context (see Section 7.5). Forward chaining makes all allowable inferences based on the known facts, even if they are irrelevant to the goal at hand. In our crime example, there were no rules capable of drawing irrelevant conclusions, so the lack of directedness was not a problem. In other cases (e.g., if we have several rules describing the eating habits of Americans and the prices of missiles), FOL-FC-ASK will generate many irrelevant conclusions.

³ Rete is Latin for net. The English pronunciation rhymes with treaty.
⁴ The word production in production systems denotes a condition-action rule.

One way to avoid drawing irrelevant conclusions is to use backward chaining, as described in Section 9.4. Another solution is to restrict forward chaining to a selected subset of rules; this approach was discussed in the propositional context. A third approach has emerged in the deductive database community, where forward chaining is the standard tool. The idea is to rewrite the rule set, using information from the goal, so that only relevant variable bindings, those belonging to a so-called magic set, are considered during forward inference. For example, if the goal is Criminal(West), the rule that concludes Criminal(x) will be rewritten to include an extra conjunct that constrains the value of x:

Magic(x) ∧ American(x) ∧ Weapon(y) ∧ Sells(x, y, z) ∧ Hostile(z) ⇒ Criminal(x).

The fact Magic(West) is also added to the KB. In this way, even if the knowledge base contains facts about millions of Americans, only Colonel West will be considered during the forward inference process. The complete process for defining magic sets and rewriting the knowledge base is too complex to go into here, but the basic idea is to perform a sort of "generic" backward inference from the goal in order to work out which variable bindings need to be constrained. The magic sets approach can therefore be thought of as a kind of hybrid between forward inference and backward preprocessing.
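The effect of the extra Magic(x) conjunct can be illustrated with a small Python sketch. It hard-codes the rewritten crime rule rather than performing the general magic-set rewriting, and the population of 100,000 additional Americans is invented for the illustration; the function name is hypothetical. It counts how many bindings of x the rule ever considers.

```python
def criminals(americans, weapons, sells, hostile, magic=None):
    """Forward-apply American(x) & Weapon(y) & Sells(x,y,z) & Hostile(z) => Criminal(x).

    With magic=None the rule is used as written. Otherwise `magic` holds the
    Magic(x) facts, and the rewritten rule matches Magic(x) first, so only
    those bindings of x are ever generated.
    Returns (derived Criminal facts, number of x-bindings considered).
    """
    if magic is None:
        candidates = list(americans)
    else:
        candidates = [x for x in magic if x in americans]
    derived = set()
    for x in candidates:
        for seller, y, z in sells:
            if seller == x and y in weapons and z in hostile:
                derived.add(("Criminal", x))
    return derived, len(candidates)

# Hypothetical data: a large population of Americans, one of whom is Colonel West.
americans = {"West"} | {f"Citizen{i}" for i in range(100_000)}
weapons, hostile = {"M1"}, {"Nono"}
sells = {("West", "M1", "Nono")}
```

Both versions derive Criminal(West), but the original rule considers 100,001 bindings for x while the rewritten one considers exactly one.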

9.4 BACKWARD CHAINING

The second major family of logical inference algorithms uses the backward chaining approach introduced in Section 7.5. These algorithms work backward from the goal, chaining through rules to find known facts that support the proof. We describe the basic algorithm, and then we describe how it is used in logic programming, which is the most widely used form of automated reasoning. We will also see that backward chaining has some disadvantages compared with forward chaining, and we look at ways to overcome them. Finally, we will look at the close connection between logic programming and constraint satisfaction problems.

A backward chaining algorithm

Figure 9.6 shows a simple backward-chaining algorithm, FOL-BC-ASK. It is called with a list of goals containing a single element, the original query, and returns the set of all substitutions satisfying the query. The list of goals can be thought of as a "stack" waiting to be worked on; if all of them can be satisfied, then the current branch of the proof succeeds. The algorithm takes the first goal in the list and finds every clause in the knowledge base whose positive literal, or head, unifies with the goal. Each such clause creates a new recursive call in which the premise, or body, of the clause is added to the goal stack. Remember that facts are clauses with a head but no body, so when a goal unifies with a known fact, no new subgoals are added to the stack and the goal is solved. Figure 9.7 is the proof tree for deriving Criminal(West) from sentences (9.3) through (9.10).

function FOL-BC-ASK(KB, goals, θ) returns a set of substitutions
   inputs: KB, a knowledge base
           goals, a list of conjuncts forming a query (θ already applied)
           θ, the current substitution, initially the empty substitution { }
   local variables: answers, a set of substitutions, initially empty

   if goals is empty then return {θ}
   q′ ← SUBST(θ, FIRST(goals))
   for each sentence r in KB
         where STANDARDIZE-APART(r) = (p1 ∧ ... ∧ pn ⇒ q)
         and θ′ ← UNIFY(q, q′) succeeds
      new_goals ← [p1, ..., pn | REST(goals)]
      answers ← FOL-BC-ASK(KB, new_goals, COMPOSE(θ′, θ)) ∪ answers
   return answers

Figure 9.6 A simple backward-chaining algorithm.
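For readers who want to run the algorithm, here is one possible Python transcription of Figure 9.6, applied to the crime knowledge base. The term representation is our own assumption (atoms and compound terms as tuples, variables as lowercase-initial strings, constants capitalized), and the sketch yields answers lazily from a generator instead of accumulating the set answers. Because standardized-apart head variables are fresh and q′ contains only unbound variables, COMPOSE reduces here to merging two substitutions with disjoint domains.

```python
import itertools

def is_var(t):
    """Variables are strings starting with a lowercase letter (x, y, z, x_3, ...)."""
    return isinstance(t, str) and t[:1].islower()

def subst(theta, t):
    """Apply substitution theta to term t, following chains of bindings."""
    if is_var(t):
        return subst(theta, theta[t]) if t in theta else t
    if isinstance(t, tuple):
        return tuple(subst(theta, a) for a in t)
    return t

def occurs(var, t, theta):
    """Occur check: does var appear inside t once theta is applied?"""
    t = subst(theta, t)
    return t == var or (isinstance(t, tuple) and any(occurs(var, a, theta) for a in t))

def unify(x, y, theta):
    """Return a substitution extending theta that makes x and y identical, or None."""
    x, y = subst(theta, x), subst(theta, y)
    if x == y:
        return theta
    if is_var(x):
        return None if occurs(x, y, theta) else {**theta, x: y}
    if is_var(y):
        return None if occurs(y, x, theta) else {**theta, y: x}
    if isinstance(x, tuple) and isinstance(y, tuple) and len(x) == len(y):
        for a, b in zip(x, y):
            theta = unify(a, b, theta)
            if theta is None:
                return None
        return theta
    return None

_fresh = itertools.count(1)

def standardize_apart(rule):
    """Rename every variable in a (body, head) rule to a fresh name such as x_7."""
    n = next(_fresh)
    def rename(t):
        if is_var(t):
            return f"{t}_{n}"
        if isinstance(t, tuple):
            return tuple(rename(a) for a in t)
        return t
    body, head = rule
    return [rename(p) for p in body], rename(head)

def fol_bc_ask(kb, goals, theta):
    """Yield each substitution that proves every goal, depth first, left to right."""
    if not goals:
        yield theta
        return
    q_prime = subst(theta, goals[0])
    for rule in kb:
        body, head = standardize_apart(rule)
        theta_prime = unify(head, q_prime, {})
        if theta_prime is None:
            continue
        # Disjoint domains, so composing theta_prime with theta is a dict merge.
        yield from fol_bc_ask(kb, body + goals[1:], {**theta, **theta_prime})

# Crime knowledge base as (body, head) pairs; facts have an empty body.
KB = [
    ([("American", "x"), ("Weapon", "y"), ("Sells", "x", "y", "z"), ("Hostile", "z")],
     ("Criminal", "x")),
    ([], ("Owns", "Nono", "M1")),
    ([], ("Missile", "M1")),
    ([("Missile", "x"), ("Owns", "Nono", "x")], ("Sells", "West", "x", "Nono")),
    ([("Missile", "x")], ("Weapon", "x")),
    ([("Enemy", "x", "America")], ("Hostile", "x")),
    ([], ("American", "West")),
    ([], ("Enemy", "Nono", "America")),
]
```

Asking for Criminal(w) yields a single substitution binding w to West; asking for Criminal(Nono) yields none, because American(Nono) cannot be proved.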

[Proof tree for Criminal(West); image not reproduced.]

Figure 9.7 Proof tree constructed by backward chaining to prove that West is a criminal. The tree should be read depth first, left to right. To prove Criminal(West), we have to prove the four conjuncts below it. Some of these are in the knowledge base, and others require further backward chaining. Bindings for each successful unification are shown next to the corresponding subgoal. Note that once one subgoal in a conjunction succeeds, its substitution is applied to subsequent subgoals. Thus, by the time FOL-BC-ASK gets to the last conjunct, originally Hostile(z), z is already bound to Nono.

The algorithm uses composition of substitutions. COMPOSE(θ1, θ2) is the substitution whose effect is identical to the effect of applying each substitution in turn. That is,

SUBST(COMPOSE(θ1, θ2), p) = SUBST(θ2, SUBST(θ1, p)).

In the algorithm, the current variable bindings, which are stored in θ, are composed with the bindings resulting from unifying the goal with the clause head, giving a new set of current bindings for the recursive call.
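The defining property of composition can be checked mechanically. In this sketch, assuming substitutions are Python dicts mapping variable names to terms and compound terms are tuples, subst applies a substitution in a single pass, and compose builds COMPOSE(θ1, θ2) by applying θ2 to the right-hand sides of θ1 and then adding θ2's bindings for any remaining variables. Both helper names are ours.

```python
def subst(theta, t):
    """Apply substitution theta to term t in one pass (variables are dict keys)."""
    if isinstance(t, str):
        return theta.get(t, t)
    if isinstance(t, tuple):
        return tuple(subst(theta, a) for a in t)
    return t

def compose(theta1, theta2):
    """COMPOSE(theta1, theta2): the effect of applying theta1, then theta2."""
    composed = {v: subst(theta2, t) for v, t in theta1.items()}
    for v, t in theta2.items():
        composed.setdefault(v, t)          # theta2 bindings for untouched variables
    return {v: t for v, t in composed.items() if t != v}   # drop trivial v/v bindings

theta1 = {"x": ("F", "y")}
theta2 = {"y": "A", "z": "B"}
p = ("P", "x", "y", "z")
```

Here both sides of the equation evaluate to P(F(A), A, B).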




Backward chaining, as we have written it, is clearly a depth-first search algorithm. This means that its space requirements are linear in the size of the proof (neglecting, for now, the space required to accumulate the solutions). It also means that backward chaining (unlike forward chaining) suffers from problems with repeated states and incompleteness. We will discuss these problems and some potential solutions, but first we will see how backward chaining is used in logic programming systems.

Logic programming

Logic programming is a technology that comes fairly close to embodying the declarative ideal described in Chapter 7: that systems should be constructed by expressing knowledge in a formal language and that problems should be solved by running inference processes on that knowledge. The ideal is summed up in Robert Kowalski's equation,

Algorithm = Logic + Control.

Prolog is by far the most widely used logic programming language. Its users number in the hundreds of thousands. It is used primarily as a rapid-prototyping language and for symbol-manipulation tasks such as writing compilers (Van Roy, 1990) and parsing natural language (Pereira and Warren, 1980). Many expert systems have been written in Prolog for legal, medical, financial, and other domains.

Prolog programs are sets of definite clauses written in a notation somewhat different from standard first-order logic. Prolog uses uppercase letters for variables and lowercase for constants. Clauses are written with the head preceding the body; ":-" is used for left-implication, commas separate literals in the body, and a period marks the end of a sentence:

criminal(X) :- american(X), weapon(Y), sells(X,Y,Z), hostile(Z).

Prolog includes "syntactic sugar" for list notation and arithmetic. As an example, here is a Prolog program for append(X,Y,Z), which succeeds if list Z is the result of appending lists X and Y:

append([],Y,Y).
append([A|X],Y,[A|Z]) :- append(X,Y,Z).

In English, we can read these clauses as (1) appending an empty list with a list Y produces the same list Y and (2) [A|Z] is the result of appending [A|X] onto Y, provided that Z is the result of appending X onto Y. This definition of append appears fairly similar to the corresponding definition in Lisp, but is actually much more powerful. For example, we can ask the query append(A,B,[1,2]): what two lists can be appended to give [1,2]? We get back the solutions

A=[]      B=[1,2]
A=[1]     B=[2]
A=[1,2]   B=[]
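The relational behavior of append can be imitated in Python, with one important restriction: the sketch below handles only the query mode in which the third argument is a known list, so it enumerates splits rather than performing unification. It tries the two clauses in the same order as Prolog, so the solutions come out with A=[] first, then A=[1], then A=[1,2]; append_splits is a hypothetical name.

```python
def append_splits(z):
    """Enumerate (A, B) such that append(A, B, z) holds, in Prolog's clause order.

    Clause 1, append([], Y, Y): A = [] and B = z.
    Clause 2, append([H|X], Y, [H|Z1]) :- append(X, Y, Z1): the head H of z
    becomes the head of A, and we recurse on the tail Z1.
    """
    yield [], list(z)                      # clause 1
    if z:                                  # clause 2 needs a non-empty third argument
        head, tail = z[0], z[1:]
        for x, y in append_splits(tail):
            yield [head] + x, y

splits = list(append_splits([1, 2]))
```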

The execution of Prolog programs is done via depth-first backward chaining, where clauses are tried in the order in which they are written in the knowledge base. Some aspects of Prolog fall outside standard logical inference:




• There is a set of built-in functions for arithmetic. Literals using these function symbols are "proved" by executing code rather than doing further inference. For example, the goal "X is 4+3" succeeds with X bound to 7. On the other hand, the goal "5 is X+Y" fails, because the built-in functions do not do arbitrary equation solving.⁵

• There are built-in predicates that have side effects when executed. These include input-output predicates and the assert/retract predicates for modifying the knowledge base. Such predicates have no counterpart in logic and can produce some confusing effects, for example, if facts are asserted in a branch of the proof tree that eventually fails.

• Prolog allows a form of negation called negation as failure. A negated goal not P is considered proved if the system fails to prove P. Thus, the sentence

  alive(X) :- not dead(X).

  can be read as "Everyone is alive if not provably dead."

• Prolog has an equality operator, =, but it lacks the full power of logical equality. An equality goal succeeds if the two terms are unifiable and fails otherwise. So X+Y=2+3 succeeds with X bound to 2 and Y bound to 3, but morningstar=eveningstar fails. (In classical logic, the latter equality might or might not be true.) No facts or rules about equality can be asserted.

• The occur check is omitted from Prolog's unification algorithm. This means that some unsound inferences can be made; these are seldom a problem except when using Prolog for mathematical theorem proving.
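Two of these points, that = is plain unification and that the occur check is skipped, are easy to demonstrate. This Python sketch assumes Prolog's naming convention (variables start with an uppercase letter) and represents compound terms as tuples such as ("f", "X"); the occur_check flag is our own device for comparing the two behaviors.

```python
def is_var(t):
    """Prolog convention: variables are strings starting with an uppercase letter."""
    return isinstance(t, str) and t[:1].isupper()

def walk(t, theta):
    """Follow variable bindings until reaching a non-variable or an unbound variable."""
    while is_var(t) and t in theta:
        t = theta[t]
    return t

def occurs(var, t, theta):
    t = walk(t, theta)
    if t == var:
        return True
    return isinstance(t, tuple) and any(occurs(var, a, theta) for a in t)

def unify(x, y, theta, occur_check=True):
    """Return theta extended so that x and y are identical, or None on failure."""
    x, y = walk(x, theta), walk(y, theta)
    if x == y:
        return theta
    if is_var(x) or is_var(y):
        var, term = (x, y) if is_var(x) else (y, x)
        if occur_check and occurs(var, term, theta):
            return None
        return {**theta, var: term}
    if isinstance(x, tuple) and isinstance(y, tuple) and len(x) == len(y):
        for a, b in zip(x, y):
            theta = unify(a, b, theta, occur_check)
            if theta is None:
                return None
        return theta
    return None
```

With the check, X = f(X) fails; without it, the unifier binds X to a term containing X itself, the kind of binding that can lead to unsound conclusions. Meanwhile X+Y = 2+3 succeeds purely by matching structure, and morningstar = eveningstar fails because the two constants differ.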

The decisions made in the design of Prolog represent a compromise between declarativeness and execution efficiency, inasmuch as efficiency was understood at the time Prolog was designed. We will return to this subject after looking at how Prolog is implemented.

Efficient implementation of logic programs

The execution of a Prolog program can happen in two modes: interpreted and compiled. Interpretation essentially amounts to running the FOL-BC-ASK algorithm from Figure 9.6, with the program as the knowledge base. We say "essentially," because Prolog interpreters contain a variety of improvements designed to maximize speed. Here we consider only two.

First, instead of constructing the list of all possible answers for each subgoal before continuing to the next, Prolog interpreters generate one answer and a "promise" to generate the rest when the current answer has been fully explored. This promise is called a choice point. When the depth-first search completes its exploration of the possible solutions arising from the current answer and backs up to the choice point, the choice point is expanded to yield a new answer for the subgoal and a new choice point. This approach saves both time and space. It also provides a very simple interface for debugging because at all times there is only a single solution path under consideration.
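Python generators give a convenient way to picture choice points: a suspended generator is a "promise" to produce further answers on demand. In this sketch (the predicate and helper names are hypothetical, and log exists only to make the exploration visible), asking for the first answer explores only the alternatives needed to find it; the rest are explored only if more answers are requested.

```python
def member(items, log):
    """member(X, items): each yield is one answer; the suspended generator is the
    choice point that can produce the remaining answers later."""
    for item in items:
        log.append(item)             # record each alternative actually explored
        yield item

def pairs_summing_to(target, xs, ys, log):
    """Solve member(X, xs), member(Y, ys), X + Y =:= target, depth first."""
    for x in member(xs, log):
        for y in member(ys, log):
            if x + y == target:
                yield x, y

log = []
answers = pairs_summing_to(4, [1, 2, 3], [3, 2, 1], log)
first = next(answers)                # explores only what the first answer needs
```

After next(answers) only two alternatives have been tried; resuming the generator backs up into the suspended choice points to find (2, 2) and then (3, 1).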

Second, our simple implementation of FOL-BC-ASK spends a good deal of time generating and composing substitutions. Prolog implements substitutions using logic variables that can remember their current binding. At any point in time, every variable in the program either is unbound or is bound to some value. Together, these variables and values implicitly define the substitution for the current branch of the proof. Extending the path can only add new variable bindings, because an attempt to add a different binding for an already bound variable results in a failure of unification. When a path in the search fails, Prolog will back up to a previous choice point, and then it might have to unbind some variables. This is done by keeping track of all variables that have been bound in a stack called the trail. As each new variable is bound by UNIFY-VAR, the variable is pushed onto the trail. When a goal fails and it is time to back up to a previous choice point, each of the variables is unbound as it is removed from the trail.

⁵ Note that if the Peano axioms are provided, such goals can be solved by inference within a Prolog program.

procedure APPEND(ax, y, az, continuation)
   trail ← GLOBAL-TRAIL-POINTER()
   if ax = [ ] and UNIFY(y, az) then CALL(continuation)
   RESET-TRAIL(trail)
   a ← NEW-VARIABLE(); x ← NEW-VARIABLE(); z ← NEW-VARIABLE()
   if UNIFY(ax, [a | x]) and UNIFY(az, [a | z]) then APPEND(x, y, z, continuation)

Figure 9.8 Pseudocode representing the result of compiling the Append predicate. The function NEW-VARIABLE returns a new variable, distinct from all other variables so far used. The procedure CALL(continuation) continues execution with the specified continuation.
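The trail described above can be sketched in a few lines of Python, assuming logic variables are objects that store their own binding. The names bind and backtrack_to are hypothetical: bind corresponds roughly to the binding step of UNIFY-VAR, and backtrack_to undoes bindings back to the trail height recorded when a choice point was created.

```python
class Var:
    """A logic variable that remembers its own binding (None means unbound)."""
    def __init__(self, name):
        self.name = name
        self.value = None

trail = []   # stack of variables, in the order in which they were bound

def bind(var, value):
    """Bind an unbound variable and push it onto the trail."""
    if var.value is not None:
        raise ValueError(f"{var.name} is already bound")
    var.value = value
    trail.append(var)

def backtrack_to(mark):
    """Unbind variables popped off the trail until it is back to length `mark`."""
    while len(trail) > mark:
        trail.pop().value = None

X, Y = Var("X"), Var("Y")
bind(X, "a")
choice_point = len(trail)    # trail height when the choice point is created
bind(Y, "b")                 # a binding made while exploring one alternative
backtrack_to(choice_point)   # the alternative failed: undo only the newer bindings
```

Backtracking restores exactly the state at the choice point: X keeps its older binding, and Y is unbound again.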

Even the most efficient Prolog interpreters require several thousand machine instructions per inference step because of the cost of index lookup, unification, and building the recursive call stack. In effect, the interpreter always behaves as if it has never seen the program before; for example, it has to find clauses that match the goal. A compiled Prolog program, on the other hand, is an inference procedure for a specific set of clauses, so it knows what clauses match the goal. Prolog basically generates a miniature theorem prover for each different predicate, thereby eliminating much of the overhead of interpretation. It is also possible to open-code the unification routine for each different call, thereby avoiding explicit analysis of term structure. (For details of open-coded unification, see Warren et al. (1977).)

The instruction sets of today's computers give a poor match with Prolog's semantics, so most Prolog compilers compile into an intermediate language rather than directly into machine language. The most popular intermediate language is the Warren Abstract Machine, or WAM, named after David H. D. Warren, one of the implementors of the first Prolog compiler. The WAM is an abstract instruction set that is suitable for Prolog and can be either interpreted or translated into machine language. Other compilers translate Prolog into a high-level language such as Lisp or C and then use that language's compiler to translate to machine language. For example, the definition of the Append predicate can be compiled into the code shown in Figure 9.8. There are several points worth mentioning:

• Rather than having to search the knowledge base for Append clauses, the clauses become a procedure and the inferences are carried out simply by calling the procedure.


• As described earlier, the current variable bindings are kept on a trail. The first step of the procedure saves the current state of the trail, so that it can be restored by RESET-TRAIL if the first clause fails. This will undo any bindings generated by the first call to UNIFY.

• The trickiest part is the use of continuations to implement choice points. You can think of a continuation as packaging up a procedure and a list of arguments that together define what should be done next whenever the current goal succeeds. It would not do just to return from a procedure like APPEND when the goal succeeds, because it could succeed in several ways, and each of them has to be explored. The continuation argument solves this problem because it can be called each time the goal succeeds. In the APPEND code, if the first argument is empty and the second argument unifies with the third, then the APPEND predicate has succeeded. We then CALL the continuation, with the appropriate bindings on the trail, to do whatever should be done next. For example, if the call to APPEND were at the top level, the continuation would print the bindings of the variables.
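The compiled Append of Figure 9.8 translates fairly directly into Python, with a closure as the continuation and a list as the trail. This is a sketch under our own assumptions: lists are pairs (head, tail) ending in the atom "[]"; unification is destructive and, as in Prolog, omits the occur check; and the first clause unifies ax with [] rather than merely testing it, so that an unbound first argument can become the empty list. The final trail reset and the helper names to_term and from_term are also our additions.

```python
class Var:
    """A logic variable; value None means unbound."""
    def __init__(self):
        self.value = None

trail = []          # every variable bound so far, most recent last
NIL = "[]"          # the empty list; a non-empty list is a pair (head, tail)

def deref(t):
    """Follow variable bindings to the current value of t."""
    while isinstance(t, Var) and t.value is not None:
        t = t.value
    return t

def bind(var, value):
    var.value = value
    trail.append(var)

def reset_trail(mark):
    """RESET-TRAIL: unbind every variable pushed since the trail had length mark."""
    while len(trail) > mark:
        trail.pop().value = None

def unify(a, b):
    """Destructive unification that records bindings on the trail (no occur check)."""
    a, b = deref(a), deref(b)
    if a is b:
        return True
    if isinstance(a, Var):
        bind(a, b)
        return True
    if isinstance(b, Var):
        bind(b, a)
        return True
    if isinstance(a, tuple) and isinstance(b, tuple) and len(a) == len(b):
        return all(unify(x, y) for x, y in zip(a, b))
    return a == b

def append(ax, y, az, continuation):
    """Compiled append/3 after Figure 9.8: calls continuation once per solution."""
    mark = len(trail)                        # trail <- GLOBAL-TRAIL-POINTER()
    # Clause 1: append([], Y, Y).
    if unify(ax, NIL) and unify(y, az):
        continuation()
    reset_trail(mark)
    # Clause 2: append([A|X], Y, [A|Z]) :- append(X, Y, Z).
    a, x, z = Var(), Var(), Var()            # NEW-VARIABLE() three times
    if unify(ax, (a, x)) and unify(az, (a, z)):
        append(x, y, z, continuation)
    reset_trail(mark)   # not in Figure 9.8; leaves the trail as we found it

def to_term(items):
    term = NIL
    for item in reversed(items):
        term = (item, term)
    return term

def from_term(t):
    """Convert a fully bound list term back into a Python list."""
    items, t = [], deref(t)
    while t != NIL:
        head, t = t
        items.append(deref(head))
        t = deref(t)
    return items

A, B = Var(), Var()
solutions = []
append(A, B, to_term([1, 2]), lambda: solutions.append((from_term(A), from_term(B))))
```

Running it on append(A, B, [1,2]) calls the continuation three times, once per solution, and leaves the trail empty afterward.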

Before Warren's work on the compilation of inference in Prolog, logic programming was too slow for general use. Compilers by Warren and others allowed Prolog code to achieve speeds that are competitive with C on a variety of standard benchmarks (Van Roy, 1990). Of course, the fact that one can write a planner or natural language parser in a few dozen lines of Prolog makes it somewhat more desirable than C for prototyping most small-scale AI research projects.

Parallelization can also provide substantial speedup. There are two principal sources of parallelism. The first, called OR-parallelism, comes from the possibility of a goal unifying with many different clauses in the knowledge base. Each gives rise to an independent branch in the search space that can lead to a potential solution, and all such branches can be solved in parallel. The second, called AND-parallelism, comes from the possibility of solving each conjunct in the body of an implication in parallel. AND-parallelism is more difficult to achieve, because solutions for the whole conjunction require consistent bindings for all the variables. Each conjunctive branch must communicate with the other branches to ensure a global solution.

Redundant inference and infinite loops

We now turn to the Achilles heel of Prolog: the mismatch between depth-first search and search trees that include repeated states and infinite paths. Consider the following logic program that decides if a path exists between two points on a directed graph:

path(X,Z) :- link(X,Z).
path(X,Z) :- path(X,Y), link(Y,Z).

A simple three-node graph, described by the facts link(a,b) and link(b,c), is shown in Figure 9.9(a). With this program, the query path(a,c) generates the proof tree shown in Figure 9.10(a). On the other hand, if we put the two clauses in the order

path(X,Z) :- path(X,Y), link(Y,Z).
path(X,Z) :- link(X,Z).

then Prolog follows the infinite path shown in Figure 9.10(b). Prolog is therefore incomplete as a theorem prover for definite clauses, even for Datalog programs, as this example shows, because, for some knowledge bases, it fails to prove sentences that are entailed. Notice that forward chaining does not suffer from this problem: once path(a,b), path(b,c), and path(a,c) are inferred, forward chaining halts.

Figure 9.9 (a) Finding a path from A to C can lead Prolog into an infinite loop. (b) A graph in which each node is connected to two random successors in the next layer. Finding a path in this graph requires 877 inferences.

Figure 9.10 (a) Proof that a path exists from A to C. (b) Infinite proof tree generated when the clauses are in the "wrong" order.
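The behavior in Figure 9.10 can be reproduced with Python generators that mimic Prolog's depth-first, left-to-right search over the two path clauses. The sketch is an assumption-laden stand-in, not a Prolog engine: it handles only queries of the form path(x, Z) with x known, and it raises an exception once the nesting of path calls exceeds an arbitrary limit, which stands in for Prolog recursing forever. All names here are hypothetical.

```python
LINKS = [("a", "b"), ("b", "c")]   # link(a,b).  link(b,c).
MAX_DEPTH = 50                     # arbitrary cut-off standing in for "forever"

class DepthLimitExceeded(Exception):
    """Raised when nested path/2 calls exceed MAX_DEPTH."""

def link_clause(x):
    """path(X,Z) :- link(X,Z)."""
    for a, b in LINKS:
        if a == x:
            yield b

def recursive_clause(x, order, depth):
    """path(X,Z) :- path(X,Y), link(Y,Z).  The path(X,Y) subgoal is solved first."""
    for y in path(x, order, depth + 1):
        for a, b in LINKS:
            if a == y:
                yield b

def path(x, order, depth=0):
    """Yield each Z with path(x, Z), trying the two clauses in the given order."""
    if depth > MAX_DEPTH:
        raise DepthLimitExceeded(f"path({x}, Z) nested {depth} calls deep")
    if order == "link_first":
        yield from link_clause(x)
        yield from recursive_clause(x, order, depth)
    else:                          # "recursive_first": the "wrong" order
        yield from recursive_clause(x, order, depth)
        yield from link_clause(x)
```

Note that even with the link clause first, a query with no answer, such as asking for paths out of c, still descends forever: once the link clause fails, the left-recursive clause calls path(c, Y) again.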

Depth-first backward chaining also has problems with redundant computations. For example, when finding a path in Figure 9.9(b), Prolog performs 877 inferences, most of which involve finding all possible paths to nodes from which the goal is unreachable. This is similar to the repeated-state problem discussed in Chapter 3. The total amount of inference can be exponential in the number of ground facts that are generated. If we apply forward chaining instead, at most n² path(X,Y) facts can be generated linking n nodes. For the problem in Figure 9.9(b), only 62 inferences are needed.

Forward chaining on graph search problems is an example of dynamic programming, in which the solutions to subproblems are constructed incrementally from those of smaller subproblems and are cached to avoid recomputation. We can obtain a similar effect in a backward chaining system using memoization, that is, caching solutions to subgoals as they are found and then reusing those solutions when the subgoal recurs, rather than repeating the previous computation. This is the approach taken by tabled logic programming systems, which use efficient storage and retrieval mechanisms to perform memoization. Tabled logic programming combines the goal-directedness of backward chaining with the dynamic programming efficiency of forward chaining. It is also complete for Datalog programs, which means that the programmer need worry less about infinite loops.
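A heavily simplified sketch of the tabling idea, in Python: for a query path(x, Z), the recursive subgoal path(x, Y) has the same call pattern, so it is answered from a table of the answers found so far instead of by a fresh recursive call, and evaluation repeats until the table stops growing. Real tabled systems are far more general; the function name and the restriction to this one call pattern are our assumptions.

```python
def tabled_path(x, links):
    """All Z such that path(x, Z) holds, computed with an answer table.

    The recursive subgoal path(x, Y) has the same call pattern as the query,
    so it is answered from the table instead of by a new recursive call.
    Evaluation repeats until no new answers appear (a fixed point).
    """
    table = set()
    while True:
        size_before = len(table)
        # path(X,Z) :- link(X,Z).
        table |= {b for a, b in links if a == x}
        # path(X,Z) :- path(X,Y), link(Y,Z).   path(X,Y) is read from the table.
        table |= {b for y in list(table) for a, b in links if a == y}
        if len(table) == size_before:
            return table
```

Unlike the left-recursive Prolog program, this terminates on cyclic graphs and on queries with no answers.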

Constraint logic programming

In our discussion of forward chaining (Section 9.3), we showed how constraint satisfaction problems (CSPs) can be encoded as definite clauses. Standard Prolog solves such problems in exactly the same way as the backtracking algorithm given in Figure 5.3.

Because backtracking enumerates the domains of the variables, it works only for finite-domain CSPs. In Prolog terms, there must be a finite number of solutions for any goal with unbound variables. (For example, the goal diff(q,sa), which says that Queensland and South Australia must be different colors, has six solutions if three colors are allowed.) Infinite-domain CSPs, for example with integer or real-valued variables, require quite different algorithms, such as bounds propagation or linear programming.

The following clause succeeds if three numbers satisfy the triangle inequality:

triangle(X,Y,Z) :- X>=0, Y>=0, Z>=0, X+Y>=Z, Y+Z>=X, X+Z>=Y.

If we ask the Prolog query triangle(3,4,5), this works fine. On the other hand, if we ask triangle(3,4,Z), no solution will be found, because the subgoal Z>=0 cannot be handled by Prolog. The difficulty is that variables in Prolog must be in one of two states: unbound or bound to a particular term.

Binding a variable to a particular term can be viewed as an extreme form of constraint, namely an equality constraint. Constraint logic programming (CLP) allows variables to be constrained rather than bound. A solution to a constraint logic program is the most specific set of constraints on the query variables that can be derived from the knowledge base. For example, the solution to the triangle(3,4,Z) query is the constraint 7 >= Z >= 1. Standard logic programs are just a special case of CLP in which the solution constraints must be equality constraints, that is, bindings.
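For the special case of one unknown, the constraint that a CLP system would return for triangle(3,4,Z) can be derived with simple bounds propagation. This Python sketch is not a CLP solver: it hard-codes how each of the six goals bounds Z once X and Y are known, and the function name is hypothetical.

```python
def triangle_bounds_on_z(x, y):
    """Bounds on Z implied by triangle(x, y, Z) when x and y are known numbers.

    Each goal in the clause body is linear in Z, so each contributes a bound:
      Z>=0    ->  Z >= 0
      X+Y>=Z  ->  Z <= x + y
      Y+Z>=X  ->  Z >= x - y
      X+Z>=Y  ->  Z >= y - x
    X>=0 and Y>=0 involve no unknown and are simply checked.
    Returns (low, high), meaning low <= Z <= high, or None if x or y is negative.
    """
    if x < 0 or y < 0:
        return None
    low = max(0, x - y, y - x)
    high = x + y
    return low, high      # never empty, since |x - y| <= x + y when x, y >= 0
```

For X = 3 and Y = 4 it returns (1, 7), the constraint 7 >= Z >= 1 given in the text; triangle(3,4,5) holds because 5 lies in that interval.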

CLP systems incorporate various constraint-solving algorithms for the constraints allowed in the language. For example, a system that allows linear inequalities on real-valued variables might include a linear programming algorithm for solving those constraints. CLP systems also adopt a much more flexible approach to solving standard logic programming queries. For example, instead of depth-first, left-to-right backtracking, they might use any of the more efficient algorithms discussed in Chapter 5, including heuristic conjunct ordering, backjumping, cutset conditioning, and so on. CLP systems therefore combine elements of constraint satisfaction algorithms, logic programming, and deductive databases.

CLP systems can also take advantage of the variety of CSP search optimizations described in Chapter 5, such as variable and value ordering, forward checking, and intelligent backtracking. Several systems have been defined that allow the programmer more control over the search order for inference. For example, the MRS language (Genesereth and Smith, 1981; Russell, 1985) allows the programmer to write metarules to determine which conjuncts are tried first. The user could write a rule saying that the goal with the fewest variables should be tried first or could write domain-specific rules for particular predicates.

The last of our three families of logical systems is based on resolution. We saw Chapter 7that propositional resolution is a refutation complete inference procedure for propositionallogic. In this section, we will see how to extend resolution to first-order logic.

The question of the existence of complete proof procedures is of direct concern to math-ematicians. If a complete proof procedure can be found for mathematical statements, two things follow: first, all conjectures can be established mechanically; second, all of mathe-matics be established as the logical consequence of a set of fundamental axioms. Thequestion of completeness has therefore generated some of the most important mathematicalwork of the 20th century. In 1930, the German mathematician Kurt proved the firstcompleteness theorem for first-order logic, showing that any entailed sentence has a finite THEOREM

proof. (No really practical proof procedure was found until J. A. Robinson published the resolution algorithm in I 965.) In 1931, proved an even more famous theorem. The theorem states that a logical system that includes the principle ofTHEOREM

without which very little of discrete mathematics can be constructed-is necessarily incom-plete. Hence, there are sentences that are entailed, but have no finite proof within system.The needle may be in the metaphorical haystack, but no procedure can guarantee it will be found.

Despite theorem, resolution-based theorem provers have been applied widely toderive mathematical theorems, including several for which no proof was known previously. Theorem provers have also been used to verify hardware designs and to generate logicallycorrect programs, among other applications.

Conjunctive normal form for first-order logic

As in the propositional case, first-order resolution requires that sentences be in conjunctive normal form (CNF), that is, a conjunction of clauses, where each clause is a disjunction of literals.⁶ Literals can contain variables, which are assumed to be universally quantified. For example, the sentence

$$\forall x, y, z\;\; American(x) \land Weapon(y) \land Sells(x, y, z) \land Hostile(z) \Rightarrow Criminal(x)$$

becomes, in CNF,

$$\neg American(x) \lor \neg Weapon(y) \lor \neg Sells(x, y, z) \lor \neg Hostile(z) \lor Criminal(x).$$

⁶ A clause can also be represented as an implication with a conjunction of atoms on the left and a disjunction of atoms on the right, as shown in Exercise 7.12. This form, sometimes called Kowalski form when written with a right-to-left implication symbol (Kowalski, 1979b), is often much easier to read.




Every sentence of first-order logic can be converted into an inferentially equivalent CNF sentence. In particular, the CNF sentence will be unsatisfiable just when the original sentence is unsatisfiable, so we have a basis for doing proofs by contradiction on the CNF sentences.

The procedure for conversion to CNF is very similar to the propositional case, which we saw on page 215. The principal difference arises from the need to eliminate existential quantifiers. We will illustrate the procedure by translating the sentence "Everyone who loves all animals is loved by someone," or

$$\forall x\; [\forall y\; Animal(y) \Rightarrow Loves(x, y)] \Rightarrow [\exists y\; Loves(y, x)].$$

The steps are as follows:

• Eliminate implications:
$$\forall x\; [\neg \forall y\; \neg Animal(y) \lor Loves(x, y)] \lor [\exists y\; Loves(y, x)].$$

• Move ¬ inwards: In addition to the usual rules for negated connectives, we need rules for negated quantifiers. Thus, we have
$$\neg \forall x\; p \;\text{ becomes }\; \exists x\; \neg p, \qquad \neg \exists x\; p \;\text{ becomes }\; \forall x\; \neg p.$$
Our sentence goes through the following transformations:
$$\begin{aligned}
&\forall x\; [\exists y\; \neg(\neg Animal(y) \lor Loves(x, y))] \lor [\exists y\; Loves(y, x)]\\
&\forall x\; [\exists y\; \neg\neg Animal(y) \land \neg Loves(x, y)] \lor [\exists y\; Loves(y, x)]\\
&\forall x\; [\exists y\; Animal(y) \land \neg Loves(x, y)] \lor [\exists y\; Loves(y, x)].
\end{aligned}$$
Notice how a universal quantifier (∀y) in the premise of the implication has become an existential quantifier. The sentence now reads "Either there is some animal that x doesn't love, or (if this is not the case) someone loves x." Clearly, the meaning of the original sentence has been preserved.

• Standardize variables: For sentences like $(\forall x\; P(x)) \lor (\exists x\; Q(x))$, which use the same variable name twice, change the name of one of the variables. This avoids confusion later when we drop the quantifiers. Thus, we have
$$\forall x\; [\exists y\; Animal(y) \land \neg Loves(x, y)] \lor [\exists z\; Loves(z, x)].$$

• Skolemize: Skolemization is the process of removing existential quantifiers by elimination. In the simple case, it is just like the Existential Instantiation rule of Section 9.1: translate $\exists x\; P(x)$ into P(A), where A is a new constant. If we apply this rule to our sample sentence, however, we obtain
$$\forall x\; [Animal(A) \land \neg Loves(x, A)] \lor Loves(B, x),$$
which has the wrong meaning entirely: it says that everyone either fails to love a particular animal A or is loved by some particular entity B. In fact, our original sentence allows each person to fail to love a different animal or to be loved by a different person. Thus, we want the Skolem entities to depend on x:
$$\forall x\; [Animal(F(x)) \land \neg Loves(x, F(x))] \lor Loves(G(x), x).$$
Here F and G are Skolem functions. The general rule is that the arguments of the Skolem function are all the universally quantified variables in whose scope the existential quantifier appears. As with Existential Instantiation, the Skolemized sentence is satisfiable exactly when the original sentence is satisfiable.

• Drop universal quantifiers: At this point, all remaining variables must be universally quantified. Moreover, the sentence is equivalent to one in which all the universal quantifiers have been moved to the left. We can therefore drop the universal quantifiers:
$$[Animal(F(x)) \land \neg Loves(x, F(x))] \lor Loves(G(x), x).$$

• Distribute ∨ over ∧:
$$[Animal(F(x)) \lor Loves(G(x), x)] \land [\neg Loves(x, F(x)) \lor Loves(G(x), x)].$$
This step may also require flattening out nested conjunctions and disjunctions.

The sentence is now in CNF and consists of two clauses. It is quite unreadable. (It may help to explain that the Skolem function F(x) refers to the animal potentially unloved by x, whereas G(x) refers to someone who might love x.) Fortunately, humans seldom need to look at CNF sentences, because the translation process is easily automated.
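The conversion steps above can be sketched as a short Python program. This is a minimal sketch under encoding assumptions of our own: formulas are nested tuples such as ("forall", "x", ...), lowercase strings are variables, and Skolem symbols are drawn in order from the letters F, G, H, ... (so a Skolem constant is also named F here rather than A). Standardizing variables, Skolemizing, and dropping universal quantifiers are folded into one recursive pass. Applied to the "loves all animals" sentence, it should reproduce the two clauses derived above.

```python
# Formulas are nested tuples (an encoding assumed for this sketch):
#   ("=>", A, B), ("&", A, B), ("|", A, B), ("~", A),
#   ("forall", v, A), ("exists", v, A), or an atom such as ("Loves", "x", "y").
# Variables are lowercase strings; constants and function names are capitalized.

def eliminate_implications(f):
    op = f[0]
    if op == "=>":
        return ("|", ("~", eliminate_implications(f[1])), eliminate_implications(f[2]))
    if op in ("&", "|"):
        return (op, eliminate_implications(f[1]), eliminate_implications(f[2]))
    if op == "~":
        return ("~", eliminate_implications(f[1]))
    if op in ("forall", "exists"):
        return (op, f[1], eliminate_implications(f[2]))
    return f  # atom

def move_negation_inwards(f):
    op = f[0]
    if op == "~":
        g = f[1]
        if g[0] == "~":                        # double negation
            return move_negation_inwards(g[1])
        if g[0] in ("&", "|"):                 # De Morgan's laws
            dual = "|" if g[0] == "&" else "&"
            return (dual, move_negation_inwards(("~", g[1])),
                    move_negation_inwards(("~", g[2])))
        if g[0] in ("forall", "exists"):       # negated quantifiers
            dual = "exists" if g[0] == "forall" else "forall"
            return (dual, g[1], move_negation_inwards(("~", g[2])))
        return f                               # negated atom
    if op in ("&", "|"):
        return (op, move_negation_inwards(f[1]), move_negation_inwards(f[2]))
    if op in ("forall", "exists"):
        return (op, f[1], move_negation_inwards(f[2]))
    return f

def rename_term(t, env):
    if isinstance(t, tuple):
        return (t[0],) + tuple(rename_term(a, env) for a in t[1:])
    return env.get(t, t)

def skolemize(f, env, universals, used, names):
    """Standardize variables, Skolemize, and drop universal quantifiers in one pass."""
    op = f[0]
    if op == "forall":
        v = f[1]
        while v in used:                       # standardize apart: pick an unused name
            v += "'"
        used.add(v)
        return skolemize(f[2], {**env, f[1]: v}, universals + [v], used, names)
    if op == "exists":
        name = next(names)                     # fresh Skolem symbol
        term = (name, *universals) if universals else name
        return skolemize(f[2], {**env, f[1]: term}, universals, used, names)
    if op in ("&", "|"):
        return (op, skolemize(f[1], env, universals, used, names),
                skolemize(f[2], env, universals, used, names))
    if op == "~":
        return ("~", skolemize(f[1], env, universals, used, names))
    return rename_term(f, env)                 # atom

def distribute(f):
    """Distribute | over & and return clauses as lists of (positive?, atom) literals."""
    op = f[0]
    if op == "&":
        return distribute(f[1]) + distribute(f[2])
    if op == "|":
        return [a + b for a in distribute(f[1]) for b in distribute(f[2])]
    if op == "~":
        return [[(False, f[1])]]
    return [[(True, f)]]

def to_cnf(f):
    f = move_negation_inwards(eliminate_implications(f))
    f = skolemize(f, {}, [], set(), iter("FGHIJKLMNOPQRSTUVWXYZ"))
    return distribute(f)

loves_all_animals = ("forall", "x",
    ("=>",
     ("forall", "y", ("=>", ("Animal", "y"), ("Loves", "x", "y"))),
     ("exists", "y", ("Loves", "y", "x"))))
for clause in to_cnf(loves_all_animals):
    print(clause)
```

Each printed clause is one conjunct of the CNF result; the sketch does not simplify tautologies or subsumed clauses.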

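The resolution inference rule

The resolution rule for first-order clauses is simply a lifted version of the propositional resolution rule given on page 214. Two clauses, which are assumed to be standardized apart so that they share no variables, can be resolved if they contain complementary literals. Propositional literals are complementary if one is the negation of the other; first-order literals are complementary if one unifies with the negation of the other. Thus we have

$$\frac{\ell_1 \lor \cdots \lor \ell_k, \qquad m_1 \lor \cdots \lor m_n}{\mathrm{SUBST}(\theta,\; \ell_1 \lor \cdots \lor \ell_{i-1} \lor \ell_{i+1} \lor \cdots \lor \ell_k \lor m_1 \lor \cdots \lor m_{j-1} \lor m_{j+1} \lor \cdots \lor m_n)}$$

where $\mathrm{UNIFY}(\ell_i, \neg m_j) = \theta$. For example, we can resolve the two clauses

$$[Animal(F(x)) \lor Loves(G(x), x)] \quad\text{and}\quad [\neg Loves(u, v) \lor \neg Kills(u, v)]$$

by eliminating the complementary literals Loves(G(x), x) and $\neg Loves(u, v)$, with unifier $\theta = \{u/G(x),\, v/x\}$, to produce the resolvent clause

$$[Animal(F(x)) \lor \neg Kills(G(x), x)].$$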

The rule we have just given is the binary resolution rule, because it resolves exactly two literals. The binary resolution rule by itself does not yield a complete inference procedure. The full resolution rule resolves subsets of literals in each clause that are unifiable. An alternative approach is to extend factoring, the removal of redundant literals, to the first-order case. Propositional factoring reduces two literals to one if they are identical; first-order factoring reduces two literals to one if they are unifiable. The unifier must be applied to the entire clause. The combination of binary resolution and factoring is complete.
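A minimal Python sketch of unification, binary resolution, and factoring, assuming a tuple encoding of our own: lowercase strings are variables, capitalized strings are constants, compound terms and atoms are tuples (functor, args...), a literal is a (positive?, atom) pair, and a clause is a frozenset of literals. The helper names unify, resolve, and factors are illustrative and not taken from any library. Run on the two clauses of the example above, it should yield the single resolvent Animal(F(x)) ∨ ¬Kills(G(x), x), using θ = {u/G(x), v/x}.

```python
def is_var(t):
    return isinstance(t, str) and t[:1].islower()

def subst(theta, t):
    """Apply substitution theta to term t (following chains of bindings)."""
    if is_var(t) and t in theta:
        return subst(theta, theta[t])
    if isinstance(t, tuple):
        return (t[0],) + tuple(subst(theta, a) for a in t[1:])
    return t

def occurs(v, t):
    return t == v or (isinstance(t, tuple) and any(occurs(v, a) for a in t[1:]))

def unify(x, y, theta):
    """Return a most general unifier extending theta, or None (with occurs check)."""
    if theta is None:
        return None
    x, y = subst(theta, x), subst(theta, y)
    if x == y:
        return theta
    if is_var(y):
        return None if occurs(y, x) else {**theta, y: x}
    if is_var(x):
        return None if occurs(x, y) else {**theta, x: y}
    if isinstance(x, tuple) and isinstance(y, tuple) and x[0] == y[0] and len(x) == len(y):
        for a, b in zip(x[1:], y[1:]):
            theta = unify(a, b, theta)
        return theta
    return None

def resolve(c1, c2):
    """All binary resolvents of two clauses that are already standardized apart."""
    out = []
    for s1, a1 in c1:
        for s2, a2 in c2:
            theta = unify(a1, a2, {}) if s1 != s2 else None   # complementary literals
            if theta is not None:
                rest = (c1 - {(s1, a1)}) | (c2 - {(s2, a2)})
                out.append(frozenset((s, subst(theta, a)) for s, a in rest))
    return out

def factors(c):
    """First-order factoring: unify two same-sign literals, apply θ to the whole clause."""
    lits, out = list(c), []
    for i, (s1, a1) in enumerate(lits):
        for s2, a2 in lits[i + 1:]:
            theta = unify(a1, a2, {}) if s1 == s2 else None
            if theta is not None:
                out.append(frozenset((s, subst(theta, a)) for s, a in c))
    return out

c1 = frozenset({(True, ("Animal", ("F", "x"))), (True, ("Loves", ("G", "x"), "x"))})
c2 = frozenset({(False, ("Loves", "u", "v")), (False, ("Kills", "u", "v"))})
print(resolve(c1, c2))
print(factors(frozenset({(True, ("P", "x")), (True, ("P", "A")), (False, ("Q", "x"))})))
```

The second call shows factoring P(x) ∨ P(A) ∨ ¬Q(x) into P(A) ∨ ¬Q(A): the unifier {x/A} is applied to every literal, not only to the two that were merged.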

Example proofs

Resolution proves that $KB \models \alpha$ by proving $KB \land \neg\alpha$ unsatisfiable, i.e., by deriving the empty clause. The algorithmic approach is identical to the propositional case, described in Figure 7.12, so we will not repeat it here. Instead, we will give two example proofs. The first is the crime example from Section 9.3. The sentences in CNF are

$$\begin{aligned}
&\neg American(x) \lor \neg Weapon(y) \lor \neg Sells(x, y, z) \lor \neg Hostile(z) \lor Criminal(x).\\
&\neg Missile(x) \lor \neg Owns(Nono, x) \lor Sells(West, x, Nono).\\
&\neg Enemy(x, America) \lor Hostile(x).\\
&\neg Missile(x) \lor Weapon(x).\\
&Owns(Nono, M_1). \qquad Missile(M_1).\\
&American(West). \qquad Enemy(Nono, America).
\end{aligned}$$

We also include the negated goal $\neg Criminal(West)$. The resolution proof is shown in Figure 9.11. Notice the structure: a single "spine" beginning with the goal clause, resolving against clauses from the knowledge base until the empty clause is generated. This is characteristic of resolution on Horn clause knowledge bases. In fact, the clauses along the main spine correspond exactly to the consecutive values of the goals variable in the backward chaining algorithm of Figure 9.6. This is because we always chose to resolve with a clause whose positive literal unified with the leftmost literal of the "current" clause on the spine; this is exactly what happens in backward chaining. Thus, backward chaining is really just a special case of resolution with a particular control strategy to decide which resolution to perform next.

Figure 9.11 A resolution proof that West is a criminal.
[Figure: the spine starts from ¬Criminal(West) and resolves, in turn, against the Criminal rule, American(West), ¬Missile(x) ∨ Weapon(x), Missile(M1), the Sells rule, Missile(M1), Owns(Nono, M1), ¬Enemy(x, America) ∨ Hostile(x), and Enemy(Nono, America), ending in the empty clause.]

Our second example makes use of Skolemization and involves clauses that are not definite clauses. This results in a somewhat more complex proof structure. In English, the problem is as follows:

Everyone who loves all animals is loved by someone.
Anyone who kills an animal is loved by no one.
Jack loves all animals.
Either Jack or Curiosity killed the cat, who is named Tuna.
Did Curiosity kill the cat?




First, we express the original sentences, some background knowledge, and the negated goal ¬G in first-order logic:

A. $\forall x\; [\forall y\; Animal(y) \Rightarrow Loves(x, y)] \Rightarrow [\exists y\; Loves(y, x)]$
B. $\forall x\; [\exists y\; Animal(y) \land Kills(x, y)] \Rightarrow [\forall z\; \neg Loves(z, x)]$
C. $\forall x\; Animal(x) \Rightarrow Loves(Jack, x)$
D. $Kills(Jack, Tuna) \lor Kills(Curiosity, Tuna)$
E. $Cat(Tuna)$
F. $\forall x\; Cat(x) \Rightarrow Animal(x)$
¬G. $\neg Kills(Curiosity, Tuna)$

Now we apply the conversion procedure to convert each sentence to CNF:

A1. $Animal(F(x)) \lor Loves(G(x), x)$
A2. $\neg Loves(x, F(x)) \lor Loves(G(x), x)$
B. $\neg Animal(y) \lor \neg Kills(x, y) \lor \neg Loves(z, x)$
C. $\neg Animal(x) \lor Loves(Jack, x)$
D. $Kills(Jack, Tuna) \lor Kills(Curiosity, Tuna)$
E. $Cat(Tuna)$
F. $\neg Cat(x) \lor Animal(x)$
¬G. $\neg Kills(Curiosity, Tuna)$

The resolution proof that Curiosity killed the cat is given in Figure 9.12. In English, the proof could be paraphrased as follows:

Suppose Curiosity did not kill Tuna. We know that either Jack or Curiosity did; thus Jack must have. Now, Tuna is a cat and cats are animals, so Tuna is an animal. Because anyone who kills an animal is loved by no one, we know that no one loves Jack. On the other hand, Jack loves all animals, so someone loves him; so we have a contradiction. Therefore, Curiosity killed the cat.

Figure 9.12 A proof that Curiosity killed the cat. Notice the use of factoring in the derivation of the clause Loves(G(Jack), Jack).

The proof answers the question "Did Curiosity kill the cat?" but often we want to pose more general questions, such as "Who killed the cat?" Resolution can do this, but it takes a little more work to obtain the answer. The goal is $\exists w\; Kills(w, Tuna)$, which, when negated, becomes $\neg Kills(w, Tuna)$ in CNF. Repeating the proof in Figure 9.12 with the new negated goal, we obtain a similar proof tree, but with the substitution {w/Curiosity} in one of the steps. So, in this case, finding out who killed the cat is just a matter of keeping track of the bindings for the query variables in the proof.

Unfortunately, resolution can produce nonconstructive proofs for existential goals. For example, $\neg Kills(w, Tuna)$ resolves with $Kills(Jack, Tuna) \lor Kills(Curiosity, Tuna)$ to give Kills(Jack, Tuna), which resolves again with $\neg Kills(w, Tuna)$ to yield the empty clause. Notice that w has two different bindings in this proof; resolution is telling us that, yes, someone killed Tuna: either Jack or Curiosity. This is no great surprise! One solution is to restrict the allowed resolution steps so that the query variables can be bound only once in a given proof; then we need to be able to backtrack over the possible bindings. Another solution is to add a special answer literal to the negated goal, which becomes $\neg Kills(w, Tuna) \lor Answer(w)$. Now, the resolution process generates an answer whenever a clause is generated containing just a single answer literal. For the proof in Figure 9.12, this is Answer(Curiosity). The nonconstructive proof would generate the clause $Answer(Curiosity) \lor Answer(Jack)$, which does not constitute an answer.
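The answer-literal technique can be traced mechanically. This sketch assumes a tuple encoding of our own (lowercase strings are variables, literals are (positive?, atom) pairs, clauses are frozensets). It resolves the augmented negated goal ¬Kills(w, Tuna) ∨ Answer(w) twice against sentence D, reproducing the nonconstructive clause Answer(Jack) ∨ Answer(Curiosity), which the hypothetical helper extract_answer rejects. The unit fact Kills(Curiosity, Tuna) in the last part is a hypothetical addition, used only to show a clause that does count as an answer; it is not part of the knowledge base above.

```python
def is_var(t):
    return isinstance(t, str) and t[:1].islower()

def subst(theta, t):
    if is_var(t) and t in theta:
        return subst(theta, theta[t])
    if isinstance(t, tuple):
        return (t[0],) + tuple(subst(theta, a) for a in t[1:])
    return t

def occurs(v, t):
    return t == v or (isinstance(t, tuple) and any(occurs(v, a) for a in t[1:]))

def unify(x, y, theta):
    if theta is None:
        return None
    x, y = subst(theta, x), subst(theta, y)
    if x == y:
        return theta
    if is_var(y):
        return None if occurs(y, x) else {**theta, y: x}
    if is_var(x):
        return None if occurs(x, y) else {**theta, x: y}
    if isinstance(x, tuple) and isinstance(y, tuple) and x[0] == y[0] and len(x) == len(y):
        for a, b in zip(x[1:], y[1:]):
            theta = unify(a, b, theta)
        return theta
    return None

def resolve(c1, c2):
    out = []
    for s1, a1 in c1:
        for s2, a2 in c2:
            theta = unify(a1, a2, {}) if s1 != s2 else None
            if theta is not None:
                rest = (c1 - {(s1, a1)}) | (c2 - {(s2, a2)})
                out.append(frozenset((s, subst(theta, a)) for s, a in rest))
    return out

def rename(clause, suffix):
    """Fresh copy of a clause: append suffix to every variable."""
    def r(t):
        if is_var(t):
            return t + suffix
        if isinstance(t, tuple):
            return (t[0],) + tuple(r(a) for a in t[1:])
        return t
    return frozenset((s, r(a)) for s, a in clause)

def extract_answer(clause):
    """Return the binding if the clause is exactly one positive Answer literal, else None."""
    if len(clause) == 1:
        ((sign, atom),) = clause
        if sign and atom[0] == "Answer":
            return atom[1]
    return None

neg_goal = frozenset({(False, ("Kills", "w", "Tuna")), (True, ("Answer", "w"))})
d = frozenset({(True, ("Kills", "Jack", "Tuna")), (True, ("Kills", "Curiosity", "Tuna"))})

# Step 1 binds w/Jack: Answer(Jack) ∨ Kills(Curiosity, Tuna).
step1 = next(r for r in resolve(neg_goal, d) if (True, ("Answer", "Jack")) in r)
# Step 2 uses a fresh copy of the negated goal and binds w_1/Curiosity.
(step2,) = resolve(step1, rename(neg_goal, "_1"))
print(sorted(step2), extract_answer(step2))

# Hypothetical unit fact, added only to show a constructive answer.
(direct,) = resolve(neg_goal, frozenset({(True, ("Kills", "Curiosity", "Tuna"))}))
print(extract_answer(direct))
```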

Completeness of resolution

This section gives a completeness proof of resolution. It can be safely skipped by those who are willing to take it on faith.

We will show that resolution is refutation-complete, which means that if a set of sentences is unsatisfiable, then resolution will always be able to derive a contradiction. Resolution cannot be used to generate all logical consequences of a set of sentences, but it can be used to establish that a given sentence is entailed by the set of sentences. Hence, it can be used to find all answers to a given question, using the negated-goal method that we described earlier in the chapter.

We will take it as given that any sentence in first-order logic (without equality) can be rewritten as a set of clauses in CNF. This can be proved by induction on the form of the sentence, using atomic sentences as the base case (Davis and Putnam, 1960). Our goal therefore is to prove the following: if S is an unsatisfiable set of clauses, then the application of a finite number of resolution steps to S will yield a contradiction.

Our proof sketch follows the original proof due to Robinson, with some simplifications from Genesereth and Nilsson (1987). The basic structure of the proof is shown in Figure 9.13; it proceeds as follows:

1. First, we observe that if S is unsatisfiable, then there exists a particular set of ground instances of the clauses of S such that this set is also unsatisfiable (Herbrand's theorem).

2. We then appeal to the ground resolution theorem given in Chapter 7, which states that propositional resolution is complete for ground sentences.

3. We then use a lifting lemma to show that, for any propositional resolution proof using the set of ground sentences, there is a corresponding first-order resolution proof using the first-order sentences from which the ground sentences were obtained.




Any set of sentences S is representable in clausal form
    ↓
Assume S is unsatisfiable, and in clausal form
    ↓ (Herbrand's theorem)
Some set S' of ground instances is unsatisfiable
    ↓ (ground resolution theorem)
Resolution can find a contradiction in S'
    ↓ (lifting lemma)
There is a resolution proof for the contradiction in S

Figure 9.13 Structure of a completeness proof for resolution.

To carry out the first step, we will need three new concepts:

• Herbrand universe: If S is a set of clauses, then $H_S$, the Herbrand universe of S, is the set of all ground terms constructible from the following:
  a. The function symbols in S, if any.
  b. The constant symbols in S, if any; if none, then the constant symbol A.
For example, if S contains just the clause $\neg P(x, F(x, A)) \lor \neg Q(x, A) \lor R(x, B)$, then $H_S$ is the following infinite set of ground terms:
$$\{A, B, F(A, A), F(A, B), F(B, A), F(B, B), F(A, F(A, A)), \ldots\}$$

• Saturation: If S is a set of clauses and P is a set of ground terms, then P(S), the saturation of S with respect to P, is the set of all ground clauses obtained by applying all possible consistent substitutions of ground terms in P for variables in S.

• Herbrand base: The saturation of a set S of clauses with respect to its Herbrand universe is called the Herbrand base of S, written as $H_S(S)$. For example, if S contains solely the clause just given, then $H_S(S)$ is the infinite set of clauses
$$\begin{aligned}
\{&\neg P(A, F(A, A)) \lor \neg Q(A, A) \lor R(A, B),\\
&\neg P(B, F(B, A)) \lor \neg Q(B, A) \lor R(B, B),\\
&\neg P(F(A, A), F(F(A, A), A)) \lor \neg Q(F(A, A), A) \lor R(F(A, A), B),\\
&\neg P(F(A, B), F(F(A, B), A)) \lor \neg Q(F(A, B), A) \lor R(F(A, B), B),\; \ldots\}
\end{aligned}$$

These definitions allow us to state a form of Herbrand's theorem (Herbrand, 1930):

If a set S of clauses is unsatisfiable, then there exists a finite subset of $H_S(S)$ that is also unsatisfiable.

Let S' be this finite subset of ground sentences. Now, we can appeal to the ground resolution theorem (page 217) to show that the resolution closure RC(S') contains the empty clause. That is, running propositional resolution to completion on S' will derive a contradiction.
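The Herbrand universe and saturation can be enumerated directly, provided nesting is cut off at some depth (the true $H_S$ is infinite whenever S contains a function symbol). This is a sketch under assumed conventions: terms are tuples (functor, args...), lowercase strings are variables, a clause is a frozenset of (positive?, atom) literals, and depth is a parameter introduced here only to keep the enumeration finite. With depth 1 and the clause from the example, the first ground instance listed in $H_S(S)$ above should appear.

```python
from itertools import product

def is_var(t):
    return isinstance(t, str) and t[:1].islower()

def subst(theta, t):
    if is_var(t):
        return theta.get(t, t)
    if isinstance(t, tuple):
        return (t[0],) + tuple(subst(theta, a) for a in t[1:])
    return t

def variables(t):
    if is_var(t):
        return {t}
    if isinstance(t, tuple):
        return set().union(*(variables(a) for a in t[1:]))
    return set()

def herbrand_universe(constants, functions, depth):
    """Ground terms from constants and functions (name -> arity), nested up to depth."""
    terms = set(constants) or {"A"}            # rule b: use A if S has no constants
    for _ in range(depth):
        terms = terms | {(name,) + args
                         for name, arity in functions.items()
                         for args in product(terms, repeat=arity)}
    return terms

def saturation(clauses, terms):
    """All ground instances of the clauses obtained by substituting the given terms."""
    ground = set()
    for clause in clauses:
        vs = sorted(set().union(*(variables(atom) for _, atom in clause)))
        for values in product(terms, repeat=len(vs)):
            theta = dict(zip(vs, values))
            ground.add(frozenset((sign, subst(theta, atom)) for sign, atom in clause))
    return ground

S = [frozenset({(False, ("P", "x", ("F", "x", "A"))),
                (False, ("Q", "x", "A")),
                (True, ("R", "x", "B"))})]
H1 = herbrand_universe({"A", "B"}, {"F": 2}, depth=1)
print(len(H1), len(saturation(S, H1)))
```

At depth 1 the universe has 6 terms (A, B, and the four terms F(·,·) over them), so the saturation has 6 ground clauses, one per binding of x; at depth 2 the universe already grows to 38 terms.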

GÖDEL'S INCOMPLETENESS THEOREM

By slightly extending the language of first-order logic to allow for the mathematical induction schema in arithmetic, Gödel was able to show, in his incompleteness theorem, that there are true arithmetic sentences that cannot be proved.

The proof of the incompleteness theorem is somewhat beyond the scope of this book, occupying, as it does, at least 30 pages, but we can give a hint here. We begin with the logical theory of numbers. In this theory, there is a single constant, 0, and a single function, S (the successor function). In the intended model, S(0) denotes 1, S(S(0)) denotes 2, and so on; the language therefore has names for all the natural numbers. The vocabulary also includes the function symbols +, ×, and Expt (exponentiation) and the usual set of logical connectives and quantifiers. The first step is to notice that the set of sentences that we can write in this language can be enumerated. (Imagine defining an alphabetical order on the symbols and then arranging, in alphabetical order, each of the sets of sentences of length 1, 2, and so on.) We can then number each sentence α with a unique natural number #α (the Gödel number). This is crucial: number theory contains a name for each of its own sentences. Similarly, we can number each possible proof P with a Gödel number G(P), because a proof is simply a finite sequence of sentences.

Now suppose we have a recursively enumerable set A of sentences that are true statements about the natural numbers. Recalling that A can be named by a given set of integers, we can imagine writing in our language a sentence α(j, A) of the following sort:

∀i  i is not the Gödel number of a proof of the sentence whose Gödel number is j, where the proof uses only premises in A.

Then let σ be the sentence α(#σ, A), that is, a sentence that states its own unprovability from A. (That this sentence always exists is true but not entirely obvious.)

Now we make the following ingenious argument: Suppose that σ is provable from A; then σ is false (because σ says it cannot be proved). But then we have a false sentence that is provable from A, so A cannot consist of only true sentences, a violation of our premise. Therefore σ is not provable from A. But this is exactly what σ itself claims; hence σ is a true sentence.

So, we have shown (barring 29½ pages) that for any set of true sentences of number theory, and in particular any set of basic axioms, there are other true sentences that cannot be proved from those axioms. This establishes, among other things, that we can never prove all the theorems of mathematics within any given system of axioms. Clearly, this was an important discovery for mathematics. Its significance for AI has been widely debated, beginning with speculations by Gödel himself. We take up the debate in Chapter 26.

Now that we have established that there is always a resolution proof involving some finite subset of the Herbrand base of S, the next step is to show that there is a resolution proof using the clauses of S itself, which are not necessarily ground clauses. We start by considering a single application of the resolution rule. Robinson's basic lemma implies the following fact:

Let $C_1$ and $C_2$ be two clauses with no shared variables, and let $C'_1$ and $C'_2$ be ground instances of $C_1$ and $C_2$. If C' is a resolvent of $C'_1$ and $C'_2$, then there exists a clause C such that (1) C is a resolvent of $C_1$ and $C_2$ and (2) C' is a ground instance of C.

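This is called a lifting lemma, because it lifts a proof step from ground clauses up to general first-order clauses. In order to prove his basic lifting lemma, Robinson had to invent unification and derive all of the properties of most general unifiers. Rather than repeat the proof here, we simply illustrate the lemma:

$$\begin{aligned}
C_1 &= \neg P(x, F(x, A)) \lor \neg Q(x, A) \lor R(x, B)\\
C_2 &= \neg N(G(y), z) \lor P(H(y), z)\\
C'_1 &= \neg P(H(B), F(H(B), A)) \lor \neg Q(H(B), A) \lor R(H(B), B)\\
C'_2 &= \neg N(G(B), F(H(B), A)) \lor P(H(B), F(H(B), A))\\
C' &= \neg N(G(B), F(H(B), A)) \lor \neg Q(H(B), A) \lor R(H(B), B)\\
C &= \neg N(G(y), F(H(y), A)) \lor \neg Q(H(y), A) \lor R(H(y), B).
\end{aligned}$$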

We see that indeed C' is a ground instance of C. In general, for $C'_1$ and $C'_2$ to have any resolvents, they must be constructed by first applying to $C_1$ and $C_2$ the most general unifier of a pair of complementary literals in $C_1$ and $C_2$. From the lifting lemma, it is easy to derive a similar statement about any sequence of applications of the resolution rule:

For any clause C' in the resolution closure of S', there is a clause C in the resolution closure of S, such that C' is a ground instance of C and the derivation of C is the same length as the derivation of C'.

From this fact, it follows that if the empty clause appears in the resolution closure of S', it must also appear in the resolution closure of S. This is because the empty clause cannot be a ground instance of any other clause. To recap: we have shown that if S is unsatisfiable, then there is a finite derivation of the empty clause using the resolution rule.

The lifting of theorem proving from ground clauses to first-order clauses provides a vast increase in power. This increase comes from the fact that the first-order proof need instantiate variables only as far as necessary for the proof, whereas the ground-clause methods were required to examine a huge number of arbitrary instantiations.
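The lifting-lemma illustration can be checked mechanically: resolving $C_1$ with $C_2$ under their most general unifier should give C, resolving the ground instances $C'_1$ and $C'_2$ should give C', and substituting {y/B} into C should give back C'. A minimal sketch, assuming a tuple encoding of our own (lowercase strings are variables, literals are (positive?, atom) pairs, clauses are frozensets).

```python
def is_var(t):
    return isinstance(t, str) and t[:1].islower()

def subst(theta, t):
    if is_var(t) and t in theta:
        return subst(theta, theta[t])
    if isinstance(t, tuple):
        return (t[0],) + tuple(subst(theta, a) for a in t[1:])
    return t

def occurs(v, t):
    return t == v or (isinstance(t, tuple) and any(occurs(v, a) for a in t[1:]))

def unify(x, y, theta):
    if theta is None:
        return None
    x, y = subst(theta, x), subst(theta, y)
    if x == y:
        return theta
    if is_var(y):
        return None if occurs(y, x) else {**theta, y: x}
    if is_var(x):
        return None if occurs(x, y) else {**theta, x: y}
    if isinstance(x, tuple) and isinstance(y, tuple) and x[0] == y[0] and len(x) == len(y):
        for a, b in zip(x[1:], y[1:]):
            theta = unify(a, b, theta)
        return theta
    return None

def resolve(c1, c2):
    out = []
    for s1, a1 in c1:
        for s2, a2 in c2:
            theta = unify(a1, a2, {}) if s1 != s2 else None
            if theta is not None:
                rest = (c1 - {(s1, a1)}) | (c2 - {(s2, a2)})
                out.append(frozenset((s, subst(theta, a)) for s, a in rest))
    return out

HB = ("H", "B")
C1 = frozenset({(False, ("P", "x", ("F", "x", "A"))), (False, ("Q", "x", "A")), (True, ("R", "x", "B"))})
C2 = frozenset({(False, ("N", ("G", "y"), "z")), (True, ("P", ("H", "y"), "z"))})
C1g = frozenset({(False, ("P", HB, ("F", HB, "A"))), (False, ("Q", HB, "A")), (True, ("R", HB, "B"))})
C2g = frozenset({(False, ("N", ("G", "B"), ("F", HB, "A"))), (True, ("P", HB, ("F", HB, "A")))})

(C_ground,) = resolve(C1g, C2g)     # C' from the ground instances
(C_lifted,) = resolve(C1, C2)       # C from the first-order clauses
instance = frozenset((s, subst({"y": "B"}, a)) for s, a in C_lifted)
print(instance == C_ground)
```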

Dealing with equality

None of the inference methods described so far in this chapter handle equality. There are three distinct approaches that can be taken. The first approach is to axiomatize equality, that is, to write down sentences about the equality relation in the knowledge base. We need to say that equality is reflexive, symmetric, and transitive, and we also have to say that we can substitute equals for equals in any predicate or function. So we need three basic axioms, and then one for each predicate and function:

$$\begin{aligned}
&\forall x\;\; x = x\\
&\forall x, y\;\; x = y \Rightarrow y = x\\
&\forall x, y, z\;\; x = y \land y = z \Rightarrow x = z\\
&\forall x, y\;\; x = y \Rightarrow (P_1(x) \Leftrightarrow P_1(y))\\
&\forall x, y\;\; x = y \Rightarrow (P_2(x) \Leftrightarrow P_2(y))\\
&\quad\vdots\\
&\forall w, x, y, z\;\; w = y \land x = z \Rightarrow (F_1(w, x) = F_1(y, z))\\
&\forall w, x, y, z\;\; w = y \land x = z \Rightarrow (F_2(w, x) = F_2(y, z))\\
&\quad\vdots
\end{aligned}$$

Given these sentences, a standard inference procedure such as resolution can perform tasks requiring equality reasoning, such as solving mathematical equations.
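Because one substitution axiom is needed per predicate and function symbol, the axioms can be generated from a signature. This sketch emits them as text; the signature format (a dict from symbol to arity) and the variable names x1, x2, ..., y1, y2, ... are our own choices, so the function axioms read with x1, x2, y1, y2 where the display above uses w, x, y, z.

```python
def substitution_axiom(name, arity, is_predicate):
    """Equals-for-equals axiom for one predicate or function symbol of the given arity."""
    xs = [f"x{i}" for i in range(1, arity + 1)]
    ys = [f"y{i}" for i in range(1, arity + 1)]
    premise = " ∧ ".join(f"{x} = {y}" for x, y in zip(xs, ys))
    lhs, rhs = f"{name}({', '.join(xs)})", f"{name}({', '.join(ys)})"
    conclusion = f"({lhs} ⇔ {rhs})" if is_predicate else f"({lhs} = {rhs})"
    return f"∀{','.join(xs + ys)} {premise} ⇒ {conclusion}"

def equality_axioms(predicates, functions):
    """predicates and functions map each symbol to its arity (constants need no axiom)."""
    axioms = ["∀x x = x",                          # reflexivity
              "∀x,y x = y ⇒ y = x",                # symmetry
              "∀x,y,z x = y ∧ y = z ⇒ x = z"]      # transitivity
    axioms += [substitution_axiom(p, n, True) for p, n in predicates.items() if n > 0]
    axioms += [substitution_axiom(f, n, False) for f, n in functions.items() if n > 0]
    return axioms

for axiom in equality_axioms({"P1": 1, "P2": 1}, {"F1": 2, "F2": 2}):
    print(axiom)
```

The number of axioms grows with the signature, and resolution can use them in many unproductive ways, which motivates the inference-rule approaches that follow.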

Another way to deal with equality is with an additional inference rule. The simplest rule, demodulation, takes a unit clause x = y and substitutes y for any term that unifies with x in some other clause. More formally, we have

• Demodulation: For any terms x, y, and z, where $\mathrm{UNIFY}(x, z) = \theta$ and $m_n[z]$ is a literal containing z:
$$\frac{x = y, \qquad m_1 \lor \cdots \lor m_n[z]}{m_1 \lor \cdots \lor m_n[\mathrm{SUBST}(\theta, y)]}$$

Demodulation is typically used for simplifying expressions using collections of assertions such as x + 0 = x, $x^1 = x$, and so on. The rule can also be extended to handle non-unit clauses in which an equality literal appears:

• Paramodulation: For any terms x, y, and z, where $\mathrm{UNIFY}(x, z) = \theta$,
$$\frac{\ell_1 \lor \cdots \lor \ell_k \lor x = y, \qquad m_1 \lor \cdots \lor m_n[z]}{\mathrm{SUBST}(\theta,\; \ell_1 \lor \cdots \lor \ell_k \lor m_1 \lor \cdots \lor m_n[y])}$$

Unlike demodulation, paramodulation yields a complete inference procedure for first-order logic with equality.
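A sketch of demodulation used for simplification with x + 0 = x and x^1 = x. One assumption deserves flagging: because the clause being simplified is ground here, the sketch uses one-way matching (a special case of UNIFY that binds only the demodulator's variables) and rewrites bottom-up until no demodulator applies; termination relies on each demodulator making terms smaller. The tuple encoding and the names match and demodulate are our own, and paramodulation is not implemented.

```python
def is_var(t):
    return isinstance(t, str) and t[:1].islower()

def subst(theta, t):
    if is_var(t):
        return theta.get(t, t)
    if isinstance(t, tuple):
        return (t[0],) + tuple(subst(theta, a) for a in t[1:])
    return t

def match(pattern, term, theta):
    """One-way unification: bind variables of pattern only; term is treated as fixed."""
    if theta is None:
        return None
    if is_var(pattern):
        if pattern in theta:
            return theta if theta[pattern] == term else None
        return {**theta, pattern: term}
    if isinstance(pattern, tuple) and isinstance(term, tuple):
        if pattern[0] != term[0] or len(pattern) != len(term):
            return None
        for p, t in zip(pattern[1:], term[1:]):
            theta = match(p, t, theta)
        return theta
    return theta if pattern == term else None

def demodulate(t, demodulators):
    """Rewrite t bottom-up, always left to right, until no demodulator applies."""
    if isinstance(t, tuple):
        t = (t[0],) + tuple(demodulate(a, demodulators) for a in t[1:])
    for lhs, rhs in demodulators:
        theta = match(lhs, t, {})
        if theta is not None:
            return demodulate(subst(theta, rhs), demodulators)
    return t

DEMODULATORS = [(("+", "x", "0"), "x"),   # x + 0 = x
                (("^", "x", "1"), "x")]   # x^1 = x

# The literal P(A^1 + 0) simplifies to P(A).
print(demodulate(("P", ("+", ("^", "A", "1"), "0")), DEMODULATORS))
```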

A third approach handles equality reasoning entirely within an extended unification algorithm. That is, terms are unifiable if they are provably equal under some substitution, where "provably" allows for some amount of equality reasoning. For example, the terms 1 + 2 and 2 + 1 normally are not unifiable, but a unification algorithm that knows that x + y = y + x could unify them with the empty substitution. Equational unification of this kind can be done with efficient algorithms designed for the particular axioms used (commutativity, associativity, and so on), rather than through explicit inference with those axioms. Theorem provers using this technique are closely related to the constraint logic programming systems described in Section 9.4.
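A small instance of equational unification: unification modulo commutativity of + (and, by our own assumption, *), which additionally tries the swapped argument order. Because more than one unifier can result, this sketch returns a list of substitutions. It covers commutativity only; algorithms that also handle associativity are considerably more involved. The tuple encoding and the name unify_c are hypothetical.

```python
COMMUTATIVE = {"+", "*"}   # function symbols treated as commutative in this sketch

def is_var(t):
    return isinstance(t, str) and t[:1].islower()

def subst(theta, t):
    if is_var(t) and t in theta:
        return subst(theta, theta[t])
    if isinstance(t, tuple):
        return (t[0],) + tuple(subst(theta, a) for a in t[1:])
    return t

def occurs(v, t):
    return t == v or (isinstance(t, tuple) and any(occurs(v, a) for a in t[1:]))

def unify_c(x, y, theta):
    """All unifiers of x and y (extending theta) modulo commutativity of + and *."""
    x, y = subst(theta, x), subst(theta, y)
    if x == y:
        return [theta]
    if is_var(x):
        return [] if occurs(x, y) else [{**theta, x: y}]
    if is_var(y):
        return [] if occurs(y, x) else [{**theta, y: x}]
    if not (isinstance(x, tuple) and isinstance(y, tuple)) or x[0] != y[0] or len(x) != len(y):
        return []
    orderings = [y[1:]]
    if x[0] in COMMUTATIVE and len(x) == 3:
        orderings.append((y[2], y[1]))        # also try the swapped argument order
    results = []
    for args in orderings:
        partial = [theta]
        for a, b in zip(x[1:], args):
            partial = [t2 for t1 in partial for t2 in unify_c(a, b, t1)]
        results.extend(partial)
    return results

print(unify_c(("+", "1", "2"), ("+", "2", "1"), {}))   # one unifier: the empty substitution
print(unify_c(("+", "x", "2"), ("+", "2", "1"), {}))   # x/1, found via the swapped order
```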

Resolution strategies

We know that repeated applications of the resolution inference rule will eventually find a proof if one exists. In this subsection, we examine strategies that help find proofs efficiently.



Unit preference

This strategy prefers to do resolutions where one of the sentences is a single literal (also known as a unit clause). The idea behind the strategy is that we are trying to produce an empty clause, so it might be a good idea to prefer inferences that produce shorter clauses. Resolving a unit sentence (such as P) with any other sentence (such as $\neg P \lor \neg Q \lor R$) always yields a clause (in this case, $\neg Q \lor R$) that is shorter than the other clause. When the unit preference strategy was first tried for propositional inference in 1964, it led to a dramatic speedup, making it feasible to prove theorems that could not be handled without the preference. Unit preference by itself does not, however, reduce the branching factor in medium-sized problems enough to make them solvable by resolution. It is, nonetheless, a useful heuristic that can be combined with other strategies.

Unit resolution is a restricted form of resolution in which every resolution step must involve a unit clause. Unit resolution is incomplete in general, but complete for Horn knowledge bases. Unit resolution proofs on Horn knowledge bases resemble forward chaining.
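For propositional clauses, unit resolution is short to write. This is a sketch assuming clauses are sets of string literals with a leading "~" for negation; resolve_unit and unit_refutation are hypothetical names. resolve_unit reproduces the example from the unit preference discussion (P with ¬P ∨ ¬Q ∨ R gives ¬Q ∨ R), and unit_refutation saturates a clause set using unit steps only.

```python
def negate(lit):
    return lit[1:] if lit.startswith("~") else "~" + lit

def resolve_unit(lit, clause):
    """Resolve the unit clause {lit} against clause; None if they do not clash."""
    return clause - {negate(lit)} if negate(lit) in clause else None

def unit_refutation(clauses):
    """Saturate under unit resolution; True if the empty clause is derived."""
    clauses = {frozenset(c) for c in clauses}
    while True:
        new = set()
        for unit in [c for c in clauses if len(c) == 1]:
            (lit,) = unit
            for c in clauses:
                r = resolve_unit(lit, c)
                if r is not None:
                    new.add(r)
        if frozenset() in new:
            return True
        if new <= clauses:          # nothing new: saturation reached without refutation
            return False
        clauses |= new

print(resolve_unit("P", frozenset({"~P", "~Q", "R"})))
# Horn knowledge base {P, Q, P ∧ Q ⇒ R} plus the negated query ¬R:
print(unit_refutation([{"P"}, {"~P", "~Q", "R"}, {"Q"}, {"~R"}]))
```

On the non-Horn unsatisfiable set {P ∨ Q, P ∨ ¬Q, ¬P ∨ Q, ¬P ∨ ¬Q}, which contains no unit clause, the sketch derives nothing and returns False, illustrating the incompleteness in the general case.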

Set of support

Preferences that try certain resolutions first are helpful, but in general it is more effective to try to eliminate some potential resolutions altogether. The set-of-support strategy does just that. It starts by identifying a subset of the sentences called the set of support. Every resolution combines a sentence from the set of support with another sentence and adds the resolvent into the set of support. If the set of support is small relative to the whole knowledge base, the search space will be reduced dramatically.

We have to be careful with this approach, because a bad choice for the set of support will make the algorithm incomplete. However, if we choose the set of support S so that the remainder of the sentences are jointly satisfiable, then set-of-support resolution will be complete. A common approach is to use the negated query as the set of support, on the assumption that the original knowledge base is consistent. (After all, if it is not consistent, then the fact that the query follows from it is vacuous.) The set-of-support strategy has the additional advantage of generating proof trees that are often easy for humans to understand, because they are goal-directed.
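A propositional sketch of set-of-support resolution, assuming string literals with "~" for negation (the names resolvents and sos_refute are hypothetical): every resolution takes one parent from the support list, and every resolvent is appended to it. With the negated query as the set of support and a satisfiable knowledge base, as recommended above, it refutes ¬R.

```python
def negate(lit):
    return lit[1:] if lit.startswith("~") else "~" + lit

def resolvents(c1, c2):
    """All propositional resolvents of two clauses (frozensets of string literals)."""
    return [(c1 - {lit}) | (c2 - {negate(lit)}) for lit in c1 if negate(lit) in c2]

def sos_refute(kb, support):
    """Set-of-support resolution; True if the empty clause is derived."""
    kb = [frozenset(c) for c in kb]
    support = [frozenset(c) for c in support]
    seen = set(kb) | set(support)
    i = 0
    while i < len(support):
        given = support[i]
        i += 1
        # One parent always comes from the set of support.
        for other in kb + support[:i]:
            for r in resolvents(given, other):
                if not r:
                    return True
                if r not in seen:
                    seen.add(r)
                    support.append(r)     # resolvents join the set of support
    return False

KB = [{"~P", "Q"}, {"~Q", "R"}, {"P"}]    # P ⇒ Q, Q ⇒ R, P (satisfiable)
print(sos_refute(KB, [{"~R"}]))           # negated query ¬R as the set of support
```

Clauses of the knowledge base are never resolved with each other, which is where the savings come from; the satisfiability of the knowledge base is what keeps this restriction complete.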

Input resolution

In the input resolution strategy, every resolution combines one of the input sentences (from the KB or the query) with some other sentence. The proof in Figure 9.11 uses only input resolutions and has the characteristic shape of a single "spine" with single sentences combining onto the spine. Clearly, the space of proof trees of this shape is smaller than the space of all proof graphs. In Horn knowledge bases, Modus Ponens is a kind of input resolution strategy, because it combines an implication from the original KB with some other sentences. Thus, it is no surprise that input resolution is complete for knowledge bases that are in Horn form, but incomplete in the general case. The linear resolution strategy is a slight generalization that allows P and Q to be resolved together either if P is in the original KB or if P is an ancestor of Q in the proof tree. Linear resolution is complete.




Subsumption

The subsumption method eliminates all sentences that are subsumed by (i.e., more specific than) an existing sentence in the KB. For example, if P(x) is in the KB, then there is no sense in adding P(A) and even less sense in adding $P(A) \lor Q(B)$. Subsumption helps keep the KB small, and thus helps keep the search space small.
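Subsumption can be tested with one-way matching: clause C subsumes clause D if some substitution θ maps every literal of C onto a literal of D. A minimal sketch under an assumed tuple encoding (lowercase strings are variables; clauses are lists of (positive?, atom) literals), with a hypothetical add_clause that refuses subsumed clauses, as in the P(x), P(A), P(A) ∨ Q(B) example.

```python
def is_var(t):
    return isinstance(t, str) and t[:1].islower()

def match(pattern, term, theta):
    """One-way matching: bind variables of pattern; variables of term stay fixed."""
    if theta is None:
        return None
    if is_var(pattern):
        if pattern in theta:
            return theta if theta[pattern] == term else None
        return {**theta, pattern: term}
    if isinstance(pattern, tuple) and isinstance(term, tuple):
        if pattern[0] != term[0] or len(pattern) != len(term):
            return None
        for p, t in zip(pattern[1:], term[1:]):
            theta = match(p, t, theta)
        return theta
    return theta if pattern == term else None

def subsumes(c, d, theta=None):
    """True if one substitution maps every literal of clause c into clause d."""
    theta = {} if theta is None else theta
    if not c:
        return True
    (sign, atom), rest = c[0], c[1:]
    for s2, a2 in d:
        if s2 == sign:
            t2 = match(atom, a2, theta)
            if t2 is not None and subsumes(rest, d, t2):   # backtrack over choices
                return True
    return False

def add_clause(kb, clause):
    """Add clause to kb unless an existing clause already subsumes it."""
    if any(subsumes(old, clause) for old in kb):
        return False
    kb.append(clause)
    return True

kb = [[(True, ("P", "x"))]]
print(add_clause(kb, [(True, ("P", "A"))]))                       # subsumed by P(x)
print(add_clause(kb, [(True, ("P", "A")), (True, ("Q", "B"))]))   # also subsumed
print(add_clause(kb, [(True, ("Q", "B"))]))                       # new information
```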

Theorem provers

Theorem provers (also known as automated reasoners) differ from logic programming languages in two ways. First, most logic programming languages handle only Horn clauses, whereas theorem provers accept full first-order logic. Second, Prolog programs intertwine logic and control. The programmer's choice of A :- B, C instead of A :- C, B affects the execution of the program. In most theorem provers, the syntactic form chosen for sentences does not affect the results. Theorem provers still need control information to operate efficiently, but that information is usually kept distinct from the knowledge base, rather than being part of the knowledge representation itself. Most of the research in theorem provers involves finding control strategies that are generally useful, as well as increasing the speed.

Design of a theorem prover

In this section, we describe the theorem prover OTTER (Organized Techniques for Theorem-proving and Effective Research) (McCune, 1992), with particular attention to its control strategy. In preparing a problem for OTTER, the user must divide the knowledge into four parts:

• A set of clauses known as the set of support (or sos), which defines the important facts about the problem. Every resolution step resolves a member of the set of support against another axiom, so the search is focused on the set of support.
• A set of usable axioms that are outside the set of support. These provide background knowledge about the problem area. The boundary between what is part of the problem (and thus in sos) and what is background (and thus in the usable axioms) is up to the user's judgment.
• A set of equations known as rewrites or demodulators. Although demodulators are equations, they are always applied in the left-to-right direction. Thus, they define a canonical form into which all terms will be simplified. For example, the demodulator x + 0 = x says that every term of the form x + 0 should be replaced by the term x.
• A set of parameters and clauses that defines the control strategy. In particular, the user specifies a heuristic function to control the search and a filtering function to eliminate some subgoals as uninteresting.

OTTER works by continually resolving an element of the set of support against one of the usable axioms. Unlike Prolog, it uses a form of best-first search. Its heuristic function measures the "weight" of each clause, where lighter clauses are preferred. The exact choice of heuristic is up to the user, but generally, the weight of a clause should be correlated with its size or difficulty. Unit clauses are treated as light; the search can thus be seen as a generalization of the unit preference strategy. At each step, OTTER moves the "lightest" clause in the set of support to the usable list and adds to the set of support some immediate consequences of resolving the lightest clause with elements of the usable list. OTTER halts when it has found a refutation or when there are no more clauses in the set of support. The algorithm is shown in more detail in Figure 9.14.

procedure OTTER(sos, usable)
    inputs: sos, a set of support: clauses defining the problem (a global variable)
            usable, background knowledge potentially relevant to the problem

    repeat
        clause ← the lightest member of sos
        move clause from sos to usable
        PROCESS(INFER(clause, usable), sos)
    until sos = [] or a refutation has been found

function INFER(clause, usable) returns clauses
    resolve clause with each member of usable
    return the resulting clauses after applying FILTER

procedure PROCESS(clauses, sos)
    for each clause in clauses do
        clause ← SIMPLIFY(clause)
        merge identical literals
        discard clause if it is a tautology
        sos ← [clause | sos]
        if clause has no literals then a refutation has been found
        if clause has one literal then look for unit refutation

Figure 9.14 Sketch of the OTTER theorem prover. Heuristic control is applied in the selection of the "lightest" clause and in the FILTER function that eliminates uninteresting clauses from consideration.
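The given-clause loop of Figure 9.14 can be sketched in Python for the propositional case. The simplifications are all assumptions and not OTTER's actual implementation: clauses are frozensets of string literals (so identical literals merge automatically), the weight of a clause is its number of literals, SIMPLIFY and the unit-refutation lookahead are omitted, and the optional max_weight argument stands in for a user-supplied FILTER.

```python
def negate(lit):
    return lit[1:] if lit.startswith("~") else "~" + lit

def is_tautology(clause):
    return any(negate(lit) in clause for lit in clause)

def infer(clause, usable):
    """INFER: resolve clause with each member of usable (propositional case)."""
    return [(clause - {lit}) | (other - {negate(lit)})
            for other in usable for lit in clause if negate(lit) in other]

def otter(sos, usable, max_weight=None):
    """Given-clause loop after Figure 9.14; True if a refutation is found."""
    sos = [frozenset(c) for c in sos]
    usable = [frozenset(c) for c in usable]
    seen = set(sos) | set(usable)
    while sos:
        clause = min(sos, key=len)            # "lightest": weight = number of literals
        sos.remove(clause)
        usable.append(clause)                 # move clause from sos to usable
        for new in infer(clause, usable):     # PROCESS(INFER(clause, usable), sos)
            if not new:
                return True                   # no literals: refutation found
            if is_tautology(new) or new in seen:
                continue                      # discard tautologies and repeats
            if max_weight is not None and len(new) > max_weight:
                continue                      # stand-in for a user-supplied FILTER
            seen.add(new)
            sos.append(new)
    return False

usable = [{"~P", "Q"}, {"~Q", "R"}, {"P"}]
print(otter([{"~R"}], usable))
```

A weight limit can make the search incomplete, which mirrors the trade-off OTTER leaves to the user when choosing FILTER.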

Extending

An alternative way to build a theorem prover is to start with a compiler and extend it to get a sound and complete reasoner for full first-order logic. This was the approach taken in the Technology Theorem Prover, or PTTP (Stickel, 1988). PTTP fivesignificant changes to to restore completeness and expressiveness:

• The occurs check is put back into the unification routine to make it sound.

• The depth-first search is replaced by an iterative deepening search. This makes the search strategy complete and takes only a constant factor more time.

• Negated literals (such as ¬P(x)) are allowed. In the implementation, there are two separate routines, one trying to prove P and one trying to prove ¬P.

• A clause with n atoms is stored as n different rules. For example, A ⇐ B ∧ C would also be stored as ¬B ⇐ C ∧ ¬A and as ¬C ⇐ B ∧ ¬A. This technique, known as locking, means that the current goal need be unified with only the head of each clause, yet it still allows for proper handling of negation.

• Inference is made complete (even for non-Horn clauses) by the addition of the linear input resolution rule: If the current goal unifies with the negation of one of the goals on the stack, then that goal can be considered solved. This is a way of reasoning by contradiction. Suppose the original goal is P and this is reduced by a series of inferences to the goal ¬P. This establishes that ¬P ⇒ P, which is logically equivalent to P.
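The PTTP modifications are easiest to see in a propositional miniature. The following Python sketch (an illustration under stated assumptions, not Stickel's implementation) stores every clause as all of its contrapositive rules, bounds the search by iterative deepening, and applies the ancestor (linear input resolution) rule. Because it is propositional there is no unification, so the occurs check and compilation do not arise; the literal encoding, the `max_depth` cutoff, and all function names are assumptions made for this example.

```python
from typing import FrozenSet, List, Tuple

# Assumed encoding: ("A", True) stands for A and ("A", False) for ¬A.
Literal = Tuple[str, bool]
Rule = Tuple[Literal, List[Literal]]  # (head, body), read as head ⇐ body


def negate(lit: Literal) -> Literal:
    return (lit[0], not lit[1])


def contrapositives(clause: List[Literal]) -> List[Rule]:
    """Store a clause with n literals as n rules, one per choice of head (locking)."""
    return [
        (head, [negate(other) for j, other in enumerate(clause) if j != i])
        for i, head in enumerate(clause)
    ]


def prove(goal: Literal, rules: List[Rule], depth: int,
          ancestors: FrozenSet[Literal]) -> bool:
    # Ancestor rule: a goal whose negation is already on the stack counts as solved.
    if negate(goal) in ancestors:
        return True
    if depth == 0:
        return False
    stack = ancestors | {goal}
    for head, body in rules:
        # Only the head of each stored rule is matched against the goal.
        if head == goal and all(prove(sub, rules, depth - 1, stack) for sub in body):
            return True
    return False


def pttp_prove(goal: Literal, clauses: List[List[Literal]], max_depth: int = 10) -> bool:
    rules = [rule for clause in clauses for rule in contrapositives(clause)]
    # Iterative deepening replaces Prolog's unbounded depth-first search.
    return any(prove(goal, rules, d, frozenset()) for d in range(1, max_depth + 1))
```

For example, from the non-Horn clauses P ∨ Q and P ∨ ¬Q, the goal P is proved at depth 2: P reduces to ¬Q (via P ⇐ ¬Q), ¬Q reduces to ¬P (via ¬Q ⇐ ¬P), and ¬P is closed because its negation P is on the stack. Plain Prolog, which has neither the contrapositive rules nor the ancestor check, could not find this proof.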

Despite these changes, PTTP retains the features that make Prolog fast. Unifications are still done by modifying variables directly, with unbinding done by unwinding the trail during backtracking. The search strategy is still based on input resolution, meaning that every resolution is against one of the clauses given in the original statement of the problem (rather than a derived clause). This makes it feasible to compile all the clauses in the original statement of the problem.

The main drawback of PTTP is that the user has to relinquish all control over the search for solutions. Each inference rule is used by the system both in its original form and in the contrapositive form. This can lead to unintuitive searches. For example, consider the rule

$$(f(x, y) = f(a, b)) \Leftarrow (x = a) \land (y = b)$$

As a Prolog rule, this is a reasonable way to prove that two f terms are equal. But PTTP would also generate the contrapositive:

$$(x \neq a) \Leftarrow (f(x, y) \neq f(a, b)) \land (y = b)$$

It seems that this is a wasteful way to prove that any two terms x and a are different.

Theorem provers as assistants

So far, we have thought of a reasoning system as an independent agent that has to make decisions and act on its own. Another use of theorem provers is as an assistant, providing advice to, say, a mathematician. In this mode the mathematician acts as a supervisor, mapping out the strategy for determining what to do next and asking the theorem prover to fill in the details. This alleviates the problem of semi-decidability to some extent, because the supervisor can cancel a query and try another approach if the query is taking too much time.

A theorem prover can also act as a proof-checker, where the proof is given by a human as a series of fairly large steps; the individual inferences required to show that each step is sound are filled in by the system.

A Socratic reasoner is a theorem prover whose ASK function is incomplete, but which can always arrive at a solution if asked the right series of questions. Thus, Socratic reasoners make good assistants, provided that there is a supervisor to make the right series of calls to ASK. ONTIC (McAllester, 1989) is a Socratic reasoning system for mathematics.



Practical uses of theorem provers

Theorem provers have come up with novel mathematical results. The SAM (Semi-Automated Mathematics) program was the first, proving a lemma in lattice theory (Guard et al., 1969). The AURA program has also answered open questions in several areas of mathematics (Wos and Winker, 1983). The Boyer-Moore theorem prover (Boyer and Moore, 1979) has been used and extended over many years and was used by Natarajan Shankar to give the first fully rigorous formal proof of Gödel's Incompleteness Theorem (Shankar, 1986). The OTTER program is one of the strongest theorem provers; it has been used to solve several open questions in combinatorial logic. The most famous of these concerns Robbins algebra. In 1933, Herbert Robbins proposed a simple set of axioms that appeared to define Boolean algebra, but no proof of this could be found (despite serious work by several mathematicians including Alfred Tarski himself). On October 10, 1996, after eight days of computation, EQP (a version of OTTER) found a proof (McCune, 1997).

Theorem provers can be applied to the problems involved in the verification and synthesis of both hardware and software, because both domains can be given correct axiomatizations. Thus, theorem proving research is carried out in the fields of hardware design, programming languages, and software engineering, not just in AI. In the case of software, the axioms state the properties of each syntactic element of the programming language. (Reasoning about programs is quite similar to reasoning about actions in the situation calculus.) An algorithm is verified by showing that its outputs meet the specifications for all inputs. The RSA public key encryption algorithm and the Boyer-Moore string-matching algorithm have been verified this way (Boyer and Moore, 1984). In the case of hardware, the axioms describe the interactions between signals and circuit elements. (See Chapter 8 for an example.) The design of a 16-bit adder has been verified by AURA (Wojcik, 1983). Logical reasoners designed specially for verification have been able to verify entire CPUs, including their timing properties (Srivas and Bickford, 1990).

The formal synthesis of algorithms was one of the first uses of theorem provers, as outlined by Cordell Green (1969a), who built on earlier ideas by Simon (1963). The idea is to prove a theorem to the effect that "there exists a program p satisfying a certain specification." If the proof is constrained to be constructive, the program can be extracted.

Although fully automated deductive synthesis, as it is called, has not yet become feasible for general-purpose programming, hand-guided deductive synthesis has been successful in designing several novel and sophisticated algorithms. Synthesis of special-purpose programs is also an active area of research. In the area of hardware synthesis, the AURA theorem prover has been applied to design circuits that are more compact than any previous design (Wojciechowski and Wojcik, 1983). For many circuit designs, propositional logic is sufficient because the set of interesting propositions is fixed by the set of circuit elements. The application of propositional inference in hardware synthesis is now a standard technique having many large-scale deployments (see, e.g., Nowick et al. (1993)).

These same techniques are now starting to be applied to software verification as well,by systems such as the SPIN model checker (Holzmann, 1997). For example, the Remote Agent spacecraft control program was verified before and after flight (Havelund et al., 2000).




9.6 SUMMARY

We have presented an analysis of logical inference in first-order logic and a number of algorithms for doing it.

• A first approach uses inference rules for instantiating quantifiers in order to propositionalize the inference problem. Typically, this approach is very slow.

• The use of unification to identify appropriate substitutions for variables eliminates the instantiation step in first-order proofs, making the process much more efficient.

• A lifted version of Modus Ponens uses unification to provide a natural and powerful inference rule, generalized Modus Ponens. The forward chaining and backward chaining algorithms apply this rule to sets of definite clauses.

• Generalized Modus Ponens is complete for definite clauses, although the entailment problem is semidecidable. For Datalog programs consisting of function-free definite clauses, entailment is decidable.

• Forward chaining is used in deductive databases, where it can be combined with relational database operations. It is also used in production systems, which perform efficient updates with very large rule sets. Forward chaining is complete for Datalog programs and runs in polynomial time.

• Backward chaining is used in logic programming systems such as Prolog, which employ sophisticated compiler technology to provide very fast inference.

• Backward chaining suffers from redundant inferences and infinite loops; these can be alleviated by memoization.

• The generalized resolution inference rule provides a complete proof system for first-order logic, using knowledge bases in conjunctive normal form.

• Several strategies exist for reducing the search space of a resolution system without compromising completeness.

• Efficient resolution-based theorem provers have been used to prove interesting mathematical theorems and to verify and synthesize software and hardware.

BIBLIOGRAPHICAL AND HISTORICAL NOTES

Logical inference was studied extensively in Greek mathematics. The type of inference most carefully studied by Aristotle was the syllogism, which is a kind of inference rule. Aristotle's syllogisms did include elements of first-order logic, such as quantification, but were restricted to unary predicates. Syllogisms were categorized by "figures" and "moods," depending on the order of the terms (which we would call predicates) in the sentences, the degree of generality (which we would today interpret through quantifiers) applied to each term, and whether each term is negated. The most fundamental syllogism is that of the first mood of the first figure:


Section 9.6. Summary 311

All S are M.
All M are P.
Therefore, all S are P.

Aristotle tried to prove the validity of other syllogisms by "reducing" them to those of the first figure. He was much less precise in describing what this "reduction" should involve than he was in characterizing the syllogistic figures and moods themselves.

Gottlob Frege, who developed full first-order logic in 1879, based his system of inference on a large collection of logically valid schemas plus a single inference rule, Modus Ponens. Frege took advantage of the fact that the effect of an inference rule of the form "From P, infer Q" can be simulated by applying Modus Ponens to P along with a logically valid schema P ⇒ Q. This "axiomatic" style of exposition, using Modus Ponens plus a number of logically valid schemas, was employed by a number of logicians after Frege; most notably, it was used in Principia Mathematica (Whitehead and Russell, 1910).

Inference rules, as distinct from axiom schemas, were the focus of the natural deduction approach, introduced by Gerhard Gentzen (1934) and by Stanisław Jaśkowski (1934). Natural deduction is called "natural" because it does not require conversion to (unreadable) normal form and because its inference rules are intended to appear natural to humans. Prawitz (1965) offers a book-length treatment of natural deduction. Gallier (1986) uses Gentzen's approach to expound the theoretical underpinnings of automated deduction.

The invention of clausal form was a crucial step in the development of a deep mathematical analysis of first-order logic. Whitehead and Russell (1910) expounded the so-called rules of passage (the actual term is from Herbrand (1930)) that are used to move quantifiers to the front of formulas. Skolem constants and Skolem functions were introduced, appropriately enough, by Thoralf Skolem (1920). The general procedure for skolemization is given by Skolem (1928), along with the important notion of the Herbrand universe.

Herbrand's theorem, named after the French logician Jacques Herbrand (1930), has played a vital role in the development of automated reasoning methods, both before and after Robinson's introduction of resolution. This is reflected in our reference to the "Herbrand universe" rather than the "Skolem universe," even though Skolem really invented the concept.

Herbrand can also be regarded as the inventor of unification. Gödel (1930) built on the ideas of Skolem and Herbrand to show that first-order logic has a complete proof procedure. Alan Turing (1936) and Alonzo Church (1936) simultaneously showed, using very different proofs, that validity in first-order logic was not decidable. The excellent text by Enderton (1972) explains all of these results in a rigorous yet moderately understandable fashion.

Although McCarthy (1958) had suggested the use of first-order logic for representation and reasoning in AI, the first such systems were developed by logicians interested in mathematical theorem proving. It was Abraham Robinson who proposed the use of propositionalization and Herbrand's theorem, and Gilmore (1960) who wrote the first program based on this approach. Davis and Putnam (1960) used clausal form and produced a program that attempted to find refutations by substituting members of the Herbrand universe for variables to produce ground clauses and then looking for propositional inconsistencies among the ground clauses. Prawitz (1960) developed the key idea of letting the quest for propositional


inconsistency drive the search process, and generating terms from the Herbrand universe only when it was necessary to do so in order to establish propositional inconsistency. After further development by other researchers, this idea led J. A. Robinson (no relation) to develop the resolution method (Robinson, 1965). The so-called inverse method developed at about the same time by the Soviet researcher S. Maslov (1964, 1967), based on somewhat different principles, offers similar computational advantages over propositionalization. Wolfgang Bibel's (1981) connection method can be viewed as an extension of this approach.

After the development of resolution, work on first-order inference proceeded in several different directions. In AI, resolution was adopted for question-answering systems by Cordell Green and Bertram Raphael (1968). A somewhat less formal approach was taken by Carl Hewitt (1969). His PLANNER language, although never fully implemented, was a precursor to logic programming and included directives for forward and backward chaining and for negation as failure. A subset known as MICRO-PLANNER (Sussman and Winograd, 1970) was implemented and used in the SHRDLU natural language understanding system (Winograd, 1972). Early AI implementations put a good deal of effort into data structures that would allow efficient retrieval of facts; this work is covered in AI programming texts (Charniak et al., 1987; Norvig, 1992; Forbus and de Kleer, 1993).

By the early 1970s, forward chaining was well established in AI as an easily understandable alternative to resolution. It was used in a wide variety of systems, ranging from Nevins's geometry theorem prover (Nevins, 1975) to the R1 expert system for VAX configuration (McDermott, 1982). AI applications typically involved large numbers of rules, so it was important to develop efficient rule-matching technology, particularly for incremental updates. The technology for production systems was developed to support such applications. The production system language OPS-5 (Forgy, 1981; Brownston et al., 1985) was used for R1 and for the SOAR cognitive architecture (Laird et al., 1987). OPS-5 incorporated the rete match process (Forgy, 1982). SOAR, which generates new rules to cache the results of previous computations, can handle very large rule sets: over 8,000 rules in the case of the TACAIR-SOAR system for controlling simulated fighter aircraft (Jones et al., 1998). CLIPS (Wygant, 1989) was a C-based production system language developed at NASA that allowed better integration with other software, hardware, and sensor systems and was used for spacecraft automation and several military applications.

The area of research known as deductive databases has also contributed a great deal to our understanding of forward inference. It began with a workshop in Toulouse in 1977, organized by Jack Minker, that brought together experts in logical inference and database systems (Gallaire and Minker, 1978). A recent historical survey (Ramakrishnan and Ullman, 1995) says, "Deductive [database] systems are an attempt to adapt Prolog, which has a 'small data' view of the world, to a 'large data' world." Thus, it aims to meld relational database technology, which is designed for retrieving large sets of facts, with Prolog-based inference technology, which typically retrieves one fact at a time. Texts on deductive databases include Ullman (1989) and Ceri et al. (1990).

Influential work by Chandra and Harel (1980) and Ullman (1985) led to the adoption of Datalog as a standard language for deductive databases. "Bottom-up" inference, or forward chaining, also became the standard, partly because it avoids the problems with


nontermination and redundant computation that occur with backward chaining and partly because it has a more natural implementation in terms of the basic relational database operations. The development of the magic sets technique for rule rewriting by Bancilhon et al. (1986) allowed forward chaining to borrow the advantage of goal-directedness from backward chaining. Equalizing the arms race, tabled logic programming methods (see page 313) borrow the advantage of dynamic programming from forward chaining.

Much of our understanding of the complexity of logical inference has come from the deductive database community. Chandra and Merlin (1977) first showed that matching a single nonrecursive rule (a conjunctive query in database terminology) can be NP-hard. Kuper and Vardi (1993) proposed data complexity, that is, complexity as a function of database size, viewing rule size as constant, as a suitable measure for query answering. Gottlob et al. (1999b) discuss the connection between conjunctive queries and constraint satisfaction, showing how hypertree decomposition can optimize the matching process.

As mentioned earlier, backward chaining for logical inference appeared in Hewitt's PLANNER language (1969). Logic programming per se evolved independently of this effort. A restricted form of linear resolution called SL-resolution was developed by Kowalski and Kuehner (1971), building on Loveland's model elimination technique (1968); when applied to definite clauses, it becomes SLD-resolution, which lends itself to the interpretation of definite clauses as programs (Kowalski, 1974, 1979a, 1979b). Meanwhile, in 1972, the French researcher Alain Colmerauer had developed and implemented Prolog for the purpose of parsing natural language; Prolog's clauses were intended initially as context-free grammar rules (Roussel, 1975; Colmerauer et al., 1973). Much of the theoretical background for logic programming was developed by Kowalski, working with Colmerauer. The semantic definition using least fixed points is due to Van Emden and Kowalski (1976). Kowalski (1988) and Cohen (1988) provide good historical overviews of the origins of Prolog. Foundations of Logic Programming (Lloyd, 1987) is a theoretical analysis of the underpinnings of Prolog and other logic programming languages.

Efficient Prolog compilers are generally based on the Warren Abstract Machine (WAM) model of computation developed by David H. D. Warren (1983). Van Roy (1990) showed that the application of additional compiler techniques, such as type inference, made Prolog programs competitive with C programs in terms of speed. The Japanese Fifth Generation project, a 10-year research effort beginning in 1982, was based completely on Prolog as the means to develop intelligent systems.

Methods for avoiding unnecessary looping in recursive logic programs were developed independently by Smith et al. (1986) and Tamaki and Sato (1986). The latter paper also included memoization for logic programs, a method developed extensively as tabled logic programming by David S. Warren. Swift and Warren (1994) show how to extend the WAM to handle tabling, enabling Datalog programs to execute an order of magnitude faster than forward-chaining deductive database systems.

Early theoretical work on constraint logic programming was done by Jaffar and Lassez (1987). Jaffar et al. (1992a) developed the CLP(R) system for handling real-valued constraints. Jaffar et al. (1992b) generalized the WAM to produce the CLAM (Constraint Logic Abstract Machine) for specifying implementations of CLP. Ait-Kaci and Podelski (1993)




describe a sophisticated language called LIFE, which combines CLP with functional programming and with inheritance reasoning. Kohn (1991) describes an ambitious project to use constraint logic programming as the foundation for a real-time control architecture, with applications to fully automatic pilots.

There are a number of textbooks on logic programming and Prolog. Logic for Problem Solving (Kowalski, 1979b) is an early text on logic programming in general. Prolog texts include Clocksin and Mellish (1994), Shoham (1994), and Bratko (2001). Marriott and Stuckey (1998) provide excellent coverage of CLP. Until its demise in 2000, the Journal of Logic Programming was the journal of record; it has now been replaced by Theory and Practice of Logic Programming. Logic programming conferences include the International Conference on Logic Programming (ICLP) and the International Logic Programming Symposium (ILPS).

Research into mathematical theorem proving began even before the first complete first-order systems were developed. Herbert Gelernter's Geometry Theorem Prover (Gelernter, 1959) used heuristic search methods combined with diagrams for pruning false subgoals and was able to prove some quite intricate results in Euclidean geometry. Since that time, however, there has not been very much interaction between theorem proving and AI.

Early work concentrated on completeness. Following Robinson's seminal paper, the demodulation and paramodulation rules for equality reasoning were introduced by Wos et al. (1967) and Wos and Robinson (1968), respectively. These rules were also developed independently in the context of term rewriting systems (Knuth and Bendix, 1970). The incorporation of equality reasoning into the unification algorithm is due to Gordon Plotkin (1972); it was also a feature of QLISP (Sacerdoti et al., 1976). Jouannaud and Kirchner (1991) survey equational unification from a term rewriting perspective. Efficient algorithms for standard unification were developed by Martelli and Montanari (1976) and Paterson and Wegman (1978).

In addition to equality reasoning, theorem provers have incorporated a variety of special-purpose decision procedures. Nelson and Oppen (1979) proposed an influential scheme for integrating such procedures into a general reasoning system; other methods include Stickel's (1985) "theory resolution" and Manna and Waldinger's (1986) "special relations."

A number of control strategies have been proposed for resolution, beginning with the unit preference strategy (Wos et al., 1964). The set of support strategy was proposed by Wos et al. (1965) to provide a degree of goal-directedness in resolution. Linear resolution first appeared in Loveland (1970). Genesereth and Nilsson (1987, Chapter 5) provide a short but thorough analysis of a wide variety of control strategies.

Guard et al. (1969) describe the early SAM theorem prover, which helped to solve an open problem in lattice theory. Wos and Winker (1983) give an overview of the contributions of the AURA theorem prover toward solving open problems in various areas of mathematics and logic. McCune (1992) follows up on this, recounting the accomplishments of AURA's successor, OTTER, in solving open problems. Weidenbach (2001) describes SPASS, one of the strongest current theorem provers. A Computational Logic (Boyer and Moore, 1979) is the basic reference on the Boyer-Moore theorem prover. Stickel (1988) covers the Prolog Technology Theorem Prover (PTTP), which combines the advantages of Prolog compilation with the completeness of model elimination (Loveland, 1968). SETHEO (Letz et al., 1992) is another widely used theorem prover based on this approach; it can perform several million




inferences per second on 2000-model workstations. LEANTAP (Beckert and Posegga, 1995) is an efficient theorem prover implemented in only 25 lines of Prolog.

Early work in automated program synthesis was done by Simon (1963), Green (1969a), and Manna and Waldinger (1971). The transformational system of Burstall and Darlington (1977) used equational reasoning for recursive program synthesis. KIDS (Smith, 1990, 1996) is one of the strongest modern systems; it operates as an expert assistant. Manna and Waldinger (1992) give a tutorial introduction to the current state of the art, with emphasis on their own deductive approach. Automating Software Design (Lowry and McCartney, 1991) collects a number of papers in the area. The use of logic in hardware design is surveyed by Kern and Greenstreet (1999); Clarke et al. (1999) cover model checking for hardware verification.

Computability and Logic (Boolos and Jeffrey, 1989) is a good reference on completeness and undecidability. Many early papers in mathematical logic are to be found in From Frege to Gödel: A Source Book in Mathematical Logic (van Heijenoort, 1967). The journal of record for the field of pure mathematical logic (as opposed to automated deduction) is The Journal of Symbolic Logic. Textbooks geared toward automated deduction include the classic Symbolic Logic and Mechanical Theorem Proving (Chang and Lee, 1973), as well as more recent works by Wos et al. (1992), Bibel (1993), and Kaufmann et al. (2000). The anthology Automation of Reasoning (Siekmann and Wrightson, 1983) includes many important early papers on automated deduction. Other historical surveys have been written by Loveland (1984) and Bundy (1999). The principal journal for the field of theorem proving is the Journal of Automated Reasoning; the main conference is the annual Conference on Automated Deduction (CADE). Research in theorem proving is also strongly related to the use of logic in analyzing programs and programming languages, for which the principal conference is Logic in Computer Science.

EXERCISES

9.1 Prove from first principles that Universal Instantiation is sound and that Existential Instantiation produces an inferentially equivalent knowledge base.

9.2 From Likes(Jerry, IceCream) it seems reasonable to infer ∃x Likes(x, IceCream). Write down a general inference rule, Existential Introduction, that sanctions this inference. State carefully the conditions that must be satisfied by the variables and terms involved.

9.3 Suppose a knowledge base contains just one sentence, ∃x AsHighAs(x, Everest). Which of the following are legitimate results of applying Existential Instantiation?

a. AsHighAs(Everest, Everest).
b. AsHighAs(Kilimanjaro, Everest).
c. AsHighAs(Kilimanjaro, Everest) ∧ AsHighAs(BenNevis, Everest) (after two applications).



9.4 For each pair of atomic sentences, give the most general unifier if it exists:

a. P(A, B, B), P(x, y, z).
b. Q(y, G(A, B)), Q(G(x, x), y).
c. Older(Father(y), y), Older(Father(x), John).
d. Knows(Father(y), y), Knows(x, x).

9.5 Consider the subsumption lattices shown in Figure 9.2.

a. Construct the lattice for the sentence Employs(Mother(John), Father(Richard)).
b. Construct the lattice for the sentence Employs(IBM, y) ("Everyone works for IBM"). Remember to include every kind of query that unifies with the sentence.
c. Assume that STORE indexes each sentence under every node in its subsumption lattice. Explain how FETCH should work when some of these sentences contain variables; use as examples the sentences in (a) and (b) and the query Employs(x, Father(x)).

9.6 Suppose we put into a logical database a segment of the U.S. census data listing the age, city of residence, date of birth, and mother of every person, using social security numbers as identifying constants for each person. Thus, George's age is given by Age(443-65-1282, 56). Which of the indexing schemes S1-S5 following enable an efficient solution for which of the queries Q1-Q4 (assuming normal backward chaining)?

• S1: an index for each atom in each position.
• S2: an index for each first argument.
• S3: an index for each predicate atom.
• S4: an index for each combination of predicate and first argument.
• S5: an index for each combination of predicate and second argument and an index for each first argument (nonstandard).
• Q1: Age(443-44-4321, x)
• Q2: ResidesIn(x, Houston)
• Q3: Mother(x, y)
• Q4: Age(x, 34) ∧ ResidesIn(x, TinyTownUSA)

9.7 One might suppose that we can avoid the problem of variable conflict in unification during backward chaining by standardizing apart all of the sentences in the knowledge base once and for all. Show that, for some sentences, this approach cannot work. (Hint: Consider a sentence, one part of which unifies with another.)

9.8 Explain how to write any given 3-SAT problem of arbitrary size using a single first-order definite clause and no more than 30 ground facts.

9.9 Write down logical representations for the following sentences, suitable for use with Generalized Modus Ponens:

a. Horses, cows, and pigs are mammals.
b. An offspring of a horse is a horse.
c. Bluebeard is a horse.
d. Bluebeard is Charlie's parent.
e. Offspring and parent are inverse relations.
f. Every mammal has a parent.

9.10 In this question we will use the sentences you wrote in Exercise 9.9 to answer a question using a backward-chaining algorithm.

a. Draw the proof tree generated by an exhaustive backward-chaining algorithm for the query ∃h Horse(h), where clauses are matched in the order given.
b. What do you notice about this domain?
c. How many solutions for h actually follow from your sentences?
d. Can you think of a way to find all of them? (Hint: You might want to consult Smith et al. (1986).)

9.11 A popular children's riddle is "Brothers and sisters have I none, but that man's father is my father's son." Use the rules of the family domain (Chapter 8) to show who that man is. You may apply any of the inference methods described in this chapter. Why do you think that this riddle is difficult?

9.12 Trace the execution of the backward-chaining algorithm in Figure 9.6 when it is applied to solve the crime problem. Show the sequence of values taken on by the goals variable, and arrange them into a tree.

9.13 The following Prolog code defines a predicate P:

    P(X, [X|Y]).
    P(X, [Y|Z]) :- P(X, Z).

a. Show proof trees and solutions for the queries P(A, [1,2,3]) and P(2, [1,A,3]).
b. What standard list operation does P represent?

9.14 In this exercise, we will look at sorting in Prolog.

a. Write Prolog clauses that define the predicate sorted(L), which is true if and only if list L is sorted in ascending order.
b. Write a Prolog definition for the predicate perm(L, M), which is true if and only if L is a permutation of M.
c. Define sort(L, M) (M is a sorted version of L) using perm and sorted.
d. Run sort on longer and longer lists until you lose patience. What is the time complexity of your program?
e. Write a faster sorting algorithm, such as insertion sort or quicksort, in Prolog.



9.15 In this exercise, we will look at the recursive application of rewrite rules, using logic programming. A rewrite rule (or demodulator in OTTER terminology) is an equation with aspecified direction. For example, the rewrite rule x suggests replacing any expressionthat matches x with the expression x. The application of rewrite rules is a central part of equational reasoning systems. We will use the predicate rewrite to representrewrite rules. For example, the earlier rewrite rule is written as rewrite , . Someterms are primitive and cannot be further simplified; thus, we will write primitive tosay that is a primitive term.

a. Write a definition of a predicate simplify , that is true when is a simplified version of is, when no further rewrite rules are applicable to any subexpressionof Y.

b. Write a collection of rules for the simplification of expressions involving arithmetic operators, and apply your simplification algorithm to some sample expressions.

c. Write a collection of rewrite rules for symbolic differentiation, and use them along with your simplification rules to differentiate and simplify expressions involving arithmetic expressions, including exponentiation.
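The sketch below shows one way to apply directed rewrite rules bottom-up until none applies, which is the behavior part (a) asks simplify to capture. It is written in Python rather than as logic-program clauses; the tuple encoding of expressions, the `?`-prefixed pattern variables, and the small rule set (identity and zero laws for + and *) are assumptions chosen for illustration, and there is no separate primitive check.

```python
# Expressions: numbers, symbol strings, or tuples (op, left, right).
# Pattern variables are strings beginning with '?'.
RULES = [
    (('+', '?x', 0), '?x'),  # x + 0 -> x
    (('+', 0, '?x'), '?x'),  # 0 + x -> x
    (('*', '?x', 1), '?x'),  # x * 1 -> x
    (('*', 1, '?x'), '?x'),  # 1 * x -> x
    (('*', '?x', 0), 0),     # x * 0 -> 0
    (('*', 0, '?x'), 0),     # 0 * x -> 0
]


def match(pattern, expr, bindings):
    """One-way matching: extend bindings so that pattern equals expr, or return None."""
    if bindings is None:
        return None
    if isinstance(pattern, str) and pattern.startswith('?'):
        if pattern in bindings:
            return bindings if bindings[pattern] == expr else None
        return {**bindings, pattern: expr}
    if isinstance(pattern, tuple):
        if not (isinstance(expr, tuple) and len(expr) == len(pattern)):
            return None
        for p, e in zip(pattern, expr):
            bindings = match(p, e, bindings)
        return bindings
    return bindings if pattern == expr else None


def instantiate(template, bindings):
    """Replace pattern variables in a rule's right-hand side with their bindings."""
    if isinstance(template, str) and template.startswith('?'):
        return bindings[template]
    if isinstance(template, tuple):
        return tuple(instantiate(t, bindings) for t in template)
    return template


def simplify(expr):
    """Simplify subexpressions first, then rewrite the whole term until no rule applies."""
    if isinstance(expr, tuple):
        expr = (expr[0],) + tuple(simplify(arg) for arg in expr[1:])
    for lhs, rhs in RULES:
        b = match(lhs, expr, {})
        if b is not None:
            return simplify(instantiate(rhs, b))
    return expr


if __name__ == "__main__":
    print(simplify(('+', ('*', 'x', 1), ('*', 'y', 0))))
```

Differentiation rules for part (c) could be added to `RULES` in the same pattern and template form, although rules that need arithmetic on constants would require an evaluation step that this sketch does not include.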

9.16 In this exercise, we will consider the implementation of search algorithms in Prolog. Suppose that successor(X, Y) is true when state Y is a successor of state X, and that goal(X) is true when X is a goal state. Write a definition for solve(X, P), which means that P is a path (list of states) beginning with X, ending in a goal state, and consisting of a sequence of legal steps as defined by successor. You will find that depth-first search is the easiest way to do this. How easy would it be to add heuristic search control?
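A depth-first version of solve can be sketched in Python by passing successor and goal in as functions. The sketch is an assumption-laden illustration, not the Prolog definition the exercise asks for: it also skips states already on the current path, which plain Prolog depth-first search would not do, so that it terminates on cyclic graphs. The graph `GRAPH` is a hypothetical example.

```python
def solve(state, successor, goal, path=None):
    """Return a list of states from state to a goal state, found depth first, or None.

    States already on the current path are skipped so that cycles cannot loop forever.
    """
    path = [state] if path is None else path + [state]
    if goal(state):
        return path
    for nxt in successor(state):
        if nxt not in path:
            result = solve(nxt, successor, goal, path)
            if result is not None:
                return result
    return None


# Hypothetical state space: each state maps to its successors (A -> B -> D -> A is a cycle).
GRAPH = {'A': ['B', 'C'], 'B': ['D'], 'C': ['D', 'E'], 'D': ['A'], 'E': []}

if __name__ == "__main__":
    print(solve('A', lambda s: GRAPH[s], lambda s: s == 'E'))
```

Here the search first exhausts the branch through B, then finds the path A, C, E. In this sketch, one simple form of heuristic control would be to iterate over `sorted(successor(state), key=h)` for some heuristic function h, which changes only the order in which branches are tried.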

9.17 How can resolution be used to show that a sentence is valid? Unsatisfiable?
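A sentence is unsatisfiable exactly when resolution on its clausal form derives the empty clause, and a sentence is valid exactly when its negation is unsatisfiable. The propositional sketch below illustrates both checks. It assumes the caller has already converted the sentence, or its negation, to CNF, and the encoding of literals as strings with a leading `~` for negation is an assumption of the sketch; first-order resolution would additionally need unification.

```python
from itertools import combinations


def negate(lit):
    """Complement of a literal: 'P' <-> '~P'."""
    return lit[1:] if lit.startswith('~') else '~' + lit


def resolvents(c1, c2):
    """All clauses obtained by resolving c1 and c2 on one complementary pair."""
    return [(c1 - {lit}) | (c2 - {negate(lit)}) for lit in c1 if negate(lit) in c2]


def unsatisfiable(clauses):
    """Propositional resolution refutation: True iff the empty clause is derivable."""
    clauses = set(map(frozenset, clauses))
    while True:
        new = set()
        for c1, c2 in combinations(clauses, 2):
            for r in resolvents(c1, c2):
                if not r:
                    return True  # empty clause derived
                new.add(r)
        if new <= clauses:
            return False  # nothing new can be derived
        clauses |= new


if __name__ == "__main__":
    # Validity of ((P => Q) and P) => Q: its negation has clauses {~P, Q}, {P}, {~Q}.
    print(unsatisfiable([{'~P', 'Q'}, {'P'}, {'~Q'}]))
    # {P, Q}, {~P} is satisfiable (Q true, P false), so no refutation exists.
    print(unsatisfiable([{'P', 'Q'}, {'~P'}]))
```

The first call shows validity by refuting the negation; the second shows a satisfiable clause set, for which the loop stops once no new resolvents appear.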

9.18 From "Horses are animals," it follows that "The head of a horse is the head of an animal." Demonstrate that this inference is valid by carrying out the following steps:

a. Translate the premise and the conclusion into the language of first-order logic. Use three predicates: HeadOf(h, x) (meaning "h is the head of x"), Horse(x), and Animal(x).

b. Negate the conclusion, and convert the premise and the negated conclusion into conjunctive normal form.

c. Use resolution to show that the conclusion follows from the premise.

9.19 Here are two sentences in the language of first-order logic:

(A): $\forall x\; \exists y\; (x \geq y)$

(B): $\exists y\; \forall x\; (x \geq y)$

a. Assume that the variables range over all the natural numbers $0, 1, 2, \ldots, \infty$ and that the "$\geq$" predicate means "is greater than or equal to." Under this interpretation, translate (A) and (B) into English.

b. Is (A) true under this interpretation?




c. Is (B) true under this interpretation?

d. Does (A) logically entail (B)?

e. Does (B) logically entail (A)?

f. Using resolution, try to prove that (A) follows from (B). Do this even if you think that (B) does not logically entail (A); continue until the proof breaks down and cannot proceed (if it does break down). Show the unifying substitution for each resolution step. If the proof fails, explain exactly where, how, and why it breaks down.

g. Now try to prove that (B) follows from (A).

9.20 Resolution can produce nonconstructive proofs for queries with variables, so we had to introduce special mechanisms to extract definite answers. Explain why this issue does not arise with knowledge bases containing only definite clauses.

9.21 We said in this chapter that resolution cannot be used to generate all logical consequences of a set of sentences. Can any algorithm do this?



Artificial Intelligence: A Modern Approach
Stuart Russell • Peter Norvig

SECOND EDITION

The first edition of Artificial Intelligence: A Modern Approach has become a classic in the AI literature. It has been adopted by over 600 universities in 60 countries, and praised as the definitive synthesis of the field. Here's what people had to say:

"The publication of this textbook was a major step forward, not only for the teaching of AI, but for the unified view of the field that this book introduces. Even for experts in the field, there are important insights in almost every chapter." -Prof. Thomas Dietterich (Oregon State)

"Just terrific. The book I've always been waiting for... the AI bible for the next decade." -Prof. Brewka (Vienna)

"A marvelous achievement, a truly beautiful book!" -Prof. Selmer Bringsjord (RPI)

"It's a great book, with incredible breadth and depth, and very well-written. Everyone I know who has used it in their class has loved it." -Prof. Haym Hirsh (Rutgers)

"I am deeply impressed by its unprecedented quality in presenting a coherent, balanced, broad, and deep, enjoyable picture of the field of AI. It will become the standard text for the years to come." -Prof. Wolfgang Bibel (Darmstadt)

"Terrific! Well-written and well-organized, with comprehensive coverage of the material that every student should know." -Prof. Martha Pollack (Michigan)

"Outstanding... Its descriptions are extremely clear and readable; its organization is excellent; its examples are motivating; and its coverage is scholarly and thorough!... will deservedly dominate the field for some time." -Prof. Nils Nilsson (Stanford)

"The best book available now... It's almost as good as the book Charniak and I wrote, but more up to date. (Okay, I'll admit it, it may even be better than our book.)" -Prof. Drew McDermott (Yale)

"A magisterial account of the wide scope of the entire field of Artificial Intelligence that will enlighten professors as well as students." -Dr. Alan Kay

"This is the book that made me love AI." -Student (Indonesia)

In the second edition, every chapter has been extensively rewritten. Significant new material has been introduced to cover areas such as constraint satisfaction, fast propositional inference, planning graphs, internet agents, exact probabilistic inference, Markov Chain Monte Carlo techniques, Kalman filters, ensemble learning methods, statistical learning, probabilistic natural language models, probabilistic robotics, and ethical aspects of AI.

The book is supported by a suite of online resources including source code, figures, lecture slides, a directory of over 800 links to "AI on the Web," and an online discussion group. All of this is available at: aima.cs.berkeley.edu

PRENTICE HALL
Pearson Education
Upper Saddle River, NJ 07458
www.prenhall.com

