
2010

Shikha Sharma

RCET,Bhilai

3/11/2010

ARTIFICIAL INTELLIGENCE AND EXPERT SYSTEMS


What is Artificial Intelligence?

• It is the science and engineering of making intelligent machines, especially intelligent computer programs. It is related to the similar task of using computers to understand human intelligence.
• “Intelligence implies that a machine must be able to adapt to new situations”
– Ability to learn
– Ability to think abstractly
– To solve problems
– To perceive relationships
– To adjust to one’s environment
– To profit by experience
• Woodworth: intelligence is a way of acting.
• Woodrow: intelligence is an acquiring capacity.
• Binet: comprehension, invention, direction and criticism; intelligence is contained in these four words.
• Ryburn: intelligence is the power which enables us to solve problems and to achieve our purpose.

Intelligence is not a single power or capacity or ability which operates equally well in all situations. It is rather a composite of several different abilities.

What is the objective of “AI”?

The first term is “intelligence”:

• “The ability to reason, to trigger new thoughts, to perceive and learn is intelligence.”

The second term is “thought”.

A thought is a mechanism which

1. Stimulates

a. action

b. further thought

c. information generation

d. knowledge generation

2. Is triggered by

a. External stimulus or

b. internal stimulus

3. Acts through

a. Present environment

b. past memory

4. Is stored as


a. charged /discharged state of neurons.

b. electromagnetic thought waves

Definition of AI

• John McCarthy gave this definition in 1956: “Developing computer programs to solve complex problems by applications of processes that are analogous to human reasoning processes.”
• “AI is the branch of computer science that is concerned with the automation of intelligent behavior.”
• AI is the study of how to make computers do things which, at the moment, people do better.
• Intelligence is a behavior. When we call a person intelligent, we mean that he or she has the ability to think, understand, learn and make decisions. So when we combine this word with “system” to form “Intelligent System (IS)”, we mean that the system is able to think, understand, learn and make decisions.
• It is the science and engineering of making intelligent machines, especially intelligent computer programs. It is related to the similar task of using computers to understand human intelligence, but AI does not have to confine itself to methods that are biologically observable.

Que. Explain areas of AI.

Ans. Areas of Artificial Intelligence

• Perception
o Machine vision
o Speech understanding
o Touch (tactile or haptic) sensation
• Robotics
• Natural Language Processing
o Natural Language Understanding
o Speech Understanding
o Language Generation
o Machine Translation
• Planning
• Expert Systems
• Machine Learning
• Theorem Proving
• Symbolic Mathematics
• Game Playing


Perception

Machine Vision:

It is easy to interface a TV camera to a computer and get an image into memory; the problem is understanding what the image represents. Vision takes lots of computation; in humans, roughly 10% of all calories consumed are burned in vision computation.

Speech Understanding:

Speech understanding is available now. Some systems must be trained for the individual user and require pauses between words. Understanding continuous speech with a larger vocabulary is harder.

Touch (tactile or haptic) Sensation:

Important for robot assembly tasks.

Robotics

Although industrial robots have been expensive, robot hardware can be cheap: Radio

Shack has sold a working robot arm and hand for $15. The limiting factor in application of robotics is not the cost of the robot hardware itself.

What is needed is perception and intelligence to tell the robot what to do; ``blind'' robots

are limited to very well-structured tasks (like spray painting car bodies).

Natural Language Understanding:

Natural languages are human languages such as English. Making computers understand English allows non-programmers to use them with little training. Applications in limited

areas (such as access to data bases) are easy.

(askr '(where can i get ice cream in berkeley))

Natural Language Generation:

Easier than NL understanding. Can be an inexpensive output device.

Machine Translation:

Usable translation of text is available now. Important for organizations that operate in many countries.

The following film synopsis appears to be a sample of machine-translated text, with the typical errors left in:

In a not too far future develops for eleven-year old David in a research lab the first intelligent robot with human feelings in the shape. But its "foster parents" are overtaxed with the artificial spare child and suspend it. Posed on itself alone David tries to fathom its origin and the secret of its existence.

Planning

Planning attempts to order actions to achieve goals. Planning applications include logistics, manufacturing scheduling, and planning the manufacturing steps needed to construct a desired product. There are huge amounts of money to be saved through better planning.

Expert Systems

Expert Systems attempt to capture the knowledge of a human expert and make it

available through a computer program. There have been many successful and

economically valuable applications of expert systems.

Benefits:

• Reducing skill level needed to operate complex devices.
• Diagnostic advice for device repair.
• Interpretation of complex data.
• ``Cloning'' of scarce expertise.
• Capturing knowledge of an expert who is about to retire.
• Combining knowledge of multiple experts.
• Intelligent training.

Theorem Proving

Proving mathematical theorems might seem to be mainly of academic interest. However, many practical problems can be cast in terms of theorems. A general theorem prover can

therefore be widely applicable.

Examples:

Automatic construction of compiler code generators from a description of a CPU's instruction set.

J Moore and colleagues proved the correctness of the floating-point division algorithm on an AMD CPU chip.

Symbolic Mathematics

Symbolic mathematics refers to manipulation of formulas, rather than arithmetic on numeric values.


• Algebra
• Differential and Integral Calculus

Symbolic manipulation is often used in conjunction with ordinary scientific computation as a generator of programs used to actually do the calculations. Symbolic manipulation programs are an important component of scientific and engineering workstations.

> (solvefor

'(= v (* v0 (- 1 (exp (- (/ t (* r c)))))))

't)

(= T (* (- (LOG (- 1 (/ V V0)))) (* R C)))

Game Playing

Games are good vehicles for research because they are well formalized, small, and self-contained. They are therefore easily programmed. Games can be good models of

competitive situations, so principles discovered in game-playing programs may be applicable to practical problems.

AI Tree

Fruits: Applications

Branches: Expert Systems, Natural Language processing, Speech Understanding,

Robotics and Sensory Systems, Computer Vision, Neural Computing, Fuzzy Logic, GA

Roots: Psychology, Philosophy, Electrical Engg, Management Science, Computer science, Linguistics


Difference between AI & conventional S/W

Features          AI programs            Conventional s/w
Processing type   Symbolic               Numeric
Technique used    Heuristic search       Algorithmic search
Solution steps    Indefinite             Definite
Answers sought    Satisfactory           Optimal
Knowledge         Imprecise              Precise
Modification      Frequent               Rare
Involves          Large knowledge base   Large database
Process           Inferential            Repetitive

How problems can be represented in AI

Before a solution can be found, the prime condition is that the problem must be very precisely defined. So to build a system to solve a particular problem, we need to do four things.

1. Define the problem precisely: what the initial situation is and what the final, acceptable solutions will be.
2. Analyze the problem: consider the various possible techniques for solving it.
3. Isolate and represent the task knowledge that is necessary to solve the problem.
4. Choose the best problem-solving technique and apply it.

The most common methods of problem representation in AI

State space representation

“A set of all possible states for a given problem is known as the state space of the problem.”

or

“A state space represents a problem in terms of states and operators that change states.”

A problem space consists of
1. Precondition / an initial state
2. Postcondition / final states
3. Actions
4. Total cost


Water jug problem

• States: amount of water in both jugs.
• Actions: empty large/small, pour from large/small.
• Goal: specified amount of water in both jugs.
• Path cost: total number of actions applied.

State Space Search: Playing Chess

• State space is a set of legal positions.
• Starting at the initial state.
• Using the set of rules to move from one state to another.
• Attempting to end up in a goal state.

State Space Search: Water Jug Problem

“You are given two jugs, a 4-litre one and a 3-litre one. Neither has any measuring markers on it. There is a pump that can be used to fill the jugs with water. How can you get exactly 2 litres of water into the 4-litre jug?”

• State: (x, y), where x is the amount of water in the 4-litre jug and y is the amount in the 3-litre jug
x = 0, 1, 2, 3, or 4; y = 0, 1, 2, or 3
• Start state: (0, 0).
• Goal state: (2, n) for any n.
• Attempting to end up in a goal state:
1. Set the current state to (0, 0).
2. Loop until reaching a goal state (2, n):
– Apply a rule whose left side matches the current state.
– Set the new current state to be the resulting state.

One sequence of states that reaches the goal:

(0, 0) → (0, 3) → (3, 0) → (3, 3) → (4, 2) → (0, 2) → (2, 0)

Find a driving route from city A to city B

• States: location specified by city.
• Actions: driving along the roads between cities.
• Goal: city B.
• Path cost: total distance or expected travel time.

Explain State space search. Solve Tic-Tac-Toe using state space search.

Ans. A state space represents a problem in terms of states and operators that change

states. A state space consists of:

A representation of the states the system can be in. In a board game, for example, the board represents the current state of the game.

A set of operators that can change one state into another state. In a board game, the operators are the legal moves from any given state. Often the operators are represented as programs that change a state representation to represent the new state.

An initial state.


A set of final states; some of these may be desirable, others undesirable. This set is often represented implicitly by a program that detects terminal states.

Tic-Tac-Toe as a State Space

State spaces are good representations for board games such as Tic-Tac-Toe. The state of

a game can be described by the contents of the board and the player whose turn is next.

The board can be represented as an array of 9 cells, each of which may contain an X or O

or be empty.

State:
o Player to move next: X or O.
o Board configuration:

X O

O

X X

Operators: Change an empty cell to X or O.

Start State: Board empty; X's turn.

Terminal States: Three X's in a row; three O's in a row; all cells full.
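The state, operators, start state and terminal test described above can be sketched in a few lines of Python. This is only an illustrative sketch: the names `START`, `winner`, `is_terminal` and `successors`, and the choice of a 9-tuple for the board, are assumptions made here and are not part of the original notes.

```python
from typing import List, Optional, Tuple

Board = Tuple[str, ...]      # 9 cells, each "X", "O" or " " (empty)
State = Tuple[Board, str]    # (board configuration, player to move next)

START: State = ((" ",) * 9, "X")   # start state: board empty, X's turn

# Index triples that form a row, a column or a diagonal.
LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),
         (0, 3, 6), (1, 4, 7), (2, 5, 8),
         (0, 4, 8), (2, 4, 6)]


def winner(board: Board) -> Optional[str]:
    """Return "X" or "O" if that player has three in a row, else None."""
    for a, b, c in LINES:
        if board[a] != " " and board[a] == board[b] == board[c]:
            return board[a]
    return None


def is_terminal(state: State) -> bool:
    """Terminal states: three X's in a row, three O's in a row, or all cells full."""
    board, _ = state
    return winner(board) is not None or " " not in board


def successors(state: State) -> List[State]:
    """Operator: the player to move changes one empty cell to their mark."""
    board, player = state
    if is_terminal(state):
        return []
    next_player = "O" if player == "X" else "X"
    return [(board[:i] + (player,) + board[i + 1:], next_player)
            for i, cell in enumerate(board) if cell == " "]
```

Calling `successors(START)` yields the nine possible first moves for X; applying `successors` repeatedly generates the search tree discussed next.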

Search Tree

The sequence of states formed by possible moves is called a search tree. Each level of the tree is called a ply.


Since the same state may be reachable by different sequences of moves, the state space may in

general be a graph. It may be treated as a tree for simplicity, at the cost of duplicating states.

Production system

A production system (or production rule system) is a computer program typically used

to provide some form of artificial intelligence, which consists primarily of a set of rules about behavior. These rules, termed productions, are a basic representation found useful in automated planning, expert systems and action selection. A production system

provides the mechanism necessary to execute productions in order to achieve some goal for the system.

A production system consists of four basic components:

1. A set of rules of the form Ci → Ai or

C1, C2, … Cn => A1 A2 … Am



Left hand side (LHS)        Right hand side (RHS)
Conditions/antecedents      Conclusion/consequence

where Ci is the condition part and Ai is the action part.

The condition determines when a given rule is applied, and the action determines what happens when it is applied.

2. One or more knowledge databases (working memory) that contain whatever information is relevant for the given problem and also maintain data about the current state or knowledge.

Some parts of the database may be permanent, while others may be temporary and only exist during the solution of the current problem. The information in the databases may be structured in any appropriate manner.

3. A control strategy that determines the order in which the rules are applied to the

database, and provides a way of resolving any conflicts that can arise when several rules match at once.

4. A rule applier which is the computational system that implements the control strategy

and applies the rules.

Productions consist of two parts: a sensory precondition (or "IF" statement) and an action (or "THEN"). If a production's precondition matches the current state of the world, then the production is said to be triggered. If a production's action is executed, it is said to have fired. A production system also contains a database, sometimes called working memory, which maintains data about the current state or knowledge, and a rule interpreter. The rule interpreter must provide a mechanism for prioritizing productions when more than one is triggered.

A production system is a tool used in artificial intelligence and especially within the applied AI domain known as expert systems. Production systems consist of a database of rules, a working memory, a matcher, and a procedure that resolves conflicts between rules. They are used as the basis for many rule-based expert systems.

Production rules for the water jug problem

Here x is the amount of water in the 4-gallon jug, y is the amount in the 3-gallon jug, and d is some amount of water that is poured out.

1. (x, y) → (4, y), if x < 4: fill the 4-gallon jug.
2. (x, y) → (x, 3), if y < 3: fill the 3-gallon jug.
3. (x, y) → (x – d, y), if x > 0: pour some water out of the 4-gallon jug.
4. (x, y) → (x, y – d), if y > 0: pour some water out of the 3-gallon jug.
5. (x, y) → (0, y), if x > 0: empty the 4-gallon jug.
6. (x, y) → (x, 0), if y > 0: empty the 3-gallon jug.
7. (x, y) → (4, y – (4 – x)), if x + y >= 4 and y > 0: pour water from the 3-gallon jug into the 4-gallon jug until the 4-gallon jug is full.
8. (x, y) → (x – (3 – y), 3), if x + y >= 3 and x > 0: pour water from the 4-gallon jug into the 3-gallon jug until the 3-gallon jug is full.
9. (x, y) → (x + y, 0), if x + y <= 4 and y > 0: pour all the water from the 3-gallon jug into the 4-gallon jug.
10. (x, y) → (0, x + y), if x + y <= 3 and x > 0: pour all the water from the 4-gallon jug into the 3-gallon jug.
11. (0, 2) → (2, 0): pour the 2 gallons from the 3-gallon jug into the 4-gallon jug.
12. (2, y) → (0, y): empty the 2 gallons in the 4-gallon jug.

One solution of water jug problem

Rule applied     4-Gallon   3-Gallon
Initial state    0          0
Rule 2           0          3
Rule 9           3          0
Rule 2           3          3
Rule 7           4          2
Rule 5           0          2
Rule 9 or 11     2          0
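The way a rule interpreter finds such a solution can be sketched as a small production system in Python. This is a minimal sketch under stated assumptions, not the method of the notes: only the deterministic rules 1, 2 and 5 to 10 are encoded (rules 3 and 4 pour an unspecified amount d, and rules 11 and 12 are special cases of 9 and 5), rule 8 is taken to apply when x + y >= 3 and x > 0, the control strategy is breadth-first, and the names `RULES` and `solve` are hypothetical.

```python
from collections import deque

# Each production: (rule number, condition on (x, y), action giving the new state).
# x = water in the 4-gallon jug, y = water in the 3-gallon jug.
RULES = [
    (1, lambda x, y: x < 4, lambda x, y: (4, y)),
    (2, lambda x, y: y < 3, lambda x, y: (x, 3)),
    (5, lambda x, y: x > 0, lambda x, y: (0, y)),
    (6, lambda x, y: y > 0, lambda x, y: (x, 0)),
    (7, lambda x, y: x + y >= 4 and y > 0, lambda x, y: (4, y - (4 - x))),
    (8, lambda x, y: x + y >= 3 and x > 0, lambda x, y: (x - (3 - y), 3)),
    (9, lambda x, y: x + y <= 4 and y > 0, lambda x, y: (x + y, 0)),
    (10, lambda x, y: x + y <= 3 and x > 0, lambda x, y: (0, x + y)),
]


def solve(start=(0, 0), goal_x=2):
    """Return a list of (rule number, resulting state) pairs, or None."""
    frontier = deque([start])
    parent = {start: None}          # state -> (previous state, rule number)
    while frontier:
        state = frontier.popleft()
        if state[0] == goal_x:      # goal: 2 gallons in the 4-gallon jug
            steps = []
            while parent[state] is not None:
                previous, rule = parent[state]
                steps.append((rule, state))
                state = previous
            return list(reversed(steps))
        for number, condition, action in RULES:
            if condition(*state):               # the rule is triggered
                new_state = action(*state)      # the rule fires
                if new_state not in parent:     # skip states already reached
                    parent[new_state] = (state, number)
                    frontier.append(new_state)
    return None


for rule, state in solve():
    print(f"Rule {rule}: {state}")
```

Because the control strategy is breadth-first, the sequence found uses as few rule applications as possible; it need not be the same sequence as the one tabulated above, since more than one shortest solution exists.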

Problem of Conflict Resolution

• A conflict arises when more than one rule can be fired in a situation and the rule interpreter cannot decide which rule is to be fired, in what order the rules are to be triggered, and whether to apply a given rule at all.

Some Resolution Strategies

• Perform the first: the system chooses the first rule that matches.
• Sequencing techniques: apply the rules in the sequence in which they are listed.
• Perform the most specific: if there are two matching rules and one rule is more specific than the other, activate the more specific one.
• Most recent policy: choose the most recently added rule.
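The strategies above can be illustrated with a small Python sketch. Everything named here (`Rule`, `conflict_set`, the three selector functions, and the bird/penguin rules) is a hypothetical illustration; in particular, "more specific" is approximated by "has more conditions", which is an assumption and not the only possible definition.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class Rule:
    name: str
    conditions: List[Callable[[Dict], bool]]   # all must hold for a match
    added_at: int                               # larger means added more recently

    def matches(self, memory: Dict) -> bool:
        return all(condition(memory) for condition in self.conditions)


def conflict_set(rules: List[Rule], memory: Dict) -> List[Rule]:
    """All rules whose conditions match the working memory."""
    return [rule for rule in rules if rule.matches(memory)]


def perform_first(rules: List[Rule], memory: Dict) -> Optional[Rule]:
    matched = conflict_set(rules, memory)
    return matched[0] if matched else None


def most_specific(rules: List[Rule], memory: Dict) -> Optional[Rule]:
    matched = conflict_set(rules, memory)
    return max(matched, key=lambda r: len(r.conditions)) if matched else None


def most_recent(rules: List[Rule], memory: Dict) -> Optional[Rule]:
    matched = conflict_set(rules, memory)
    return max(matched, key=lambda r: r.added_at) if matched else None


rules = [
    Rule("birds-fly", [lambda m: m.get("bird", False)], added_at=1),
    Rule("penguins-do-not-fly",
         [lambda m: m.get("bird", False), lambda m: m.get("penguin", False)],
         added_at=2),
    Rule("note-observation", [lambda m: True], added_at=3),
]
memory = {"bird": True, "penguin": True}

print(perform_first(rules, memory).name)   # birds-fly
print(most_specific(rules, memory).name)   # penguins-do-not-fly
print(most_recent(rules, memory).name)     # note-observation
```

All three rules match the same working memory, so the rule that fires depends entirely on the resolution strategy chosen.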

Search

• Search: the process of locating a solution to a problem by any method in a search tree or search space until a goal node is found.
• Search space: a set of possible permutations that can be examined by any search method in order to find a solution.
• Search tree: a tree that is used to represent a search problem and is examined by a search method to search for a solution.

To do a search process the following are needed :--

The initial state description.


A set of legal operators.

The final or goal state.

Search Tree – Terminology

• Root node: the node from which the search starts.
• Leaf node: a node in the search tree having no children.
• Ancestor/descendant: X is an ancestor of Y if either X is Y’s parent or X is an ancestor of the parent of Y. If X is an ancestor of Y, Y is said to be a descendant of X.
• Branching factor: the maximum number of children of a non-leaf node in the search tree.
• Path: a path in the search tree is a complete path if it begins with the start node and ends with a goal node. Otherwise it is a partial path.
• We also need to introduce some data structures that will be used in the search algorithms.

Evaluating Search strategies

• We will look at various search strategies and evaluate their problem solving performance. What are the characteristics of the different search algorithms and what is their efficiency? We will look at the following factors to measure this.

Completeness: we will say a search method is “complete” if it has both the following properties:
– if a goal exists then the search will always find it
– if no goal exists then the search will eventually finish and be able to say that no goal exists

Time complexity: how long does it take? (number of nodes expanded)

Space complexity: how much memory is needed?

Optimality: is a high-quality solution found? Does the solution have low cost or the minimal cost? What is the search cost associated with the time and memory required to find a solution?

Which path to find? The objective of a search problem is to find a path from the initial state to a goal state. If there are several paths, which path should be chosen? Our objective could be to find any path, or we may need to find the shortest path or least cost path.

The different search strategies that we will consider include the following:

1. Blind search strategies or uninformed search
a. Depth first search
b. Breadth first search
c. Iterative deepening search
d. Iterative broadening search
2. Informed search
3. Constraint satisfaction search
4. Adversary search

• Uninformed or blind or brute force search
– No information about the number of steps
– No information about the path cost
– Does not use any extra information about the problem domain
• Informed or heuristic search
– Information about possible path costs or number of steps is used

Uninformed Search

Breadth-first search

• Root node is expanded first
• All nodes at depth d in the search tree are expanded before the nodes at depth d+1
• Implemented by putting all the newly generated nodes at the end of the queue

Algorithm of BFS

Step 1: Put the initial node on a list S.

Step 2: If S is empty, terminate the search (no solution was found).

Step 3: Remove the first node from S. Call this node a.

Step 4: If (a = goal), terminate the search with success.

Step 5: Else, if node a has successors, generate all of them and add them at the tail of S.

Step 6: Go to step 2.
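The steps above can be sketched in Python as follows. This is a minimal sketch: the function name, the `is_goal` and `successors` callbacks, and the example graph are assumptions for illustration. One addition not present in the steps above is the `parent` dictionary, which both records the path and prevents a node from being added to S twice.

```python
from collections import deque


def breadth_first_search(start, is_goal, successors):
    """Return the path from start to the first goal found, or None."""
    frontier = deque([start])            # Step 1: the list S
    parent = {start: None}               # remembers how each node was reached
    while frontier:                      # Step 2: stop when S is empty
        node = frontier.popleft()        # Step 3: remove the first node
        if is_goal(node):                # Step 4: success
            path = []
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1]
        for child in successors(node):   # Step 5: add successors at the tail
            if child not in parent:
                parent[child] = node
                frontier.append(child)
    return None                          # S became empty: no goal reachable


# Hypothetical example graph: two routes from A to G.
graph = {"A": ["B", "C"], "B": ["D"], "C": ["G"], "D": ["G"], "G": []}
print(breadth_first_search("A", lambda n: n == "G", lambda n: graph[n]))
# ['A', 'C', 'G']
```

On this graph the shorter route through C is returned, because all nodes at depth 1 are expanded before any node at depth 2.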

Breadth-first search merits

• Complete: if there is a solution, it will be found.
• Optimal: finds the nearest goal state.

Breadth-first search problems:
– Time complexity
– Memory intensive
– Remembers all unwanted nodes

Breadth first search is:
• Complete.
• Optimal (i.e., admissible) if all operators have the same cost. Otherwise, breadth first search finds a solution with the shortest path length (fewest steps), which need not be the cheapest one.
• Exponential in time and space. The time and space complexity of the algorithm is $O(b^d)$, where $d$ is the depth of the solution and $b$ is the branching factor (i.e., number of children) at each node.
• A complete search tree of depth $d$ where each non-leaf node has $b$ children has a total of $1 + b + b^2 + \dots + b^d = (b^{d+1} - 1)/(b - 1)$ nodes.
• Consider a complete search tree of depth 15, where every node at depths 0 to 14 has 10 children and every node at depth 15 is a leaf node. The complete search tree in this case will have $O(10^{15})$ nodes. If BFS expands 10000 nodes per second and each node uses 100 bytes of storage, then BFS will take about 3500 years to run in the worst case, and it will use about 111,000 terabytes of memory. So you can see that the breadth first search algorithm cannot be effectively used unless the search space is quite small. You may also observe that even if you have all the time at your disposal, the search algorithm cannot run because it will run out of memory very soon.
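The figures in this example can be checked with a few lines of Python. The sketch assumes a 365-day year and decimal terabytes ($10^{12}$ bytes); both are assumptions made here, since the notes do not state which units they use.

```python
b, d = 10, 15                        # branching factor and depth of the tree

# 1 + b + b^2 + ... + b^d = (b^(d+1) - 1) / (b - 1)
nodes = (b ** (d + 1) - 1) // (b - 1)

seconds = nodes / 10_000             # 10000 nodes expanded per second
years = seconds / (365 * 24 * 3600)  # assuming a 365-day year
terabytes = nodes * 100 / 1e12       # 100 bytes per node, 1 TB = 10^12 bytes

print(f"nodes     = {nodes:.3e}")
print(f"years     = {years:.0f}")
print(f"terabytes = {terabytes:.0f}")
```

The node count is about $1.1 \times 10^{15}$, which gives a running time of roughly 3,500 years and a storage requirement of roughly $1.1 \times 10^{17}$ bytes.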

Depth-first search

• Always expands one of the nodes at the deepest level of the tree
• Only returns when the search hits a dead end
• Implemented by putting the newly generated nodes at the front of the queue

Algorithm of DFS

Step 1: Put the initial node on a list S.

Step 2: If S is empty, terminate the search (no solution was found).

Step 3: Remove the first node from S. Call this node a.

Step 4: If (a = goal), terminate the search with success.

Step 5: Else, if node a has successors, generate all of them and add them at the beginning of S.

Step 6: Go to step 2.
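A Python sketch of these steps follows. The function name, the callbacks and the example graph are illustrative assumptions. The list S is used as a stack, and one safeguard is added that the steps above do not mention: a successor already on the current path is skipped, so that cycles in the search space cannot cause an endless loop.

```python
def depth_first_search(start, is_goal, successors):
    """Return the first path from start to a goal found depth-first, or None."""
    stack = [(start, [start])]               # Step 1: S holds (node, path to node)
    while stack:                             # Step 2: stop when S is empty
        node, path = stack.pop()             # Step 3: take the node at the front
        if is_goal(node):                    # Step 4: success
            return path
        # Step 5: put successors at the front of S; reversing keeps the
        # left-most successor as the next node to be expanded.
        for child in reversed(list(successors(node))):
            if child not in path:            # cycle check along the current path
                stack.append((child, path + [child]))
    return None


# Hypothetical example graph; the edge C -> A forms a cycle.
graph = {"A": ["B", "C"], "B": ["D"], "C": ["G", "A"], "D": ["G"], "G": []}
print(depth_first_search("A", lambda n: n == "G", lambda n: graph[n]))
# ['A', 'B', 'D', 'G']
```

The search dives down the left-most branch first, so here it returns the longer route through B and D even though a two-step route through C exists.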

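Time Complexity:

$1 + b + b^2 + b^3 + \dots + b^d$

Hence time complexity = $O(b^d)$.

Space Complexity:

Space complexity = $O(d)$ when only the current path is stored. The list-based algorithm above also keeps the unexpanded siblings along that path, giving $O(b \cdot d)$, which is still linear in the depth $d$.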


Depth-first search merits

Modest memory requirements: only the current path from the root to the leaf node needs to be stored.

Time complexity: with many solutions, depth-first search is often faster than breadth-first search, but the worst case is still $O(b^m)$, where $m$ is the maximum depth of the search tree.

Properties of Depth First Search

Let us now examine some properties of the DFS algorithm. The algorithm takes exponential time. If $N$ is the maximum depth of a node in the search space, in the worst case the algorithm will take time $O(b^N)$. However the space taken is linear in the depth of the search tree, $O(b \cdot N)$.

Note that the time taken by the algorithm is related to the maximum depth of the search tree. If the search tree has infinite depth, the algorithm may not terminate. This can happen if the search space is infinite. It can also happen if the search space contains cycles. The latter case can be handled by checking for cycles in the algorithm. Thus, in general, Depth First Search is not complete.

Constraint Satisfaction

• A constraint problem is a task where you have to
– Arrange objects
– Schedule tasks
– Assign values
– …
subject to a number of constraints


Example of constraint problems

    S E N D
  + M O R E
  ---------
  M O N E Y

Each letter stands for a different digit. Assign digits to the letters so that the sum is correct.

A constraint problem consists of

– A set of variables x1, x2,… xn

– For each variable xi a finite set Di of its possible values (its domain)

– A set of constraints restricting the values that the variables can take

– Goal: find an assignment of values to the variables which satisfies all

the constraints

Cryptarithmetic problems:

Constraint: when the values are assigned, the sum must add up correctly.

Some easy examples

• AS + A = MOM
• I + DID = TOO
• A + FAT = ASS
• SO + SO = TOO
• US + AS = ALL
• ED + DI = DID
• DI + IS = ILL
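A cryptarithmetic puzzle can be solved by plain generate-and-test: try every assignment of distinct digits to the letters and keep those that make the sum correct. The sketch below does exactly that. The function names are hypothetical, and it assumes that no number may start with the digit 0, a common convention that the notes do not state.

```python
from itertools import permutations


def word_value(word, assignment):
    """Number obtained by replacing each letter with its assigned digit."""
    return int("".join(str(assignment[ch]) for ch in word))


def solve_cryptarithm(addends, total):
    """Return every digit assignment that makes sum(addends) == total."""
    words = list(addends) + [total]
    letters = sorted(set("".join(words)))          # the variables
    if len(letters) > 10:
        raise ValueError("more than 10 distinct letters")
    leading = {word[0] for word in words}          # assumed: no leading zeros
    solutions = []
    # permutations() yields only assignments with all digits different.
    for digits in permutations(range(10), len(letters)):
        assignment = dict(zip(letters, digits))
        if any(assignment[ch] == 0 for ch in leading):
            continue
        if sum(word_value(w, assignment) for w in addends) == word_value(total, assignment):
            solutions.append(assignment)
    return solutions


print(solve_cryptarithm(["SO", "SO"], "TOO"))
# [{'O': 0, 'S': 5, 'T': 1}], i.e. 50 + 50 = 100
```

The same function handles SEND + MORE = MONEY, but with eight letters it has to test about 1.8 million assignments, which is why practical constraint solvers combine search with constraint propagation instead of enumerating blindly.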


Another example

http://www.geocities.com/Athens/Agora/2160/puzzle35.html








The 8 Queens puzzle. Place 8 queens on a chessboard so that no two queens are attacking one another.

Constraints: no two queens may be on the same row, the same column, or the same diagonal.
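The 8 Queens puzzle is a convenient example of search combined with constraint checking. The sketch below uses backtracking: it places one queen per row, which satisfies the row constraint by construction, and checks the column and diagonal constraints before each placement. The function name `solve_queens` and its return format (a list giving the column of the queen in each row) are assumptions made for this illustration.

```python
def solve_queens(n=8):
    """Return a list cols with cols[row] = column of that row's queen, or None."""
    cols = []                                   # queens placed so far, one per row

    def safe(row, col):
        # A new queen at (row, col) must not share a column or a diagonal
        # with any queen already placed in an earlier row r.
        return all(c != col and abs(c - col) != row - r
                   for r, c in enumerate(cols))

    def place(row):
        if row == n:                            # all rows filled: solution found
            return True
        for col in range(n):
            if safe(row, col):
                cols.append(col)
                if place(row + 1):
                    return True
                cols.pop()                      # backtrack and try the next column
        return False

    return cols if place(0) else None


solution = solve_queens(8)
for col in solution:
    print(" ".join("Q" if c == col else "." for c in range(8)))
```

Checking the constraints as soon as a queen is placed prunes most of the search space: whole subtrees of placements are abandoned the moment one pair of queens conflicts.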

A more practical example

• Timetabling/scheduling

– Assign classes to rooms so that

• Students aren’t required to be in two different rooms at the same

time

• Similarly for lecturers

• Two classes aren’t booked into the same room at the same time

• Rooms are sufficiently large to hold classes assigned to them

• Labs have enough computers for the classes assigned to them

Summary

• Constraint problem-solving can be applied to a wide variety of real-world

problems

• Formally, a constraint problem consists of

– A set of variables and their domains

– A set of constraints

• The goal

– Find a valid set of values

– Find all sets of values


– Find the best set of values

• The method

– Combine search and constraint propagation

AI – Game Playing

Why has game playing been a focus of AI?

• Games have well-defined rules, which can be implemented in programs.
• The interfaces required are usually simple.
• Many human experts exist to assist in the development of the programs.
• Games provide a structured task wherein success or failure can be measured with the least effort.

Classification of Games

1. Single Person playing

2. Two player or Multi person playing

Formal Description of Game :

Initial state: the position from which the game starts.

Successor function: for each state, the list of legal moves and the consequent states.

Terminal test: a test to determine if a state is a terminal state, i.e. the end of the game.

Utility function: computes a single numeric value for a terminal state, i.e. the outcome of the game.

● Games are represented by game trees in which

● Each node represents a position

● Each link represents a legal move

● Leaf nodes are final positions

The aim is to reach the goal node from the root node.

Components of Game Playing

Plausible move generator: generates the necessary states for further expansion.

Static evaluation function generator: ranks each of the positions.

The basic methods

• Minimax strategy
• Minimax strategy with alpha-beta cutoffs

Minimax Strategy

• “Brute force”; does not recognize abstract patterns.
• An optimal strategy.
• 1st player “MAX” tries to maximize the utility function.
• 2nd player “MIN” tries to minimize the utility function.
• Assumes the opponent always makes the best possible move.
– This is not always assumed by a human player.
• Under such conditions it gives the best possible outcome: it maximizes the worst-case outcome.
• When a leaf node is evaluated, a large value is good for player “MAX”; a small value is good for player “MIN”.
• Which player is making the move alternates between adjacent levels (level 0 MAX, level 1 MIN, level 2 MAX, etc.).

Minimax Algorithm

minimaxValue(n) =
• utility(n), if n is a terminal state
• max of minimaxValue(s) over all successors s, if n is a MAX node
• min of minimaxValue(s) over all successors s, if n is a MIN node

Alpha-Beta Pruning

• Improves the minimax algorithm by pruning needless evaluations.
• Computes the same result without searching the entire tree.
• Does not explore a move which is inferior to a known alternative.
• If the search cannot reach a terminal state, a heuristic is used to approximate the eventual terminal state.

Alpha: the minimal score that player MAX is guaranteed to attain (best known so far, but with the possibility of improvement; the minimum attainable).

Beta: the best score that player MIN can attain so far (lowest score known so far, a lower score may yet be found; the maximum attainable).

Knowledge representation

Knowledge representation is the study of how knowledge is actually pictured and how effectively that picture resembles the representation of knowledge in the human brain.

Some widely known representation schemes:

• Semantic nets
• Frames
• Scripts
• Conceptual dependency

Semantic networks

A semantic network is a structure for representing knowledge as a pattern of interconnected nodes and arcs. It is a graphical representation of knowledge. Nodes in the semantic net represent either
-- entities
-- attributes
-- states or
-- events

Arcs in the net give the relationship between the nodes, and labels on the arcs specify the type of relationship.

“A sparrow is a bird”

• Two concepts: “sparrow” and “bird”
• A sparrow is a kind of bird, so connect the two concepts with an IS-A relation

[Figure: Sparrow --IS-A--> Bird]

“A bird has feathers”

• This is a different relation: the part-whole relation
• Represented by a HAS-A link or PART-OF link
• The link is from whole to part, so the direction is the opposite of the IS-A link

[Figure: Sparrow --IS-A--> Bird --HAS-A--> Feather]

Draw a semantic network for the following:

• Tweety and Sweety are birds
• Tweety has a red beak
• Sweety is Tweety’s child
• A crow is a bird
• Birds can fly

Semantic networks can answer queries

• Query: “Which birds have red beaks?”
• Answer: Tweety
• Method: direct match of subgraph
• Query: “Can Tweety fly?”
• Answer: Yes
• Method: following the IS-A link from “Tweety” to “bird” and the property link of “bird” to “fly”
• This process is called inheritance
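The two query methods above (direct match and inheritance along IS-A links) can be sketched with a small labelled-graph class. The class `SemanticNet`, its method names, and the relation labels `CAN`, `BEAK-COLOR` and `CHILD-OF` are hypothetical choices made for this illustration; the notes only fix the IS-A and HAS-A labels.

```python
class SemanticNet:
    """A semantic network stored as labelled arcs between named nodes."""

    def __init__(self):
        self.edges = {}                       # node -> list of (relation, target)

    def add(self, source, relation, target):
        self.edges.setdefault(source, []).append((relation, target))

    def targets(self, node, relation):
        """Direct match: targets of arcs with this label leaving the node."""
        return [t for r, t in self.edges.get(node, []) if r == relation]

    def ancestors(self, node):
        """All nodes reachable from node by following IS-A links."""
        found, stack = [], [node]
        while stack:
            current = stack.pop()
            for parent in self.targets(current, "IS-A"):
                if parent not in found:
                    found.append(parent)
                    stack.append(parent)
        return found

    def holds(self, node, relation, target):
        """True if the arc exists on the node itself or is inherited via IS-A."""
        return any(target in self.targets(n, relation)
                   for n in [node] + self.ancestors(node))


net = SemanticNet()
net.add("Tweety", "IS-A", "bird")
net.add("Sweety", "IS-A", "bird")
net.add("crow", "IS-A", "bird")
net.add("bird", "CAN", "fly")
net.add("Tweety", "BEAK-COLOR", "red")
net.add("Sweety", "CHILD-OF", "Tweety")

# "Which birds have red beaks?" (direct match)
print([n for n in net.edges
       if net.holds(n, "IS-A", "bird") and "red" in net.targets(n, "BEAK-COLOR")])
# ['Tweety']

# "Can Tweety fly?" (inheritance through the IS-A link)
print(net.holds("Tweety", "CAN", "fly"))   # True
```

The second query succeeds even though no CAN arc leaves the Tweety node: the property is found on "bird" after following the IS-A link, which is the inheritance process described above.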

Convert the following into a semantic net:

1. Motor-bike is a two wheeler.
2. Two wheeler is a moving vehicle.
3. Moving vehicle has a brake.
4. Moving vehicle has an engine.
5. Moving vehicle has an electrical system.
6. Moving vehicle has a fuel system.


Hierarchical Structure

vehicle
  Land-vehicle (is_a vehicle): road, rail
  Water-vehicle (is_a vehicle): river, sea
  Air-vehicle (is_a vehicle): aircraft, space


Represent the following information in a semantic network:

(is_a circus-elephant elephant)
(has elephant head)
(has elephant trunk)
(has head mouth)
(is_a elephant animal)
(has animal heart)
(is_a circus-elephant performer)
(has performer costumes)

[Figure: semantic network with the nodes circus-elephant, elephant, head, trunk, mouth, performer, costumes, animal and heart, linked by the is_a and has relations listed above]

Semantic networks

Advantages of semantic networks
• Simple representation, easy to read
• Associations possible
• Inheritance possible

Disadvantages of semantic networks
• A separate inference procedure (interpreter) must be built
• The validity of the inferences is not guaranteed
• For large networks the processing is inefficient

Frame systems

Frame theory
• When humans encounter something new, a basic structure called a frame is selected from memory
• A frame is a fixed framework in which all kinds of information are stored
• For more details about the information in a frame, a different frame is selected
• A frame is connected to other frames, so this is a network of frames

Frame

The term Frame was introduced in Minsky's paper ``A Framework for Representing Knowledge''.

A basic idea of frames is that people make use of stereotyped information about typical features of objects, images, and situations; such information is assumed to be structured in large units representing the stereotypes, and these units are referred to as ``frames''.

Typical Features of Frames

A frame can represent an individual object or a class of similar objects.

Instead of properties, a frame has slots. A slot is like a property, but can contain more kinds of information: data type information, constraints on possible slot fillers, and documentation.

Frames can inherit slots from parent frames. For example, FIDO (an individual

dog) might inherit properties from DOG (its parent class) or MAMMAL (a parent

class of DOG).

A sample frame of a computer center

Name of the frame: Computer Center

Slots in the frame:
• Air-conditioner
• Stationery cupboard
• Computer
• Dumb-terminal
• Printer


Frame systems

Frame
• Frame name: represents an object or a concept, so similar to a node in the semantic network
• Frame type: shows if this is a concept (class) or an object (instance)

Slot
• Consists of slot name and facets
• Slot name: property or relation name

Facet
• A facet gives information about the slot, i.e. the value and name
• Value: the value of the property
• Default: connecting frames can have a different value for this property

Demon
• Performs a certain action if a condition is satisfied

Example frame (frame name: bird, frame type: class)

Slot      Facet       Value
IS-A      value       animal
HAS-A     default     feather
          default     leg
#Leg      default     2
Weight    If-Needed   calc-weight

Frame systems

bird        class
IS-A        value       animal
HAS-A       default     feather
            default     leg
#Leg        default     2
Weight      If-Needed   calc-weight

Tweety      instance
IS-A        value       bird
HAS-A       value       beak
Beakcol     value       red
Child       value       Sweety
Birthday    value       1990.1.1
            If-Added    calc-age

crow        class
IS-A        value       bird
Color       default     black

beak        class
Beakcol     default     yellow


Inference in frame systems

• Query: "How many legs has a crow?"
• Answer: 2
• Inference
• No information about this in the "crow" frame
• Try to find it in the "bird" frame
• Default value is 2
• Also called inheritance
• As soon as the birthday of Tweety is added, the "calc-age" procedure is invoked
• Query: "What is the weight of Tweety?"
• The answer is obtained by the procedure "calc-weight" in bird

Frame interpreter

• Each frame system needs an inference mechanism
• Takes care of inheritance, the invoking of demons and the message passing
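The interpreter described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class `Frame`, the facet names `value`, `default`, `if_needed` and `if_added`, the lookup order (value, then default, then demon) and the stand-in procedures for calc-weight and calc-age are all hypothetical choices, not part of the notes.

```python
from typing import Any, Dict, Optional


class Frame:
    """A frame with slots; each slot holds facets (value, default, demons)."""

    def __init__(self, name: str, kind: str, parent: Optional["Frame"] = None):
        self.name = name
        self.kind = kind        # "class" or "instance"
        self.parent = parent    # the IS-A link used for inheritance
        self.slots: Dict[str, Dict[str, Any]] = {}

    def define(self, slot: str, facet: str, filler: Any) -> None:
        """Declare a facet of a slot without firing any demon."""
        self.slots.setdefault(slot, {})[facet] = filler

    def _find_facet(self, slot: str, facet: str) -> Any:
        """Look for the facet here, then along the IS-A chain (inheritance)."""
        frame: Optional[Frame] = self
        while frame is not None:
            if facet in frame.slots.get(slot, {}):
                return frame.slots[slot][facet]
            frame = frame.parent
        return None

    def get(self, slot: str) -> Any:
        """Answer a query: try a value, then a default, then an if-needed demon."""
        for facet in ("value", "default"):
            found = self._find_facet(slot, facet)
            if found is not None:
                return found
        demon = self._find_facet(slot, "if_needed")
        return demon(self) if demon is not None else None

    def put(self, slot: str, value: Any) -> None:
        """Store a value, then fire the if-added demon if one is attached."""
        self.define(slot, "value", value)
        demon = self._find_facet(slot, "if_added")
        if demon is not None:
            demon(self)


bird = Frame("bird", "class")
bird.define("#Leg", "default", 2)
bird.define("Weight", "if_needed", lambda f: 0.05)  # hypothetical calc-weight

crow = Frame("crow", "class", parent=bird)
crow.define("Color", "default", "black")

tweety = Frame("Tweety", "instance", parent=bird)
# Hypothetical calc-age: age relative to an assumed current year of 2010.
tweety.define("Birthday", "if_added",
              lambda f: f.define("Age", "value", 2010 - f.get("Birthday")))
tweety.put("Birthday", 1990)

print(crow.get("#Leg"))      # inherited default from bird
print(tweety.get("Weight"))  # computed by the if-needed demon in bird
print(tweety.get("Age"))     # set by the if-added demon
```

The crow query finds nothing in the crow frame and falls back to the default in bird, which is the inheritance step listed above.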

Advantages of frame systems

• The knowledge can be structured

• Flexible inference by using procedural knowledge

• Layered representation and inheritance is possible

Disadvantages of frame systems

• The design of the interpreter is not easy

• The validity of the inferences is not guaranteed

• It is hard to keep the knowledge in different frames consistent

Construct semantic network representations for the following:

a. Richard Nixon is a Quaker and a Republican. Quakers and Republicans are Persons. Every Quaker follows the doctrine of pacifism.

b. Mary gave the green flowered vase to her cousin.


Scripts

A script is a knowledge representation structure that is extensively used for describing stereotyped sequences of actions.

It is a special case of a frame structure.

It represents events that take place in day-to-day activities.

Scripts have slots, and with each slot we associate information about that slot.

Components of scripts

1. Entry conditions

• Preconditions:

• facts that must be true to call the script

• Eg.: an open restaurant, a hungry customer that has some money

2. Results


• Postconditions:

• facts that will be true after the script has terminated

• Eg.: customer is full and has less money; restaurant owner has more money

3. Props

• Typical things that support the content of the script

• Eg.: waiters, tables, menus

4. Roles

• Actions that participants perform

• Represented using conceptual dependency

• Eg.: waiter takes orders, delivers food, presents bill

5. Scenes

• A temporal aspect of the script

• Eg.: entering the restaurant, ordering, eating, …

LOGIC

One of the prime activities of human intelligence is reasoning. The activity of reasoning involves the construction, organization and manipulation of statements to arrive at new conclusions.

Thus logic can be defined as the scientific study of the process of reasoning and of the system of rules and procedures that help in the reasoning process. Basically, the logic process takes some information called premises and produces some outputs called conclusions.

Classification of Logic


1. Propositional Logic

2. Predicate Logic

Propositional Logic: This is the simplest form of logic. A proposition takes only two values, i.e. it is either true or false.

Examples

Kinds of proposition

• Atomic or simple propositions, which consist of a single simple (atomic) sentence.
• Molecular or compound propositions, which combine one or more atomic propositions using a set of logical connectives.

Commonly used Propositional Logical Connectives

NAME                    CONNECTIVE
CONJUNCTION             AND
DISJUNCTION             OR
NEGATION                NOT
MATERIAL CONDITIONAL    IMPLIES
ALTERNATIVE DENIAL      NAND
JOINT DENIAL            NOR

Properties of statements

• A sentence is
– satisfiable if it is true under some interpretation
– valid if it is true under all possible interpretations
– inconsistent (a contradiction) if there does not exist any interpretation under which the sentence is true
– Logical consequence: S |= X if all models of S are also models of X. Equivalently, a sentence is a logical consequence (LC) of another if it is satisfied by all interpretations which satisfy the first.
– Example: P is a LC of (P & Q), since under any interpretation for which (P & Q) is true, P is also true.
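For a small number of proposition symbols, the properties above can be checked by brute force over all interpretations. The following is a minimal sketch; the function names and the encoding of a sentence as a Python function from an interpretation (a dict of truth values) to a boolean are assumptions made for illustration.

```python
from itertools import product
from typing import Callable, Dict, Iterator, Sequence

Interpretation = Dict[str, bool]
Sentence = Callable[[Interpretation], bool]


def interpretations(symbols: Sequence[str]) -> Iterator[Interpretation]:
    """Enumerate every truth assignment to the given proposition symbols."""
    for values in product([True, False], repeat=len(symbols)):
        yield dict(zip(symbols, values))


def satisfiable(s: Sentence, symbols: Sequence[str]) -> bool:
    """True under at least one interpretation."""
    return any(s(i) for i in interpretations(symbols))


def valid(s: Sentence, symbols: Sequence[str]) -> bool:
    """True under all interpretations."""
    return all(s(i) for i in interpretations(symbols))


def inconsistent(s: Sentence, symbols: Sequence[str]) -> bool:
    """True under no interpretation."""
    return not satisfiable(s, symbols)


def entails(s: Sentence, x: Sentence, symbols: Sequence[str]) -> bool:
    """S |= X: every interpretation that satisfies S also satisfies X."""
    return all(x(i) for i in interpretations(symbols) if s(i))


p = lambda i: i["P"]
p_and_q = lambda i: i["P"] and i["Q"]

print(entails(p_and_q, p, ["P", "Q"]))   # P is a LC of (P & Q)
print(entails(p, p_and_q, ["P", "Q"]))   # the converse does not hold
```

The enumeration grows as 2^n in the number of symbols, so this is only practical for small examples.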

First Order Predicate Logic (FOPL) or First Order Predicate Calculus

Predicate logic is a logical extension of propositional logic. FOPL was developed by logicians as a means for formal reasoning, primarily in the areas of mathematics. It is used in representing different kinds of knowledge.

FOPL is flexible enough to permit the accurate representation of natural language.

It is commonly used in program design. It provides a way of deducing new statements from old ones.

The predicate calculus includes a wider range of entities. It permits the description of relations and the use of variables. It also requires an understanding of quantification.

• First-order logic is used to model the world in terms of
– objects, which are things with individual identities, e.g., individual students, lecturers, companies, cars ...
– properties of objects that distinguish them from other objects, e.g., mortal, blue, oval, even, large, ...
– classes of objects (often defined by properties), e.g., human, mammal, machine, ...
– functions, which are a subset of the relations in which there is only one "value" for any given "input", e.g., father of, best friend, second half, one more than ...


The language of predicate calculus requires:

Variables

Constants: these include the logical constants ^ (and), V (or), ¬ (not), => (implies), ∀ (for all) and ∃ (there exists).

The last two logical constants are additions to the logical connectives of propositional calculus; they are known as quantifiers. The non-logical constants include both the "names" of entities that are related and the "names" of the relations. For example, the constant dog might be a relation and the constant fido an entity.

Predicates: these relate a number of entities. This number is usually greater than one. A predicate with one argument is often used to express a property, e.g. sun(hot) may represent the statement that "the sun has the property of being hot".

Predicates: P(x[1], ..., x[n])

– P: predicate name; (x[1], ..., x[n]): argument list
– Examples: human(x), father(x, y)

A predicate, like a membership function, defines a set (or a class) of objects.

Terms (arguments of predicates must be terms)

– Constants are terms (e.g., Fred, a, Z, "red", etc.)
– Variables are terms (e.g., x, y, z, etc.); a variable is instantiated when it is assigned a constant as its value

Formulae: these are constructed from predicates and formulae. The logical constants are used to create new formulae from old ones. Here are some examples:

Note that the word "and" used in the left hand column is used to suggest that we have more than one formula for combination, and not necessarily a conjunction.

In the last two examples, "dog(X)" contains a variable which is said to be free, while the "X" in "∀X.dog(X)" is bound.

Sentence: a formula with no free variables is a sentence.

Two informal examples to illustrate quantification follow:

We can now represent the problem we initially raised: ∀X.(dog(X) => smelly(X)) ^ dog(fido) => smelly(fido)

A well-formed formula (wff) with no "free" variables is a sentence, i.e., all of its variables are "bound" by universal or existential quantifiers.

(∀x)P(x,y) has x bound as a universally quantified variable, but y is free.

Quantifiers

Universal quantification (or "for all")

– (∀x)P(x) means that P holds for all values of x in the domain associated with that variable.
– E.g., (∀x) dolphin(x) => mammal(x); (∀x) human(x) => mortal(x)
– Universal quantifiers are often used with implication (=>): (∀x) student(x) => smart(x) (All students are smart)
– Often associated with English words "all", "everyone", "always", etc.

Existential quantification

– (∃x)P(x) means that P holds for some value(s) of x in the domain associated with that variable.
– E.g., (∃x) mammal(x) ^ lays-eggs(x); (∃x) taller(x, Fred)
– Existential quantifiers are usually used with "^ (and)" to specify a list of properties about an individual: (∃x) student(x) ^ smart(x) (There is a student who is smart.)
– Often associated with English words "someone", "sometimes", "at least", etc.

Scopes of quantifiers

• Each quantified variable has its scope
– (∀x)[human(x) => (∃y)[human(y) ^ father(y, x)]]
– All occurrences of x within the scope of the quantified x refer to the same thing.
– Use different variables for different things
• Switching the order of universal quantifiers does not change the meaning:
– (∀x)(∀y)P(x,y) <=> (∀y)(∀x)P(x,y), which can be written as (∀x,y)P(x,y)
• Similarly, you can switch the order of existential quantifiers.
– (∃x)(∃y)P(x,y) <=> (∃y)(∃x)P(x,y)
• Switching the order of universals and existentials does change meaning:
– Everyone likes someone: (∀x)(∃y)likes(x,y)
– Someone is liked by everyone: (∃y)(∀x)likes(x,y)
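Over a finite domain, the universal quantifier behaves like Python's `all` and the existential quantifier like `any`, which makes the effect of quantifier order easy to check. The domain, the `likes` relation and the `students` and `smart` sets below are made-up data for illustration only.

```python
people = ["ann", "bob", "cat"]
likes = {("ann", "bob"), ("bob", "cat"), ("cat", "bob")}
students = {"ann", "bob"}
smart = {"ann", "bob", "cat"}


def likes_p(x: str, y: str) -> bool:
    """The predicate likes(x, y) over the finite domain above."""
    return (x, y) in likes


# Universal quantifier with implication; A => B is written as (not A) or B.
all_students_smart = all((x not in students) or (x in smart) for x in people)

# Existential quantifier with conjunction.
some_student_smart = any((x in students) and (x in smart) for x in people)

# Everyone likes someone: for every x there is some y, possibly a different y each time.
everyone_likes_someone = all(any(likes_p(x, y) for y in people) for x in people)

# Someone is liked by everyone: one single y must work for every x.
someone_liked_by_everyone = any(all(likes_p(x, y) for x in people) for y in people)

print(everyone_likes_someone, someone_liked_by_everyone)
```

With this data each person likes somebody, but nobody is liked by all three, so the two orderings give different truth values.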

Translating English to FOPL


1. Bhaskar likes aeroplanes.
2. Ravi's father is Rani's father.
3. Plato is a man.
4. Ram likes mango.
5. Sima is a girl.
6. Rose is red.
7. John owns gold.
8. Ram is taller than Mohan.
9. My name is Khan.
10. Apple is a fruit.
11. Ram is male.
12. Tuna is a fish.
13. Dashrath is Ram's father.
14. Kush is son of Ram.
15. Kaushaliya is wife of Dashrath.
16. Clinton is tall.
17. There is a white alligator.
18. All kings are persons.
19. Nobody loves Jane.
20. Everybody has a father.

INFERENCE RULE

If we want to prove something, we apply some manipulation procedures to the given statements to deduce new statements. If we are sure that the given statements are true, then the newly derived statements are also true. Some rules are:

1. Modus Ponens
2. Chain Rule
3. Substitution
4. Simplification
5. Transposition
6. Resolution
7. Unification

Modus Ponens


If a has property P and all objects that have property P also have property Q, we conclude that a has property Q.

P(a)
(∀x) P(x) → Q(x)
∴ Q(a)

Example:

• Assertion: Leo is a lion.
• Implication: All lions are ferocious.
• Conclusion: Leo is ferocious.

Lion(Leo)
(∀x) Lion(x) → ferocious(x)
∴ ferocious(Leo)

Chain Rule

If P→Q and Q→R then P →R

Example

Given : (programmer likes LISP)→(programmer hates COBOL)

And : (programmer hates COBOL) → (programmer likes Prolog)

Conclude: (programmer likes LISP)→ (programmer likes Prolog)

Substitution

If S is a valid statement, then any statement S' derived from S by substitution is also valid.

Example: if P V ~P is valid, then Q V ~Q is also valid.

Finding General Substitutions:---


• Given: Hate(x,y)

Hate(Marcus,z)

We Could Produce:

(Marcus/x, z/y)

(Marcus/x, y/z)

(Marcus/x, Caesar/y, Caesar/z)

(Marcus/x, Polonius/y, Polonius/z)

Simplification: from P and Q, infer P

Transposition: from P → Q, infer ~Q → ~P

UNIFICATION

• A unification of two terms is a join with respect to a specialization order.

Full Unification (variables on both sides)

Variable-variable matching:

P(x)            P(y)
Q(x,x)          Q(y,z)
P(f(x),z)       P(y,Fido)

Unifiers (variable substitutions):

P(x)            P(y)            {y/x}
Q(x,x)          Q(y,z)          {y/x, z/x}
P(f(x),z)       P(y,Fido)       {y/f(x), z/Fido}

Consistent variable assignments (# marks a pair that cannot be unified):

P(Mary,John)    P(y,y)          #
R(x,y,y)        R(y,y,z)        {y/z, x/y}
W(P(x),y,z)     W(Q(x),y,Fido)  #

Advantages of Full Unification

• Query and data both fully allow variables
• Permits full FOL resolution (next)

Unification examples

Q(x)            P(y)            FAIL
P(x)            P(y)            x/y
P(Marcus)       P(y)            Marcus/y
P(Marcus)       P(Julius)       FAIL
P(x,x)          P(y,y)          (y/x)
P(x,x)          P(y,z)          (z/y, y/x)
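The examples above can be reproduced with a short unification routine. This is a sketch under assumed conventions, not the notes' own algorithm: a compound term is a tuple whose first element is the predicate or function name, lowercase strings are variables, capitalised strings are constants, and the result is a dict mapping each bound variable to a term (bindings may chain, e.g. x to y and y to z). `None` stands for FAIL.

```python
from typing import Dict, Optional, Tuple, Union

Term = Union[str, Tuple["Term", ...]]
Subst = Dict[str, Term]


def is_var(t: Term) -> bool:
    """Variables are lowercase strings such as "x"; constants are capitalised."""
    return isinstance(t, str) and t[:1].islower()


def walk(t: Term, s: Subst) -> Term:
    """Follow bindings until reaching an unbound variable or a non-variable."""
    while is_var(t) and t in s:
        t = s[t]
    return t


def occurs(v: str, t: Term, s: Subst) -> bool:
    """Occurs check: does variable v appear inside term t?"""
    t = walk(t, s)
    if t == v:
        return True
    return isinstance(t, tuple) and any(occurs(v, arg, s) for arg in t[1:])


def unify(a: Term, b: Term, s: Optional[Subst] = None) -> Optional[Subst]:
    """Return a substitution that makes a and b equal, or None on failure."""
    s = {} if s is None else s
    a, b = walk(a, s), walk(b, s)
    if a == b:
        return s
    if is_var(a):
        return None if occurs(a, b, s) else {**s, a: b}
    if is_var(b):
        return None if occurs(b, a, s) else {**s, b: a}
    if isinstance(a, tuple) and isinstance(b, tuple):
        # Names and argument counts must match before arguments are compared.
        if a[0] != b[0] or len(a) != len(b):
            return None
        for x, y in zip(a[1:], b[1:]):
            s = unify(x, y, s)
            if s is None:
                return None
        return s
    return None  # two different constants, or a constant against a compound term


print(unify(("P", "Marcus"), ("P", "y")))
print(unify(("P", "Mary", "John"), ("P", "y", "y")))
print(unify(("P", ("f", "x"), "z"), ("P", "y", "Fido")))
```

The second call fails because y would have to be both Mary and John, which is the consistency requirement illustrated in the table.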

RESOLUTION


The resolution rule in propositional logic is a single valid inference rule that produces, from two clauses, a new clause implied by them.

Robinson introduced the resolution principle in 1965; it can be applied directly to any set of clauses.

In other words, to prove a statement, resolution attempts to show that the negation of the statement produces a contradiction with the known statements.

Example :- Perform resolution on the set of clauses.

A: P V Q V R

B: P V Q

C: R

D: Q

Resolution in Propositional Logic

Resolution proof of R (each line resolves the current clause with the clause named after "with"):

¬P V ¬Q V R     with ¬R         gives   ¬P V ¬Q
¬P V ¬Q         with P          gives   ¬Q
¬Q              with ¬T V Q     gives   ¬T
¬T              with T          gives   the empty clause (contradiction)

The Basis of Resolution and Herbrand's Theorem

Given:

winter V summer
¬winter V cold

We can conclude:

summer V cold

Herbrand's Theorem:

To show that a set of clauses S is unsatisfiable, it is necessary to consider only interpretations over a particular set, called the Herbrand universe of S. A set of clauses S is unsatisfiable if and only if a finite subset of ground instances (in which all bound variables have had a value substituted for them) of S is unsatisfiable.

Algorithm: Propositional Resolution (to prove a proposition P from a set of axioms F)

1. Convert all the propositions of F to clause form.
2. Negate P and convert the result to clause form. Add it to the set of clauses obtained in step 1.
3. Repeat until either a contradiction is found or no progress can be made:
a) Select two clauses. Call these the parent clauses.
b) Resolve them together. The resolvent will be the disjunction of all of the literals of both of the parent clauses with the following exception: if there are any pairs of literals L and ¬L such that one of the parent clauses contains L and the other contains ¬L, then select one such pair and eliminate both L and ¬L from the resolvent.
c) If the resolvent is the empty clause, then a contradiction has been found. If it is not, then add it to the set of clauses available to the procedure.
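The steps above can be sketched as follows. The encoding is an assumption made for illustration: a literal is a string, negation is a leading "~", and a clause is a set of literals. For simplicity the goal is restricted to a single literal, and step 3a is done exhaustively (every pair of clauses) rather than by a selection strategy.

```python
from itertools import combinations
from typing import FrozenSet, Iterable, List, Set

Clause = FrozenSet[str]


def negate(lit: str) -> str:
    """~L for L, and L for ~L."""
    return lit[1:] if lit.startswith("~") else "~" + lit


def resolvents(c1: Clause, c2: Clause) -> Set[Clause]:
    """All clauses obtained by cancelling one complementary pair L / ~L."""
    out: Set[Clause] = set()
    for lit in c1:
        if negate(lit) in c2:
            out.add(frozenset((c1 - {lit}) | (c2 - {negate(lit)})))
    return out


def refutes(clauses: Iterable[Iterable[str]]) -> bool:
    """True if the empty clause can be derived from the clause set."""
    known: Set[Clause] = {frozenset(c) for c in clauses}
    while True:
        new: Set[Clause] = set()
        for c1, c2 in combinations(known, 2):
            for r in resolvents(c1, c2):
                if not r:            # empty clause: contradiction found
                    return True
                new.add(r)
        if new <= known:             # no progress can be made
            return False
        known |= new


def proves(axioms: List[List[str]], goal: str) -> bool:
    """Add the negated goal literal (step 2), then search for a contradiction (step 3)."""
    return refutes(list(axioms) + [[negate(goal)]])


axioms = [["P"], ["~P", "~Q", "R"], ["~T", "Q"], ["T"]]
print(proves(axioms, "R"))
print(resolvents(frozenset({"winter", "summer"}), frozenset({"~winter", "cold"})))
```

Because only finitely many clauses can be built from a finite set of literals, the loop always terminates, although the number of clauses can grow quickly.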

Skolem Functions in FOL

Objective

• Want all variables universally quantified
• Notational variant of FOL without existentials
• Retain implicitly full FOL expressiveness

Skolem's Theorem

Every existentially quantified variable can be replaced by a unique Skolem function whose arguments are all the universally quantified variables on which the existential depends, without changing FOL.

Examples

"Everybody likes something"

(∀x)(∃y)[Person(x) & Likes(x,y)]
(∀x)[Person(x) & Likes(x, S1(x))]
where S1(x) = "that which x likes"

"Every philosopher writes at least one book"

(∀x)(∃y)[(Philosopher(x) & Book(y)) => Write(x,y)]
(∀x)[(Philosopher(x) & Book(S2(x))) => Write(x,S2(x))]

Examples of Conversion to Clause Form

Example:

∀x: [Roman(x) ^ know(x, Marcus)] → [hate(x, Caesar) V (∀y: ∃z: hate(y,z) → thinkcrazy(x,y))]

1. Eliminate →.

∀x: ¬[Roman(x) ^ know(x, Marcus)] V [hate(x, Caesar) V (∀y: ¬(∃z: hate(y,z)) V thinkcrazy(x,y))]

2. Reduce the scope of ¬.

∀x: [¬Roman(x) V ¬know(x, Marcus)] V [hate(x, Caesar) V (∀y: ∀z: ¬hate(y,z) V thinkcrazy(x,y))]

3. "Standardize" variables.

∀x: P(x) V ∀x: Q(x) converts to ∀x: P(x) V ∀y: Q(y)

4. Move quantifiers.

∀x: ∀y: ∀z: [¬Roman(x) V ¬know(x, Marcus)] V [hate(x, Caesar) V (¬hate(y,z) V thinkcrazy(x,y))]

5. Eliminate existential quantifiers.

∃y: President(y) will be converted to President(S1)
∀x: ∃y: father-of(y,x) will be converted to ∀x: father-of(S2(x), x)

6. Drop the prefix.

[¬Roman(x) V ¬know(x, Marcus)] V [hate(x, Caesar) V (¬hate(y,z) V thinkcrazy(x,y))]

7. Convert to a conjunction of disjuncts.

¬Roman(x) V ¬know(x, Marcus) V hate(x, Caesar) V ¬hate(y,z) V thinkcrazy(x,y)


A Predicate Logic Example

1. Marcus was a man.
2. Marcus was a Pompeian.
3. All Pompeians were Romans.
4. Caesar was a ruler.
5. All Romans were either loyal to Caesar or hated him.
6. Everyone is loyal to someone.
7. People only try to assassinate rulers they aren't loyal to.
8. Marcus tried to assassinate Caesar.
9. All men are people.

Prove that "Marcus hates Caesar".

A Predicate Logic form

1. Marcus was a man.
   man(Marcus)
2. Marcus was a Pompeian.
   Pompeian(Marcus)
3. All Pompeians were Romans.
   ∀x: Pompeian(x) → Roman(x)
4. Caesar was a ruler.
   ruler(Caesar)
5. All Romans were either loyal to Caesar or hated him.
   ∀x: Roman(x) → loyalto(x, Caesar) V hate(x, Caesar)
6. Everyone is loyal to someone.
   ∀x: ∃y: loyalto(x,y)
7. People only try to assassinate rulers they aren't loyal to.
   ∀x: ∀y: person(x) ^ ruler(y) ^ tryassassinate(x,y) → ¬loyalto(x,y)
8. Marcus tried to assassinate Caesar.
   tryassassinate(Marcus, Caesar)
9. All men are people.
   ∀x: man(x) → person(x)

A Resolution Proof

Axioms in clause form:

1. man(Marcus)
2. Pompeian(Marcus)
3. ¬Pompeian(x1) V Roman(x1)
4. ruler(Caesar)
5. ¬Roman(x2) V loyalto(x2, Caesar) V hate(x2, Caesar)
6. loyalto(x3, f1(x3))
7. ¬man(x4) V ¬ruler(y1) V ¬tryassassinate(x4, y1) V ¬loyalto(x4, y1)
8. tryassassinate(Marcus, Caesar)
9. ¬man(x5) V person(x5)

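Resolution Proof of hate(Marcus, Caesar)

Prove: hate(Marcus, Caesar). Start from the negated goal and resolve with the numbered clauses:

¬hate(Marcus, Caesar)
  with 5, Marcus/x2:              ¬Roman(Marcus) V loyalto(Marcus, Caesar)
  with 3, Marcus/x1:              ¬Pompeian(Marcus) V loyalto(Marcus, Caesar)
  with 2:                         loyalto(Marcus, Caesar)
  with 7, Marcus/x4, Caesar/y1:   ¬man(Marcus) V ¬ruler(Caesar) V ¬tryassassinate(Marcus, Caesar)
  with 1:                         ¬ruler(Caesar) V ¬tryassassinate(Marcus, Caesar)
  with 4:                         ¬tryassassinate(Marcus, Caesar)
  with 8:                         the empty clause (contradiction), so hate(Marcus, Caesar) is proved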


Horn clause

A Horn clause is a clause (a disjunction of literals) with at most one positive literal. Horn clauses are named for the logician Alfred Horn, who first pointed out the significance of such clauses in 1951, in the article "On sentences which are true of direct unions of algebras", Journal of Symbolic Logic, 16, 14-21.

A Horn clause with exactly one positive literal is a definite clause; a Horn clause with no positive literals is sometimes called a goal clause, especially in logic programming. A Horn formula is a conjunctive normal form formula whose clauses are all Horn; in other words, it is a conjunction of Horn clauses. A dual-Horn clause is a clause with at most one negative literal. Horn clauses play a basic role in logic programming and are important for constructive logic.

The following is an example of a (definite) Horn clause:

¬p V ¬q V ... V ¬t V u

Such a formula can also be written equivalently in the form of an implication:

(p ^ q ^ ... ^ t) → u

Horn clauses can be propositional or first order, depending on whether we consider propositional or first-order literals.

The relevance of Horn clauses to theorem proving by first-order resolution is that the resolution of two Horn clauses is a Horn clause. Moreover, the resolution of a goal clause and a definite clause is again a goal clause. In automated theorem proving, this can lead to greater efficiencies in proving a theorem (represented as a goal clause).

In fact, the resolution of a goal clause with a definite clause to produce a new goal clause is the basis of the SLD resolution inference rule, used to implement logic programming and the programming language Prolog. In logic programming a definite clause behaves as a goal-reduction procedure. For example, the Horn clause written above behaves as the procedure:

to show u, show p and show q and ... and show t.

To emphasize this backwards use of the clause, it is often written using the consequence operator:

u ← p ^ q ^ ... ^ t

Propositional Horn clauses are also of interest in computational complexity, where the problem of finding a set of variable assignments to make a conjunction of Horn clauses true is a P-complete problem, sometimes called HORNSAT. This is P's version of the boolean satisfiability problem, a central NP-complete problem. Satisfiability of first-order Horn clauses is undecidable.




Using Resolution with Equality and Reduce

Axioms in clause form:

1. man(Marcus)
2. Pompeian(Marcus)
3. born(Marcus, 40)
4. ¬man(x1) V mortal(x1)
5. ¬Pompeian(x2) V died(x2, 79)
6. erupted(volcano, 79)
7. ¬mortal(x3) V ¬born(x3, t1) V ¬gt(t2 - t1, 150) V dead(x3, t2)
8. now = 2002
9. ¬alive(x4, t3) V ¬dead(x4, t3)
10. dead(x5, t4) V alive(x5, t4)
11. ¬died(x6, t5) V ¬gt(t6, t5) V dead(x6, t6)

Prove: ¬alive(Marcus, now)

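An Unsuccessful Attempt at Resolution

Prove: loyalto(Marcus, Caesar). Start from the negated goal, using the clauses of the Marcus and Caesar example:

¬loyalto(Marcus, Caesar)
  with 5, Marcus/x2:   ¬Roman(Marcus) V hate(Marcus, Caesar)
  with 3, Marcus/x1:   ¬Pompeian(Marcus) V hate(Marcus, Caesar)
  with 2:              hate(Marcus, Caesar)

(a) The derivation stops at hate(Marcus, Caesar): no clause contains ¬hate, so no contradiction is reached.

(b)

hate(Marcus, Caesar)
  with 10:             persecute(Caesar, Marcus)
  with 9:              hate(Marcus, Caesar)
  :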


Translating English to FOL

1. Every gardener likes the sun.
2. Not every gardener likes the sun.
3. You can fool some of the people all of the time.
4. You can fool all of the people some of the time.
5. You can fool all of the people at the same time.
6. You cannot fool all of the people all of the time.
7. Everyone is younger than his father.
8. All purple mushrooms are poisonous.
9. No purple mushroom is poisonous.
10. There are exactly two purple mushrooms.
11. Clinton is not tall.
12. X is above Y if X is directly on top of Y or there is a pile of one or more other objects directly on top of one another starting with X and ending with Y.
13. No one likes everyone.

Rule-based Systems

A rule-based system is also called a production system.

A production rule has one of the forms:

IF situation THEN action
IF premise THEN conclusion
IF antecedent THEN consequent

Rule-based systems are the most popular type of expert systems.

Two inference methods are used in rule-based systems:

Forward reasoning (forward chaining, data-driven reasoning): start with known data and progress to a conclusion.

Backward reasoning (backward chaining, goal-driven reasoning): start with a possible conclusion and try to prove its validity by searching for evidence.

Why are rule-based systems more popular?

1. Modular nature (easy to expand)

2. Explanation facilities easily implemented (by keeping track of the rules that fire)

3. Similarity to human cognitive process (work of Newell and Simon)

Forward chaining starts with the data available and uses the inference rules to conclude more data until a desired goal is reached.

An inference engine using forward chaining searches the inference rules until it finds one in which the if-clause is known to be true.

It then concludes the then-clause and adds this information to its data. It continues to do this until a goal is reached. Because the data available determines which inference rules are used, this method is also called data driven.


A Simple Example

R1: IF hot AND smoky THEN ADD fire

R2: IF alarm_beeps THEN ADD smoky

R3: IF fire THEN ADD switch_on_sprinklers

F1: alarm_beeps [Given]

F2: hot [Given]

F3: smoky [from F1 by R2]

F4: fire [from F2, F3 by R1]

F5: switch_on_sprinklers [from F4 by R3]

A typical Forward Chaining example

Forward Chaining Algorithm

Start from a set of facts (data available) and check to see if the premises of any rules are satisfied.

If there is a match then the rule fires (is executed).

The steps followed in forward chaining are:

1. Matching: Compare rules with known facts and find rules that are satisfied.


2. Conflict Resolution: More than one rule may be satisfied. Conflict resolution is the process of selecting the one with highest priority for execution.

3. Execution: The rule selected is executed (fired). This may result in new facts being added, and the process continues forward.
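The match, conflict resolution and execution cycle can be sketched in Python using the sprinkler rules from the simple example. The `Rule` type and the conflict resolution policy used here (take the first matching rule in list order) are assumptions for illustration; a real system would typically use rule priorities.

```python
from typing import Iterable, List, NamedTuple, Set, Tuple


class Rule(NamedTuple):
    name: str
    conditions: Tuple[str, ...]
    conclusion: str


def forward_chain(rules: List[Rule], facts: Iterable[str]) -> Tuple[Set[str], List[str]]:
    """Fire rules until nothing new can be concluded; return the facts and the firing order."""
    known = set(facts)
    fired: List[str] = []
    while True:
        # 1. Matching: rules whose conditions all hold and whose conclusion is new.
        matched = [r for r in rules
                   if set(r.conditions) <= known and r.conclusion not in known]
        if not matched:
            return known, fired
        # 2. Conflict resolution: assumed policy, first matching rule in list order.
        rule = matched[0]
        # 3. Execution: add the new fact and continue forward.
        known.add(rule.conclusion)
        fired.append(rule.name)


rules = [
    Rule("R1", ("hot", "smoky"), "fire"),
    Rule("R2", ("alarm_beeps",), "smoky"),
    Rule("R3", ("fire",), "switch_on_sprinklers"),
]
facts, fired = forward_chain(rules, ["alarm_beeps", "hot"])
print(fired)
print(sorted(facts))
```

Keeping the list of fired rules is what makes an explanation facility straightforward to add.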

Backward chaining starts with a list of goals and works backwards to see if there is data which will allow it to conclude any of these goals. An inference engine using backward chaining searches the inference rules until it finds one which has a then-clause that matches a desired goal. If the if-clause of that inference rule is not known to be true, then it is added to the list of goals.


Backward Chaining

The same rules and facts may be processed differently, using a backward chaining interpreter.

Backward chaining means reasoning from goals back to facts.

The idea is that this focuses the search.

Checking a hypothesis: Should I switch the sprinklers on?

Example

Rules:

R1: IF hot AND smoky THEN fire
R2: IF alarm_beeps THEN smoky
R3: IF fire THEN switch_on_sprinklers

Facts:

F1: hot
F2: alarm_beeps

Goal: Should I switch the sprinklers on? (switch_on_sprinklers)

Goal tree: switch_on_sprinklers is reduced to fire (R3); fire is reduced to hot and smoky (R1); smoky is reduced to alarm_beeps (R2).

Backward Chaining Algorithm

To prove goal G:


If G is in the initial facts, it is proven.

Otherwise, find a rule which can be used to conclude G, and try to prove each of that rule's conditions.
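The recursive procedure above can be sketched directly in Python. Rules are written as (conditions, conclusion) pairs; the `seen` parameter is an addition not in the notes, used to stop the recursion when rules refer to each other in a cycle.

```python
from typing import FrozenSet, List, Set, Tuple

Rule = Tuple[Tuple[str, ...], str]   # (conditions, conclusion)


def backward_chain(goal: str, rules: List[Rule], facts: Set[str],
                   seen: FrozenSet[str] = frozenset()) -> bool:
    """Prove goal from the initial facts by reducing it to the conditions of a rule."""
    if goal in facts:
        return True                  # G is in the initial facts
    if goal in seen:
        return False                 # already trying to prove G: avoid infinite chaining
    for conditions, conclusion in rules:
        if conclusion == goal and all(
            backward_chain(c, rules, facts, seen | {goal}) for c in conditions
        ):
            return True              # every condition of this rule was proven
    return False


rules = [
    (("hot", "smoky"), "fire"),              # R1
    (("alarm_beeps",), "smoky"),             # R2
    (("fire",), "switch_on_sprinklers"),     # R3
]
print(backward_chain("switch_on_sprinklers", rules, {"hot", "alarm_beeps"}))
print(backward_chain("switch_on_sprinklers", rules, {"hot"}))
```

Only rules that can conclude the current goal are ever examined, which is the sense in which backward chaining focuses the search.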

Advantages of Rule Based Systems

• Modularity: Each rule is a separate unit. This makes it easy to add, edit or remove rules, giving great flexibility to the system.

• Uniformity: The same format is used for representing all of the knowledge.

• Naturalness: In many domains rules are used to express the knowledge.

Disadvantages of Rule Based Systems

• Infinite chaining
• Addition of new contradictory knowledge
• Modification of existing knowledge
• Inefficiency
• Large number of rules needed to cover some domains (e.g. air traffic control)

Forward vs Backward Chaining

The choice depends on the problem and on the properties of the rule set.

If you have clear hypotheses, backward chaining is likely to be better.

• Goal driven
• Diagnostic problems or classification problems
• Medical expert systems

Forward chaining may be better if you have less clear hypotheses and want to see what can be concluded from the current situation.

• Data driven
• Synthesis systems
• Design / configuration

Problem Reduction

Sometimes problems only seem hard to solve. A hard problem may be one that can be reduced to a number of simple problems, and when each of the simple problems is solved, the hard problem has been solved. This is the basic intuition behind the method of problem reduction.

In problem reduction search, the problem space consists of an AND/OR graph of (partial) state pairs. These pairs are referred to as (sub)problems. The first element of the pair is the starting state of the (sub)problem and the second element of the pair is the goal state of the (sub)problem.

There are two types of generators: non-terminal rules and terminal rules. Non-terminal rules decompose a problem pair <s0, g0> into an ANDed set of problem pairs {<si,gi>, <sj,gj>, ...}. The assumption is that the subproblems are in some sense simpler than the problem itself. The set is referred to as an ANDed set because the solution of all of the subproblems implies that the problem has been solved. Note that all of the subproblems must be solved in order to solve the parent problem.

Any subproblem may itself be decomposed into subproblems. But, in order for this method to succeed, all subproblems must eventually terminate in primitive subproblems. A primitive subproblem is one which cannot be decomposed (i.e., there is no non-terminal rule that is applicable to the subproblem) and whose solution is simple or direct. The terminal rules serve as recognizers of primitive subproblems.

Problem reduction methods

• I want to be a famous musician
– Learn to sing
– Learn to play the guitar
– Learn to play the bass
– Learn to play drums
• If I want to play the guitar what do I do?
– Buy a guitar
– Take lessons
– Practice

AND/OR tree

Musician
– Singer
– Guitarist
   – Buy guitar
   – Take lessons
   – Practice
– Bass player
– Drummer
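A problem reduction search over such a tree can be sketched as a recursive test: a problem is solved if it is primitive, or if at least one of its decompositions (OR) has all of its subproblems solved (AND). The reading of the musician tree used below, with the four instruments as alternatives and the three guitar steps as an ANDed set, is an assumption, as are the names `decompositions` and `primitives`. The sketch also assumes the decomposition graph has no cycles.

```python
from typing import Dict, List, Set

# problem -> list of alternative decompositions (OR);
# each decomposition is a list of subproblems that must all be solved (AND)
Decompositions = Dict[str, List[List[str]]]


def solvable(problem: str, decompositions: Decompositions, primitives: Set[str]) -> bool:
    """A problem is solved if it is primitive or some decomposition is fully solved."""
    if problem in primitives:        # terminal rule: recognise a primitive subproblem
        return True
    return any(
        all(solvable(sub, decompositions, primitives) for sub in alternative)
        for alternative in decompositions.get(problem, [])
    )


decompositions: Decompositions = {
    "musician": [["singer"], ["guitarist"], ["bass player"], ["drummer"]],
    "guitarist": [["buy guitar", "take lessons", "practice"]],
}

print(solvable("musician", decompositions, {"buy guitar", "take lessons", "practice"}))
print(solvable("musician", decompositions, {"buy guitar", "practice"}))
```

In the second call one member of the ANDed set is missing, so the guitarist subproblem fails and no other alternative succeeds.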

What is an Expert System?

Jackson (1999) provides us with the following definition:

An expert system is a computer program that represents and reasons with knowledge of some specialist subject with a view to solving problems or giving advice.

To solve expert-level problems, expert systems will need efficient access to a substantial domain knowledge base, and a reasoning mechanism to apply the knowledge to the problems they are given. Usually they will also need to be able to explain, to the users who rely on them, how they have reached their decisions. They will generally build upon the ideas of knowledge representation, production rules, search, and so on, that we have already covered.

Often we use an expert system shell, which is an existing knowledge-independent framework into which domain knowledge can be inserted to produce a working expert system. We can thus avoid having to program each new system from scratch.

Typical Tasks for Expert Systems

There are no fundamental limits on what problem domains an expert system can be built to deal with. Some typical existing expert system tasks include:

1. The interpretation of data, such as sonar data or geophysical measurements
2. Diagnosis of malfunctions, such as equipment faults or human diseases
3. Structural analysis or configuration of complex objects, such as chemical compounds or computer systems
4. Planning sequences of actions, such as might be performed by robots
5. Predicting the future, such as weather, share prices, exchange rates

However, these days, "conventional" computer systems can also do some of these things.

Characteristics of Expert Systems

Expert systems can be distinguished from conventional computer systems in that:

1. They simulate human reasoning about the problem domain, rather than simulating the domain itself.

2. They perform reasoning over representations of human knowledge, in addition to doing numerical calculations or data retrieval. They have corresponding distinct modules referred to as the inference engine and the knowledge base.

3. Problems tend to be solved using heuristics (rules of thumb) or approximate methods or probabilistic methods which, unlike algorithmic solutions, are not guaranteed to result in a correct or optimal solution.

4. They usually have to provide explanations and justifications of their solutions or recommendations in order to convince the user that their reasoning is correct.

Note that the term Intelligent Knowledge Based System (IKBS) is sometimes used as a synonym for Expert System.

The Architecture of Expert Systems

The process of building expert systems is often called knowledge engineering. The knowledge engineer is involved with all components of an expert system:

Building expert systems is generally an iterative process. The components and their interaction will be refined over the course of numerous meetings of the knowledge engineer with the experts and users. We shall look in turn at the various components.


Expert System Shells

An expert system shell is an expert system with an empty knowledge base, i.e. it provides:

• An inference engine
• User interface module
• Tracer/explanation module
• Knowledge base (rule) editor
• Etc.

Examples of shells include EXSYS, KEE, OPS5, KAS, ... EMYCIN is the shell of MYCIN.

It is important to start with a shell with a suitable control strategy.

Recent trends are towards shells that include multiple engines, making them more flexible.

Knowledge Acquisition

The knowledge acquisition component allows the expert to enter their knowledge or expertise into the expert system, and to refine it later as and when required. Historically, the knowledge engineer played a major role in this process, but automated systems that allow the expert to interact directly with the system are becoming increasingly common.

The knowledge acquisition process usually comprises three principal stages:

1. Knowledge elicitation is the interaction between the expert and the knowledge engineer/program to elicit the expert knowledge in some systematic way.

2. The knowledge thus obtained is usually stored in some form of human-friendly intermediate representation.

3. The intermediate representation of the knowledge is then compiled into an executable form (e.g. production rules) that the inference engine can process.

In practice, many iterations through these three stages are usually required.

Knowledge Elicitation

The knowledge elicitation process itself usually consists of several stages:


1. Find as much as possible about the problem and domain from books, manuals, etc. In

particular, become familiar with any specialist terminology and jargon.

2. Try to characterise the types of reasoning and problem solving tasks that the system will be

required to perform.

3. Find an expert (or set of experts) that is willing to collaborate on the project. Sometimes

experts are frightened of being replaced by a computer system!

4. Interview the expert (usually many times during the course of building the system). Find out

how they solve the problems your system will be expected to solve. Have them check and refine

your intermediate knowledge representation.

This is a time intensive process, and automated knowledge elicitation and machine

learning techniques are increasingly common modern alternatives.

Stages of Knowledge Acquisition

The iterative nature of the knowledge acquisition process can be represented in the

following diagram.

Levels of Knowledge Analysis


Knowledge identification: Use in depth interviews in which the knowledge engineer encourages

the expert to talk about how they do what they do. The knowledge engineer should understand

the domain well enough to know which objects and facts need talking about.

Knowledge conceptualization: Find the primitive concepts and conceptual relations of

the problem domain.

Epistemological analysis: Uncover the structural properties of the conceptual knowledge, such

as taxonomic relations (classifications).

Logical analysis: Decide how to perform reasoning in the problem domain. This kind

of knowledge can be particularly hard to acquire.

Implementational analysis: Work out systematic procedures for implementing and testing the

system.

Capturing Tacit/Implicit Knowledge

One problem that knowledge engineers often encounter is that the human experts use

tacit/implicit knowledge (e.g. procedural knowledge) that is difficult to capture.

There are several useful techniques for acquiring this knowledge:

1. Protocol analysis: Tape-record the expert thinking aloud while performing their role and later

analyse this. Break down their protocol/account into the smallest atomic units of thought,

and let these become operators.

2. Participant observation: The knowledge engineer acquires tacit knowledge through practical

domain experience with the expert.

3. Machine induction: This is useful when the experts are able to supply examples of the results

of their decision making, even if they are unable to articulate the underlying knowledge or

reasoning process.

Which is/are best to use will generally depend on the problem domain and the expert.


Representing the Knowledge

We have already looked at various types of knowledge representation. In general, the

knowledge acquired from our expert will be formulated in two ways:

1. Intermediate representation – a structured knowledge representation that the

knowledge engineer and expert can both work with efficiently.

2. Production system – a formulation that the expert system’s inference engine can

process efficiently.

It is important to distinguish between:

1. Domain knowledge – the expert’s knowledge which might be expressed in the

form of rules, general/default values, and so on.

2. Case knowledge – specific facts/knowledge about particular cases, including any

derived knowledge about the particular cases.

The system will have the domain knowledge built in, and will have to integrate this with

the different case knowledge that will become available each time the system is used.

The Inference Engine

We have already looked at production systems, and how they can be used to generate

new information and solve problems.

Recall the steps in the basic Recognize Act Cycle:

1. Match the premise patterns of the rules against elements in the working memory.

Generally the rules will be domain knowledge built into the system, and the

working memory will contain the case based facts entered into the system, plus

any new facts that have been derived from them.

2. If there is more than one rule that can be applied, use a conflict resolution

strategy to choose one to apply. Stop if no further rules are applicable.


3. Activate the chosen rule, which generally means adding/deleting an item to/from

working memory. Stop if a terminating condition is reached, or return to step 1.
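The three steps above can be sketched in a few lines of Python. This is only a minimal illustration, not a real inference engine: rules are restricted to adding one fact, conflict resolution is simply "first applicable rule in rule order", and the rule and fact names (`r1`, `fever`, and so on) are hypothetical.

```python
from typing import FrozenSet, Iterable, List, NamedTuple, Optional, Set, Tuple


class Rule(NamedTuple):
    name: str
    premises: FrozenSet[str]  # facts that must all be in working memory
    conclusion: str           # fact added to working memory when the rule fires


def recognize_act(rules: List[Rule], facts: Iterable[str],
                  goal: Optional[str] = None) -> Tuple[Set[str], List[str]]:
    working_memory = set(facts)
    fired: List[str] = []
    while True:
        # Step 1: match rule premises against working memory.
        conflict_set = [r for r in rules
                        if r.premises <= working_memory
                        and r.conclusion not in working_memory]
        # Step 2: stop if nothing is applicable, else pick one rule.
        if not conflict_set:
            break
        chosen = conflict_set[0]  # assumed strategy: first rule in rule order
        # Step 3: act, then test the terminating condition.
        working_memory.add(chosen.conclusion)
        fired.append(chosen.name)
        if goal is not None and goal in working_memory:
            break
    return working_memory, fired


# Hypothetical domain rules and case facts, for illustration only.
RULES = [
    Rule("r1", frozenset({"fever", "stiff_neck"}), "suspect_meningitis"),
    Rule("r2", frozenset({"suspect_meningitis"}), "order_lab_test"),
]
memory, trace = recognize_act(RULES, {"fever", "stiff_neck"})
print(trace)
```

The list of fired rules doubles as a trace that an explanation component could show to the user.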

Early production systems spent over 90% of their time doing pattern matching, but there

is now a solution to this efficiency problem:

The Rete-Algorithm

The naïve approach to the recognize act cycle tries to match all E elements in working

memory against all P premises in all R rules, so it is necessary to check all E × P × R

possible matches in each cycle.

However, the rules will generally have parts of their conditions in common (structural

similarity), and the application of any one rule will generally only slightly change the

working memory (temporal redundancy).

These facts are used by the Rete Algorithm to improve efficiency (‘rete’ is Latin for

‘net’). The condition parts of the rules are structured into a network. For the first cycle,

the match algorithm generates the initial conflict set by processing the network for all

the initial facts. Thereafter, only the changed elements in working memory need to be

fed through the network to determine the changes to the conflict set.

The User Interface

The Expert System user interface usually comprises two basic components:

1. The Interviewer Component

This controls the dialog with the user and/or allows any measured data to be read

into the system. For example, it might ask the user a series of questions, or it

might read a file containing a series of test results.

2. The Explanation Component

This gives the system’s solution, and also makes the system’s operation transparent


by providing the user with information about its reasoning process. For example, it

might output the conclusion, and also the sequence of rules that was used to come

to that conclusion. It might instead explain why it could not reach a conclusion.

So that is how we go about building expert systems. In the next two weeks we shall see

how they can handle uncertainty and be improved by incorporating machine learning.

Question:- Explain different types of ES.

Ans. Expert systems appear in many varieties. The following classification of ES is not exclusive, that is, one ES can appear in several categories:

1. Expert System and Knowledge - based Systems

An ES is one whose behaviour is so sophisticated that we would call a person who performed in a similar manner an expert. MYCIN and XCON are good examples.

In the business world, however, systems are emerging that can perform tasks effectively and efficiently for whose execution you really do not need an

expert.

Such systems are referred to as knowledge-based systems. They are also

known as advisory systems or knowledge systems.

For example, let us look at a system that gives advice on immunizations recommended for travel abroad. The advice depends upon many attributes

such as age, sex and the health of the traveler and the country of destination. Knowledge systems can be constructed more quickly and cheaply than

expert systems.

2. Rule - based expert system

Many commercial ES are rule-based, because the technology of rule-based systems is relatively well developed. In such systems the knowledge is

represented as a series of production rules.

MYCIN is a well-known example of a rule-based ES.

3. Frame - based system

In these systems, the knowledge is represented as frames, a representation closely related to the object-oriented programming approach.

4. Hybrid Systems

These systems include several knowledge representation approaches, at minimum frames and rules, but usually more.

5. Model-based Systems

Model-based systems are structured around a model that simulates the structure and function of the system under study. The model is used to

compute values, which are compared to observed ones. The comparison triggers action (if needed) or further diagnosis.

6. Ready-made (Off-the-Shelf) Systems

ES can be developed to meet the particular needs of a user (custom made), or they can be purchased as ready-made packages for any user. Ready-made

systems are similar to application packages like an accounting general ledger or project management in operations management. Ready-made systems enjoy the economy of mass production and therefore are considerably

less expensive than customized systems. They also can be used as soon as they are purchased. Unfortunately, ready-made systems are very general in nature, and the advice they render may not be of value to a user involved in a

complex situation.

7. Real-time Expert Systems

Real-time systems are systems in which there is a strict time limit on the system's response time, which must be fast enough to be used to control the

process being computerized.


Case Study (MYCIN)

An example Goal-driven Medical Diagnostic Expert System

(taken from Luger and Stubblefield section 8.4)

Purpose:

Diagnose and recommend treatment for meningitis and bacteremia (more quickly

than definitive lab tests).

Explore how human experts reason with missing and incomplete information.

History

• Mid to late 1970s

• 50 person-years of effort

• Stanford medical school

• Comprehensively evaluated

• Never used clinically

• Widely documented ("Rule-based expert systems", Buchanan and Shortliffe, Stanford 1984, a collection of publications on MYCIN).

Representation

Facts: <attribute-object-value-(certainty)>

(ident organism-1 klebsiella .25)

There is evidence (.25) that the identity of organism-1 is klebsiella.

(sensitive organism-1 penicillin -1.0)

It is known that organism-1 is NOT sensitive to penicillin.

Rules: condition-action pairs. The condition is a conjunction (AND) of facts.

IF: (AND (same-context infection primary-bacteremia)
         (membf-context site sterilesite)
         (same-context portal GI))

THEN: (conclude context-ident bacteroid tally .7)

If the infection is primary bacteremia and the site of the culture is a sterile one and the suspected portal of entry is the GI tract, then there is suggestive evidence (.7) that the infection is bacteroid.

The consequent (then-part) can:

• Add facts to the database

• Write to the terminal

• Change a value in a fact, or its certainty

• Look up a table

• Execute a LISP procedure


Operation:

• Routine questions

• Specific questions about symptoms

• Depth-first, goal-driven consideration of each "known" organism

• Terminates the depth-first search when certainty measures get too low.

• Selection criterion is to maximise certainty: if a rule can prove a goal with certainty 1 then no more rules need be considered.

• Goal-driven so that questions appear to be directed: less frustrating, more confidence building for the user.

• English-like interaction (see handout).

• Answers WHY by printing the rule under consideration.

• Exhaustive consideration of possible infections: the patient may have more than one.

Uncertainty in MYCIN

If A: stain is gram positive

and B: morphology is coccus

and C: growth conformation is chains

then there is suggestive evidence (0.7) that

H: organism is streptococcus

0.7 is the measure of increase of belief (MB) of H given evidence A and B and C.

MB ranges from 0 to 1. It is usually assigned by subjective judgement. As a guide:

$$MB(H|E) = \begin{cases} 1 & \text{if } P(H) = 1 \\ \dfrac{\max[P(H|E), P(H)] - P(H)}{\max[1, 0] - P(H)} & \text{otherwise} \end{cases}$$

Measures of disbelief (MD) are also allowed. These also range from 0 to 1:

$$MD(H|E) = \begin{cases} 1 & \text{if } P(H) = 0 \\ \dfrac{\min[P(H|E), P(H)] - P(H)}{\min[1, 0] - P(H)} & \text{otherwise} \end{cases}$$

Note that if E and H are independent, E does not change the belief in H: P(H|E) = P(H), so MB = MD = 0.

MB(H|E) should only be 1 if E logically implies H.

Initially each hypothesis has MB = MD = 0. As evidence is accumulated these are updated. At the end a certainty factor CF = MB - MD is computed for each hypothesis.

The largest absolute CF values are used to determine appropriate therapy. Weakly supported hypotheses (|CF| < 0.2) are ignored.

MYCIN's handling of uncertainty is an ad-hoc method (based on probability). But it seems to work as well as more formal approaches.
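The guide formulas for MB, MD and CF can be tried out numerically. The sketch below is an illustration only: it assumes the usual convention that MD is 1 when P(H) = 0, it simplifies max[1,0] to 1 and min[1,0] to 0, and the probabilities in the example are made up rather than taken from MYCIN.

```python
def measure_of_belief(p_h: float, p_h_given_e: float) -> float:
    """MB(H|E): relative increase in belief in H caused by evidence E."""
    if p_h == 1.0:
        return 1.0
    return (max(p_h_given_e, p_h) - p_h) / (1.0 - p_h)


def measure_of_disbelief(p_h: float, p_h_given_e: float) -> float:
    """MD(H|E): relative decrease in belief in H caused by evidence E."""
    if p_h == 0.0:
        return 1.0
    return (p_h - min(p_h_given_e, p_h)) / p_h


def certainty_factor(p_h: float, p_h_given_e: float) -> float:
    """CF = MB - MD, a value between -1 and 1."""
    return (measure_of_belief(p_h, p_h_given_e)
            - measure_of_disbelief(p_h, p_h_given_e))


# Hypothetical numbers: prior P(H) = 0.5, posterior P(H|E) = 0.85.
print(round(certainty_factor(0.5, 0.85), 3))
# Evidence that lowers the probability gives a negative CF.
print(round(certainty_factor(0.5, 0.25), 3))
```

Evidence that leaves the probability unchanged gives MB = MD = 0, matching the independence remark above.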

LEARNING

Learning is the study of how to build computer systems that adapt and improve with experience. It is a subfield of Artificial Intelligence and intersects with cognitive science, information theory, and probability theory, among others.

Machine learning is particularly attractive for many real-life problems for the

following reasons:

• Some tasks cannot be defined well except by example


• Working environment of machines may not be known at design time

• Explicit knowledge encoding may be difficult and not available

• Environments change over time

Learning is widely used in a number of application areas, including:

• Data mining and knowledge discovery

• Speech/image/video (pattern) recognition

• Adaptive control

• Autonomous vehicles/robots

• Decision support systems

• Bioinformatics

• WWW

Rote learning

Rote learning is a learning technique which avoids understanding of a subject and

instead focuses on memorization. The major practice involved in rote learning is learning by repetition. The idea is that one will be able to quickly recall the meaning of the material the more one repeats it.

Rote learning is widely used in the mastery of foundational knowledge. Examples include phonics in reading, the periodic table in chemistry, multiplication tables in mathematics, anatomy in medicine, cases or statutes in law, basic formulas in any

science, etc. By definition, rote learning eschews comprehension, so on its own it is an ineffective tool for mastering any complex subject at an advanced level. Rote learning is still useful in passing exams: if exam papers are not well

designed, it is possible for someone with good memorization techniques to pass the test without any meaningful comprehension of the subject. Learning the context of a

particular topic, on the other hand, can make the subject more memorable.

With some material, rote learning is the only way to learn it in a timely manner; for example, when learning the Greek alphabet or the vocabulary of a foreign language.

Similarly, when learning the conjugation of foreign irregular verbs, the morphology is often too subtle to be learned explicitly in a short time. Even so, as in the alphabet example, learning where the alphabet came from helps one to grasp the concept of it and

therefore memorize it. (Native speakers and speakers with a lot of experience usually get


an intuitive grasp of those subtle rules and are able to conjugate even irregular verbs that they have never heard before.)

The source transmission could be auditory or visual, and is usually in the form of short

bits such as rhyming phrases (but rhyming is not a prerequisite), rather than chunks of text large enough to make lengthy paragraphs. Brevity is not always the case with rote

learning. For example, many Americans can recite their National Anthem, or even the much lengthier Preamble to the United States Constitution. Their ability to do so can be attributed, at least in part, to rote learning. The

repeated stimulus of hearing it recited in public, on TV, at a sporting event, etc. has caused the mere sound of the phrasing of the words and inflections to be "written", as if

hammer-to-stone, into the long-term memory.

Inductive learning

Inductive learning is an inherently conjectural process because any knowledge created by generalization from specific facts cannot be proven true; it can only be proven false. Hence, inductive inference is falsity preserving, not truth preserving.

To generalize beyond the specific training examples, we need constraints or biases on which candidate function f (the hypothesis learned from the examples) is best. That is, learning can be viewed as searching the hypothesis space H of possible functions f.

A bias allows us to choose one f over another. A completely unbiased inductive algorithm could only memorize the training examples and could not say anything about other, unseen examples.

Two types of biases are commonly used in machine learning:

o Restricted Hypothesis Space Bias: allow only certain types of f functions, not arbitrary ones.

o Preference Bias: define a metric for comparing fs so as to determine whether one is better than another.

Inductive Learning Framework

Raw input data from sensors are preprocessed to obtain a feature vector, x, that adequately describes all of the relevant features for classifying examples.

Each x is a list of (attribute, value) pairs. For example,

x = (Person = Sue, Eye-Color = Brown, Age = Young, Sex = Female)

The number of attributes (also called features) is fixed (positive, finite). Each attribute has a fixed, finite number of possible values.

Each example can be interpreted as a point in an n-dimensional feature space, where n is the number of attributes.



Decision tree learning

Decision tree learning, used in data mining and machine learning, uses a decision tree as a predictive model

which maps observations about an item to conclusions about the item's target value. More

descriptive names for such tree models are classification trees or regression trees. In these

tree structures, leaves represent classifications and branches represent conjunctions of

features that lead to those classifications.

In decision theory and decision analysis, a decision tree is a graph or model of decisions and

their possible consequences, including chance event outcomes, resource costs, and utility. It

can be used to create a plan to reach a goal. Decision trees are constructed in order to help

with making decisions. A decision tree is a special form of tree structure.

Another use of trees is as a descriptive means for calculating conditional probabilities.

In decision analysis, a decision tree can be used to visually and explicitly represent decisions

and decision making.

In data mining, a decision tree describes data but not decisions; rather the resulting

classification tree can be an input for decision making. This section deals with trees in data

mining.

Practical example

David is the manager of a famous golf club. Sadly, he is having some trouble with his

customer attendance. There are days when everyone wants to play golf and the staff are

overworked. On other days, for no apparent reason, no one plays golf and staff have too

much slack time. David’s objective is to optimize staff availability by trying to predict when

people will play golf.

To accomplish that he needs to understand the reason people decide to play and if there is

any explanation for that.

He assumes that weather must be an important underlying factor, so he decides to use the

weather forecast for the upcoming week.

So for two weeks he recorded:

• The outlook, whether it was sunny, overcast or raining.

• The temperature (in degrees Fahrenheit).

• The relative humidity in percent.

• Whether it was windy or not.

• Whether people attended the golf club on that day.

David compiled this dataset into a table containing 14 rows and 5 columns as shown below.


A decision tree is a model of the data that encodes the distribution of the class label (here, whether people play)

in terms of the predictor attributes. It is a directed acyclic graph in the form of a tree. The

top node represents all the data.

The classification tree algorithm concludes that the best way to explain the dependent

variable, play, is by using the variable "outlook". Using the categories of the variable

outlook, three different groups were found:

• One that plays golf when the weather is sunny,

• One that plays when the weather is overcast, and

• One that plays when it's raining.

David's first conclusion: if the outlook is overcast people always play golf, and there are

some fanatics who play golf even in the rain. Then he divided the sunny group in two. He

realized that people don't like to play golf if the humidity is higher than seventy percent.

Finally, he divided the rain category in two and found that people will also not play golf if it

is windy.

And lastly, here is the short solution of the problem given by the classification tree:

David dismisses most of the staff on days that are sunny and humid or on rainy days that are

windy, because almost no one is going to play golf on those days.


On days when a lot of people will play golf, he hires extra staff.

The conclusion is that the decision tree helped David turn a complex data representation

into a much easier structure.
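The tree David ends up with can be written directly as nested tests. The sketch below encodes only the splits described in the text (outlook first, then humidity above 70 percent for sunny days and wind for rainy days); the function name `will_play` and the string labels for the outlook are assumptions made for illustration.

```python
def will_play(outlook: str, humidity: float, windy: bool) -> bool:
    """Predict whether people play golf, following the tree described above."""
    if outlook == "overcast":
        return True                # overcast: people always play
    if outlook == "sunny":
        return humidity <= 70      # sunny: no play when humidity is above 70%
    if outlook == "rain":
        return not windy           # rain: only the fanatics, and not if windy
    raise ValueError(f"unknown outlook: {outlook!r}")


# Staffing decision for a sunny, humid day versus a calm rainy day.
print(will_play("sunny", 85, windy=False))
print(will_play("rain", 80, windy=False))
```

Temperature does not appear in the function because the tree described in the text never splits on it.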

Explanation based learning

Explanation based learning (EBL) uses an explicit domain theory to construct an explanation or

proof of a training example. By then generalizing from the proof, new knowledge is acquired

that can be applied in non-training situations.

This differs from inductive learning in that the domain theory implies the new knowledge. It is sometimes called deductive learning or analytic learning.

Example: Learning When an Object is a Cup

Target Concept: cup(C) :- premise(C).

Domain Theory:

cup(X) :- liftable(X), holds_liquid(X).

holds_liquid(Z) :- part(Z,W), concave(W), points_up(W).

liftable(X) :- light(X), part(X,handle).

light(X) :- small(X).

light(X) :- made_of(X,feathers).

Note that the domain theory includes the knowledge needed to determine when

something is a cup. We want an explicit rule that specifies when something is a cup.
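One way to picture the explanation step is as a proof search that records which directly observable facts the proof rests on; those facts become the body of the learned rule. The Python sketch below is a deliberately simplified illustration: predicates are treated as plain strings with the variables already in place (so no unification is done), the handle is assumed to be a part of the object X itself, and the observed facts, including the irrelevant `color(X,red)`, are hypothetical.

```python
from typing import Dict, List, Optional, Set

# The domain theory: each head maps to its alternative rule bodies.
THEORY: Dict[str, List[List[str]]] = {
    "cup(X)": [["liftable(X)", "holds_liquid(X)"]],
    "holds_liquid(X)": [["part(X,W)", "concave(W)", "points_up(W)"]],
    "liftable(X)": [["light(X)", "part(X,handle)"]],
    "light(X)": [["small(X)"], ["made_of(X,feathers)"]],
}


def explain(goal: str, observed: Set[str]) -> Optional[List[str]]:
    """Return the observable facts used in a proof of goal, or None."""
    if goal not in THEORY:                      # operational (leaf) predicate
        return [goal] if goal in observed else None
    for body in THEORY[goal]:                   # try each alternative rule
        leaves: List[str] = []
        for subgoal in body:
            sub_leaves = explain(subgoal, observed)
            if sub_leaves is None:
                break                           # this rule fails, try the next
            leaves.extend(sub_leaves)
        else:
            return leaves
    return None


# Hypothetical training example; the colour plays no part in the proof.
EXAMPLE = {"small(X)", "part(X,handle)", "part(X,W)",
           "concave(W)", "points_up(W)", "color(X,red)"}
learned_body = explain("cup(X)", EXAMPLE)
print("cup(X) :- " + ", ".join(learned_body) + ".")
```

The irrelevant fact is dropped because it never appears in the proof, which is what separates this from generalizing over every observed attribute.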

What is Natural language processing?

Ans. "Natural" languages are human languages, such as English, German, or Chinese. Natural language processing covers:

• Understanding text (in machine-readable form), e.g. "What customers ordered widgets in May?"

• Understanding continuous speech: perception as well as language understanding.

• Language generation (written or spoken).

• Machine translation, e.g., German to English [METAL system, University of Texas Linguistics Research Center]:

Vor dem Headerfeld befindet sich eine

Praeambel von 42 Byte Laenge fuer den

Ausgleich aller Toleranzen.

-->

A preamble of 42 byte length for the

adjustment of all tolerances is found

in front of the header field.

Que:- Explain Grammar. Write grammar for English sentences.

Ans. A grammar specifies the legal syntax of a language. The kind of grammar most

often used in computer language processing is a context-free grammar. A grammar specifies a set of productions; non-terminal symbols (phrase names or parts of speech) are enclosed in angle brackets. Each production specifies how a nonterminal symbol may

be replaced by a string of terminal or nonterminal symbols, e.g., a Sentence is composed of a Noun Phrase followed by a Verb Phrase.

<s> --> <np> <vp>

<np> --> <art> <adj> <noun>

<np> --> <art> <noun>

<np> --> <art> <noun> <pp>

<vp> --> <verb> <np>

<vp> --> <verb> <np> <pp>

<pp> --> <prep> <np>

<art> --> a | an | the

<noun> --> boy | dog | leg | porch

<adj> --> big

<verb> --> bit

<prep> --> on

Q:- What is Parsing? Explain each type of parsing.

Ans. Parsing is the inverse of generation: the assignment of structure to a linear string of words according to a grammar; this is much like the ``diagramming'' of a sentence taught

in grammar school.


A parser is a program that converts a linear string of input words into a structured representation that shows how the phrases (substructures) are related and shows how the input could have been derived according to the grammar of the language.

Finding the correct parsing of a sentence is an essential step towards extracting its meaning.

Natural languages are harder to parse than programming languages; the parser will often make a mistake and have to fail and back up: parsing is search. There may be hundreds of

ambiguous parses, most of which are wrong.

Parsers are generally classified as top-down or bottom-up, though real parsers have characteristics of both.

There are several well-known context-free parsers:

• Cocke-Kasami-Younger (CKY or CYK) chart parser

• Earley algorithm

• Augmented transition network

Top-down Parser

A top-down parser begins with the Sentence symbol, <S>, expands a production for <S>, and so on recursively until words (terminal symbols) are reached. If the string of

words matches the input, a parsing has been found.

This approach to parsing might seem hopelessly inefficient. However, top-down filtering,

that is, testing whether the next word in the input string could begin the phrase about to be tried, can prune many failing paths early.

For languages with keywords, such as programming languages or natural language

applications, top-down parsing can work well. It is easy to program.
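As a rough illustration of top-down parsing with backtracking, the sketch below uses the small English grammar given earlier. It is one possible way to write such a parser, not a standard library routine: the function names are made up, trees are returned as nested tuples, and every alternative production is tried so that ambiguous sentences yield more than one parse.

```python
from typing import Dict, Iterator, List, Set, Tuple

GRAMMAR: Dict[str, List[List[str]]] = {
    "s": [["np", "vp"]],
    "np": [["art", "adj", "noun"], ["art", "noun"], ["art", "noun", "pp"]],
    "vp": [["verb", "np"], ["verb", "np", "pp"]],
    "pp": [["prep", "np"]],
}
LEXICON: Dict[str, Set[str]] = {
    "art": {"a", "an", "the"},
    "noun": {"boy", "dog", "leg", "porch"},
    "adj": {"big"},
    "verb": {"bit"},
    "prep": {"on"},
}


def parse(symbol: str, words: List[str], pos: int) -> Iterator[Tuple[tuple, int]]:
    """Yield (tree, next_position) for each way symbol matches words[pos:]."""
    if symbol in LEXICON:                         # part of speech: match a word
        if pos < len(words) and words[pos] in LEXICON[symbol]:
            yield (symbol, words[pos]), pos + 1
        return
    for production in GRAMMAR[symbol]:            # expand each production
        for children, end in match_sequence(production, words, pos):
            yield (symbol, *children), end


def match_sequence(symbols: List[str], words: List[str],
                   pos: int) -> Iterator[Tuple[list, int]]:
    """Match a sequence of symbols left to right, backtracking on failure."""
    if not symbols:
        yield [], pos
        return
    for tree, mid in parse(symbols[0], words, pos):
        for rest, end in match_sequence(symbols[1:], words, mid):
            yield [tree] + rest, end


def parse_sentence(text: str) -> List[tuple]:
    words = text.lower().split()
    return [tree for tree, end in parse("s", words, 0) if end == len(words)]


print(len(parse_sentence("the big dog bit the boy")))
print(len(parse_sentence("the dog bit the boy on the leg")))
```

The second sentence is ambiguous under this grammar: "on the leg" can attach to "the boy" or to the verb phrase, so two trees come back.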


Bottom-up Parsing

In bottom-up parsing, words from the input string are reduced to phrases using grammar

productions:

       <NP>
      /    \
  <art>    <noun>
    |        |
   The      man      ate   fish

This process continues until a group of phrases can be reduced to <S>.

Chart Parser

A chart parser is a type of bottom-up parser that produces all parses in a triangular array called the chart; each chart cell contains a set of nonterminals. The bottom level of the

array contains all possible parts of speech for each input word. Successive levels contain reductions that span the items in levels below: cell a_i,k contains nonterminal N iff there

is a parse of N beginning at word i and spanning k words.

The chart parser eliminates the redundant work that would be required to reparse the same phrase for different higher-level grammar rules.

The Cocke-Kasami-Younger (CKY) parser is a chart parser that guarantees to parse any

context-free language in at most O(n^3) time.
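A compact sketch of the chart-filling idea follows. CKY needs the grammar in Chomsky normal form (every rule produces either two nonterminals or one word), so the toy grammar below is a hypothetical one written in that form rather than the grammar given earlier. The chart is keyed by (i, k), meaning the k words starting at word i, to mirror the cell notation used above.

```python
from itertools import product
from typing import Dict, List, Set, Tuple

# Hypothetical grammar in Chomsky normal form.
BINARY_RULES: Dict[Tuple[str, str], Set[str]] = {
    ("NP", "VP"): {"S"},
    ("ART", "NOUN"): {"NP"},
    ("VERB", "NP"): {"VP"},
}
LEXICAL_RULES: Dict[str, Set[str]] = {
    "the": {"ART"}, "a": {"ART"},
    "dog": {"NOUN"}, "boy": {"NOUN"},
    "bit": {"VERB"},
}


def cky(words: List[str]) -> Dict[Tuple[int, int], Set[str]]:
    """Fill the chart: chart[(i, k)] holds nonterminals spanning k words from i."""
    n = len(words)
    chart: Dict[Tuple[int, int], Set[str]] = {}
    for i, word in enumerate(words):              # bottom level: parts of speech
        chart[(i, 1)] = set(LEXICAL_RULES.get(word, ()))
    for k in range(2, n + 1):                     # span length
        for i in range(0, n - k + 1):             # start position
            cell: Set[str] = set()
            for split in range(1, k):             # size of the left part
                left = chart[(i, split)]
                right = chart[(i + split, k - split)]
                for pair in product(left, right):
                    cell |= BINARY_RULES.get(pair, set())
            chart[(i, k)] = cell
    return chart


def recognizes(words: List[str]) -> bool:
    return bool(words) and "S" in cky(words)[(0, len(words))]


print(recognizes("the dog bit the boy".split()))
```

The three nested loops over span length, start position and split point are where the cubic time bound comes from.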

HOW CAN WE REASON?

To a certain extent this will depend on the chosen knowledge representation, although

a good knowledge representation scheme has to allow easy, natural, and plausible reasoning.

Listed below are very broad methods of how we may reason.

1) Deductive Reasoning:


Deductive Reasoning is a process in which general premises are used to obtain a specific

inference. Reasoning moves from a general principle to a specific conclusion.

Example:

Premise : I wash my car when the weather is good on weekends.

Premise: Today is Sunday and the weather is hot

Conclusion: Therefore, I will wash my car today

To use deductive reasoning the problem must generally be formatted in this way. Once the

format has been achieved, the conclusion must be valid if the premises are true.

The whole idea is to develop new knowledge from previously given knowledge.

For example, starting with a set of postulates, axioms, and definitions, Euclid was able to prove 465

geometric propositions as the logical consequences of the input assumptions.

One of the basic rules of inference of deductive logic is the modus ponens rule.

A formal English statement of this rule is :

If X is true and if X being true implies Y is true then Y is true.

(X ∧ (X → Y)) → Y

Example:

All cats are felines.

Bosty is a cat.

I can deduce that Bosty is a feline.

Abduction

Abduction is a form of deductive logic which provides only a ‘plausible inference’.

For instance:

If I read that smoking causes lung cancer and that Frank died of lung cancer, I may infer that

Frank was a smoker.

Again this may or may not be true. Using statistics and probability theory, abduction

may yield the most probable inference among many possible inferences.


Abduction is heuristic in the sense that it provides a plausible conclusion consistent with

available information, but one which may in fact be wrong.


To illustrate how abduction works, consider following logical system consisting of a general rule

and a specific proposition:

1) All successful, entrepreneurial industrialists are rich persons.

2) John is a rich person.

If this were the only information available, a plausible inference would be that John is a

successful, entrepreneurial industrialist. This conclusion could also be false, since there are

other roads to riches such as inheritance, the lottery, and so on. If we had a table of the income

distribution of wealthy persons along with their personal histories, we could refine our

abductive inference with the probability of the inference being true.

2) Inductive Reasoning:

Inductive reasoning is a principle of reasoning to a conclusion about all members of a

class from examination of only a few members of the class; broadly, reasoning from the

particular to the general.

For example:

In 1998, the best model of Turkey was from İzmir.

From this I would infer that all the girls from İzmir are beautiful. This may or may not be

true, but it provides a useful generalization.

Another example :

$1 = 1^2$

$1 + 3 = 2^2$

$1 + 3 + 5 = 3^2$

$1 + 3 + 5 + 7 = 4^2$

and, by induction, the sum of the first n successive odd integers is $n^2$.
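The four cases above only suggest the pattern. As a sketch of how the generalization can be confirmed for every n (the summation index k is introduced here purely for illustration), the step from n to n + 1 can be written out:

```latex
\sum_{k=1}^{n} (2k - 1) = n^2
\;\Longrightarrow\;
\sum_{k=1}^{n+1} (2k - 1) = n^2 + \bigl(2(n+1) - 1\bigr) = n^2 + 2n + 1 = (n+1)^2
```

Together with the base case $1 = 1^2$, this turns the inductive guess into a proven result, which is not possible for most inductive generalizations.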

Another example :

Falcon can fly.


Canary can fly.

Gull can fly.

Conclusion: Birds can fly.

The outcome of the inductive reasoning process will frequently contain some measure

of uncertainty because including all possible facts in the premises is usually impossible.

Deductive or inductive approaches are used in logic, rule-based systems, and frames.


3) Analogical Reasoning

Analogical reasoning assumes that when a question is asked, the answer can be derived by analogy.

Example :

Premise : All football teams get 3 points when they win.

Question : How many points did GS take this weekend?

Conclusion : Because GS is a football team and beat Antep, they took 3 points.

Analogical reasoning is a type of verbalization of an internalized learning process. An individual uses processes that require an ability to recognize previously

encountered experiences.

This approach has not yet been fully exploited in the AI field; case-based reasoning is one attempt to do so.

4) Formal Reasoning

Formal reasoning involves syntactic manipulation of data structures to deduce new facts. A typical example is the mathematical logic used in proving

theorems in geometry.

5) Procedural Numeric Reasoning

Procedural numeric reasoning uses mathematical models or simulation to solve problems. Model-based reasoning is an example of this approach.

6) Generalization and Abstraction

Generalization and abstraction can be successfully used with both logical and semantic representation of knowledge.


7) Metalevel Reasoning

Meta level reasoning involves “knowledge about what you know”.

Which approach to use, how successful the inference will be, depends to a great extent on which knowledge representation method is used.

For example; reasoning by analogy can be more successful with semantic networks than with frames.

REASONING WITH LOGIC

We utilize various rules of inference to manipulate logical expressions and create new expressions.

Modus Ponens

If there is a rule “if A, then B,” and if we know that A is true, then it is valid to conclude that B is also true.

[A AND (A → B)] → B

Example :

A : It is rainy.

B : I will stay at home.

A→B : If it is rainy, I will stay at home.

Given A and A→B, we conclude B: I will stay at home.

Modus Tollens

When B is known to be false, and if there is a rule “if A, then B,” it is valid to conclude that A is also false.

[(NOT B) AND (A → B)] → NOT A
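Both rules can be checked mechanically with a truth table: a rule of inference is valid when its formula is true for every assignment of A and B. The sketch below is illustrative only; the helper names `implies` and `is_tautology` are assumptions, and the third formula (affirming the consequent) is included as a contrast that is not valid.

```python
from itertools import product


def implies(p: bool, q: bool) -> bool:
    """Material implication: p -> q is false only when p is true and q is false."""
    return (not p) or q


def is_tautology(formula) -> bool:
    """Return True if the two-variable formula holds for all truth assignments."""
    return all(formula(a, b) for a, b in product([False, True], repeat=2))


# [A AND (A -> B)] -> B
modus_ponens = lambda a, b: implies(a and implies(a, b), b)
# [(NOT B) AND (A -> B)] -> NOT A
modus_tollens = lambda a, b: implies((not b) and implies(a, b), not a)
# [B AND (A -> B)] -> A  (a common fallacy, shown for contrast)
affirming_consequent = lambda a, b: implies(b and implies(a, b), a)

print("modus ponens valid:", is_tautology(modus_ponens))
print("modus tollens valid:", is_tautology(modus_tollens))
print("affirming the consequent valid:", is_tautology(affirming_consequent))
```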

Resolution

Resolution is a method of discovering whether a new fact is valid, given a set of logical statements. It is a method of “theorem proving”. The resolution

process, which can be computerized because of its well-formed structure, is applied to a pair of parent clauses to produce a derived new clause.
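A minimal sketch of propositional resolution by refutation follows. It assumes clauses are written as sets of string literals, with a leading `~` marking negation; this encoding and the function names are illustrative choices, not part of the notes. To test whether a fact follows, its negation is added to the clauses and pairs of parent clauses are resolved until the empty clause appears (the fact is proved) or nothing new can be derived.

```python
from itertools import combinations


def negate(literal: str) -> str:
    """Flip a literal: 'p' <-> '~p'."""
    return literal[1:] if literal.startswith("~") else "~" + literal


def resolve(c1: frozenset, c2: frozenset) -> set:
    """Return every clause derivable from one resolution step on c1 and c2."""
    resolvents = set()
    for lit in c1:
        if negate(lit) in c2:
            resolvents.add(frozenset((c1 - {lit}) | (c2 - {negate(lit)})))
    return resolvents


def entails(clauses, query: str) -> bool:
    """Refutation: add the negated query and search for the empty clause."""
    known = set(clauses) | {frozenset([negate(query)])}
    while True:
        new = set()
        for c1, c2 in combinations(known, 2):
            for resolvent in resolve(c1, c2):
                if not resolvent:      # empty clause: contradiction found
                    return True
                new.add(resolvent)
        if new <= known:               # nothing new: query cannot be derived
            return False
        known |= new


# "If it is rainy, I will stay at home" becomes the clause (~rainy OR stay_home).
kb = [frozenset({"~rainy", "stay_home"}), frozenset({"rainy"})]
print(entails(kb, "stay_home"))
print(entails(kb, "sunny"))
```

The loop terminates because only finitely many clauses can be built from a finite set of literals.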


Bayesian networks

These are also called Belief Networks or Probabilistic Inference Networks. Initially developed by Pearl (1988).

The basic idea is:

Knowledge in the world is modular: most events are conditionally independent of most other events. The approach is therefore to:

o Adopt a model that uses a more local representation, allowing interactions only between events that affect each other.
o Distinguish in the model between influences that are unidirectional and those that are bidirectional.
o Chain causal events together in a network.

Implementation

A Bayesian network is a directed acyclic graph in which:

o Directed links indicate the dependencies that exist between nodes.
o Nodes represent propositions about events, or the events themselves.
o Conditional probabilities quantify the strength of the dependencies.

Consider the following example:

We are interested in the probability that my car won't start. If my car won't start, then it is likely that

o The battery is flat, or
o The starting motor is broken.

In order to decide whether to fix the car myself or send it to the garage, I apply the following rules:

o If the headlights do not work, then the battery is likely to be flat, so I fix it myself.
o If the starting motor is defective, then I send the car to the garage.
o If both the battery and the starting motor have failed, I send the car to the garage.

The network to represent this is as follows:


Fig. 21 A simple Bayesian network
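The car example can be sketched as a small network and queried by enumerating the joint distribution. The structure assumed here (battery → headlights, battery and starting motor → car won't start) follows the description in the text, but every probability value below is a made-up illustration, not a figure from the notes.

```python
from itertools import product

# Hypothetical numbers chosen only to make the example concrete.
P_BATTERY_FLAT = 0.10
P_STARTER_BROKEN = 0.05
# P(headlights fail | battery flat?)
P_HEADLIGHTS_FAIL = {True: 0.95, False: 0.02}
# P(car won't start | battery flat?, starter broken?)
P_NO_START = {
    (True, True): 1.00,
    (True, False): 0.95,
    (False, True): 0.90,
    (False, False): 0.01,
}


def bernoulli(p_true: float, value: bool) -> float:
    """Probability that a binary variable takes `value`, given P(True)."""
    return p_true if value else 1.0 - p_true


def joint(battery: bool, starter: bool, lights_fail: bool, no_start: bool) -> float:
    """Joint probability as the product of each node's conditional probability."""
    return (
        bernoulli(P_BATTERY_FLAT, battery)
        * bernoulli(P_STARTER_BROKEN, starter)
        * bernoulli(P_HEADLIGHTS_FAIL[battery], lights_fail)
        * bernoulli(P_NO_START[(battery, starter)], no_start)
    )


def posterior_battery_flat(lights_fail: bool, no_start: bool) -> float:
    """P(battery flat | evidence), summing out the unobserved starter node."""
    weight = {True: 0.0, False: 0.0}
    for battery, starter in product([True, False], repeat=2):
        weight[battery] += joint(battery, starter, lights_fail, no_start)
    return weight[True] / (weight[True] + weight[False])


print("lights dead:", round(posterior_battery_flat(True, True), 3))
print("lights fine:", round(posterior_battery_flat(False, True), 3))
```

With these assumed numbers, observing dead headlights makes a flat battery far more likely than when the headlights still work, which matches the informal rule "if the headlights do not work then the battery is likely to be flat". Enumeration is only practical for tiny networks; larger ones need the algorithms discussed next.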

Reasoning in Bayesian nets

Probabilities on links obey the standard conditional probability axioms. We therefore follow links in reaching a hypothesis and update beliefs accordingly. A few broad classes of algorithms have been used to help with this:

o Pearl's message passing method.
o Clique triangulation.
o Stochastic methods.

Basically, they all take advantage of clusters in the network and use the limits on their influence to constrain the search through the net. They also ensure that probabilities are updated correctly.

Since information is local, it can be readily added and deleted with minimum effect on the whole network. Only affected nodes need updating.

A Practical Example

Here we describe a practical example from research based here in Cardiff.

We have used Bayesian nets in a computer vision application. Details of the visual processes involved will be discussed later in the course, so the context will become clearer later.

http://www.cs.cf.ac.uk/Dave/AI2/node95.html#figbnet1


Here we attempt to describe the Bayesian reasoning behind the process.

The goal is to perform a task called data fusion to obtain a segmentation: a description of an object (viewed from a set of images) detailing its surface properties. In the example given here we deal with a simple cube, so the final description will hopefully list its edges and its faces and how they are connected together.

The input to the fusion process comes from three preprocessing stages that have extracted edge information and planar surface information from 2D grey scale (monochrome) images and 3D range data.

So from these three pre-processes we have a list of all lines, curved or straight, a list of all line intersections (two or three line intersections) and a list of all the surface equations extracted from both image types.

We can now build the network from these lists of features. As mentioned above, we hypothesise about extracted surfaces intersecting. For us to evaluate these hypotheses we need to have evidence to support or contradict them. The evidence that we use is :

o Straight lines extracted from the light image.
o Curves extracted from the light image.
o `Areas of uncertainty' extracted from the depth map.

The two line lists are generated as described above. The areas of uncertainty are found when we are attempting to find the surface equations of each surface type.

Errors arise in the depth map where the mask used to find the general surface shape overlaps two or more surfaces. The error tends to be enlarged there, giving us a clue that a surface intersection exists in that general area. So we are using evidence from more than one source of data.

We proceed by taking each of the surfaces in the surface list and generating a node to represent it. We then take a pair of surfaces and attempt to intersect them.

If they possibly intersect, then a `feature group' node is generated that references the surfaces and is connected to the child surface nodes. This process is repeated for each pair of surfaces that we have extracted, so we now know which surfaces could possibly intersect in the object. We next want to attach a conditional probability to each of our new nodes.

We now attach a probability to these connections. We do this by finding the equation of intersection (a three-dimensional line for two planes, or an ellipse for a plane and a sphere) and projecting it onto our focal plane.

Now we have our hypothesised intersections in the same dimension as the lines extracted in the preliminary stage. So, for each intersecting line, we find the closest matching line from our line list.


Once we have found the closest matching line, we generate a probability from the error. A line that closely matches our intersection line gives a high probability, whereas two surfaces that do not intersect in the object are unlikely to coincide with a line from the line list, which gives a low probability.

The line that is found is also checked to see if it lies in an area of uncertainty. If it does then that is another strong clue that the line that we have found is actually where surfaces are joined.

Once we have generated this network with all the necessary links, any further information provided to the system can be added, and the network will propagate it in the form of probability updating.

For example, suppose a new colour image were provided and it increased the likelihood of some edges and corners being present. This would increase the probability of those features that are linked to those edges and corners, and the change would propagate throughout the network.

Figure 22 shows a simple example of the network that would be generated from the input data of edges and planar faces of the cube. As can be seen, the feature group nodes can represent groups ranging from single features, such as line segments, surfaces or corners, up to the whole object, which is represented in the lower nodes and includes three surfaces, three line segments, three crosses and one corner.



Fig. 22 A Bayesian Network for Segmentation of a Cube



Neural Network

A neural network is a computational structure inspired by the study of biological neural processing. There are many different

types of neural networks, from relatively simple to very complex, just as there are many theories on how biological neural

processing works.

We will begin with a discussion of a layered feed-forward type of neural network.

A layered feed-forward neural network has layers, or subgroups of processing elements.

A layer of processing elements makes independent computations on data that it receives and passes the results to another

layer.

The next layer may in turn make its independent computations and pass on the results to yet another layer. Finally, a

subgroup of one or more processing elements determines the output from the network.

Each processing element makes its computation based upon a weighted sum of its inputs.

The first layer is the input layer and the last the output layer. The layers that are placed between the first and the last layers

are the hidden layers.

The processing elements are seen as units that are similar to the neurons in a human brain, and hence, they are referred to as

cells, neuromimes, or artificial neurons.


A threshold function is sometimes used to qualify the output of a neuron in the output layer. Even though our subject matter

deals with artificial neurons, we will simply refer to them as neurons. Synapses between neurons are referred to as

connections, which are represented by edges of a directed graph in which the nodes are the artificial neurons.

Figure 1.1 is a layered feed-forward neural network. The circular nodes represent neurons.

Here there are three layers, an input layer, a hidden layer, and an output layer. The directed graph mentioned shows the

connections from nodes from a given layer to other nodes in other layers. Throughout this book you will see many variations

on the number and types of layers.

Figure 1.1 A typical neural network.
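The layered computation described above can be sketched in a few lines of plain Python. Each neuron forms a weighted sum of its inputs and passes it through a threshold function, and each layer feeds the next. The weights and thresholds below are hand-picked assumptions (they make a 2-2-1 network compute exclusive-or); they are not taken from Figure 1.1, and for simplicity every neuron here uses a threshold.

```python
def step(activation: float, threshold: float) -> int:
    """Threshold function: the neuron fires (1) when activation reaches the threshold."""
    return 1 if activation >= threshold else 0


def layer_output(inputs, weights, thresholds):
    """Compute one layer: each neuron takes a weighted sum of all inputs."""
    outputs = []
    for neuron_weights, threshold in zip(weights, thresholds):
        activation = sum(w * x for w, x in zip(neuron_weights, inputs))
        outputs.append(step(activation, threshold))
    return outputs


def feed_forward(inputs, layers):
    """Pass the signal through each (weights, thresholds) layer in order."""
    signal = list(inputs)
    for weights, thresholds in layers:
        signal = layer_output(signal, weights, thresholds)
    return signal


# Hypothetical network: the hidden layer has an OR-like and an AND-like neuron,
# and the output neuron fires when OR is on but AND is off.
hidden_layer = ([[1, 1], [1, 1]], [1, 2])
output_layer = ([[1, -1]], [1])
network = [hidden_layer, output_layer]

for pair in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(pair, "->", feed_forward(pair, network))
```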

Output of a Neuron

Basically, the internal activation or raw output of a neuron in a neural network is a weighted sum of its inputs, but a threshold

function is also used to determine the final value, or the output. When the output is 1, the neuron is said to fire, and when it is

0, the neuron is considered not to have fired. When a threshold function is used, different results


of activations, all in the same interval of values, can cause the same final output value.

This situation helps in the sense that, if precise input causes an activation of 9 and noisy input causes an activation of 10, then

the output works out the same as if noise is filtered out.
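The noise-filtering effect of a threshold can be illustrated directly. In this sketch the weights, inputs and the threshold value of 5 are assumptions chosen so that the clean input gives an activation of 9 and the noisy input gives 10, as in the text.

```python
WEIGHTS = [3, 2, 4]   # hypothetical connection weights
THRESHOLD = 5         # hypothetical firing threshold


def activation(inputs) -> float:
    """Raw output of the neuron: the weighted sum of its inputs."""
    return sum(w * x for w, x in zip(WEIGHTS, inputs))


def fires(inputs) -> int:
    """Final output: 1 if the activation reaches the threshold, else 0."""
    return 1 if activation(inputs) >= THRESHOLD else 0


clean_input = [1, 1, 1]      # activation 3 + 2 + 4 = 9
noisy_input = [1, 1.5, 1]    # activation 3 + 3 + 4 = 10

print(activation(clean_input), fires(clean_input))
print(activation(noisy_input), fires(noisy_input))
```

Both activations lie on the same side of the threshold, so the final output is identical and the noise has no effect on it.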

To put the description of a neural network in a simple and familiar setting, let us describe an example about a daytime game

show on television, The Price is Right.

Fuzzy Logic

Logic deals with true and false. A proposition can be true on one occasion and false on another. “Apple is a red fruit” is such a

proposition. If you are holding a Granny Smith apple that is green, the proposition that apple is a red fruit is false. On the

other hand, if your apple is of a red delicious variety, it is a red fruit and the proposition in reference is true.

If a proposition is true, it has a truth value of 1; if it is false, its truth value is 0. These are the only possible truth values.

Propositions can be combined to generate other propositions, by means of logical operations.

When you say it will rain today or that you will have an outdoor picnic today, you are making statements with certainty. Of

course your statements in this case can be either true or false. The truth values of your statements can be only 1, or 0. Your

statements then can be said to be crisp.


On the other hand, there are statements you cannot make with such certainty. You may be saying that you think it will rain

today. If pressed further, you may be able to say with a degree of certainty in your statement that it will rain today. Your level

of certainty, however, is about 0.8, rather than 1. This type of situation is what fuzzy logic was developed to model. Fuzzy logic

deals with propositions that can be true to a certain degree—somewhere from 0 to 1. Therefore, a proposition’s truth value

indicates the degree of certainty about which the proposition is true.

The degree of certainty sounds like a probability (perhaps subjective probability), but it is not quite the same. Probabilities for

mutually exclusive events cannot add up to more than 1, but their fuzzy values may.

Suppose that the probability of a cup of coffee being hot is 0.8 and the probability of the cup of coffee being cold is 0.2. These

probabilities must add up to 1.0. Fuzzy values do not need to add up to 1.0. The truth value of a proposition that a cup of

coffee is hot is 0.8.

The truth value of a proposition that the cup of coffee is cold can be 0.5. There is no restriction on what these truth values

must add up to.

Fuzzy Sets

Fuzzy logic is best understood in the context of set membership.

Suppose you are assembling a set of rainy days. Would you put today in the set? When you deal only with crisp statements

that are either true or false, your inclusion of today in the set of rainy days is based on certainty. When dealing with fuzzy

logic, you would include today in the set of rainy days via an ordered pair, such as (today, 0.8). The first member in such an

ordered pair is a candidate for inclusion in the set, and the second member is a value between 0 and 1, inclusive, called the

degree of membership in the set. The inclusion of the degree of membership in the set makes it convenient for developers to


come up with a set theory based on fuzzy logic, just as regular set theory is developed. Fuzzy sets are sets in which members

are presented as ordered pairs that include information on degree of membership.

A traditional set of, say, k elements, is a special case of a fuzzy set, where each of those k elements has 1 for the degree of

membership, and every other element in the universal set has a degree of membership 0, for which reason you don’t bother

to list it.
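One simple way to represent a fuzzy set in code is a mapping from each element to its degree of membership, with unlisted elements treated as having degree 0. The sketch below assumes that representation; the names `membership` and `from_crisp` and the sample entries other than (today, 0.8) are illustrative.

```python
def membership(fuzzy_set: dict, element) -> float:
    """Degree of membership; elements that are not listed have degree 0."""
    return fuzzy_set.get(element, 0.0)


def from_crisp(elements) -> dict:
    """A traditional set is the special case where every listed degree is 1."""
    return {element: 1.0 for element in elements}


# Fuzzy set of rainy days as (element, degree) pairs.
rainy_days = {"today": 0.8, "yesterday": 1.0}
weekdays = from_crisp({"monday", "tuesday"})

print(membership(rainy_days, "today"))
print(membership(rainy_days, "tomorrow"))
print(membership(weekdays, "monday"), membership(weekdays, "sunday"))
```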

Fuzzy Set Operations

The usual operations you can perform on ordinary sets are union, in which you take all the elements that are in one set or the

other; and intersection, in which you take the elements that are in both sets. In the case of fuzzy sets, taking a union is finding

the degree of membership that an element should have in the new fuzzy set, which is the union of two fuzzy sets.

If a, b, c, and d are such that their degrees of membership in the fuzzy set A are 0.9, 0.4, 0.5, and 0, respectively, then the

fuzzy set A is given by the fit vector (0.9, 0.4, 0.5, 0). The components of this fit vector are called fit values of a, b, c, and d.

Union of Fuzzy Sets

Consider a union of two traditional sets and an element that belongs to only one of those sets. Earlier you saw that if you treat

these sets as fuzzy sets, this element has a degree of membership of 1 in one case and 0 in the other since it belongs to one

set and not the other.

Yet you are going to put this element in the union. The criterion you use in this action has to do with degrees of membership.

You need to look at the two degrees of membership, namely, 0 and 1, and pick the higher value of the two, namely, 1. In other

words, what you want for the degree of membership of an element when listed in the union of two fuzzy sets, is the

maximum value of its degrees of membership within the two fuzzy sets forming a union.


If a, b, c, and d have the respective degrees of membership in fuzzy sets A and B given by A = (0.9, 0.4, 0.5, 0) and B = (0.7, 0.6, 0.3, 0.8),

then A ∪ B = (0.9, 0.6, 0.5, 0.8).
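The union rule amounts to a component-wise maximum over the two fit vectors. A minimal sketch, assuming fit vectors are stored as tuples in the order a, b, c, d (the function name `fuzzy_union` is illustrative):

```python
def fuzzy_union(a: tuple, b: tuple) -> tuple:
    """Union of two fuzzy sets: take the larger degree of membership per element."""
    return tuple(max(x, y) for x, y in zip(a, b))


A = (0.9, 0.4, 0.5, 0.0)
B = (0.7, 0.6, 0.3, 0.8)

print(fuzzy_union(A, B))
```

For traditional sets, where every degree is 0 or 1, the same rule reduces to the ordinary union.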

Intersection and Complement of Two Fuzzy Sets

Analogously, the degree of membership of an element in the intersection of two fuzzy sets is the minimum, or the smaller

value of its degree of membership individually in the two sets forming the intersection. For example, if today has 0.8 for

degree of membership in the set of rainy days and 0.5 for degree of membership in the set of days of work completion, then

today belongs to the set of rainy days on which work is completed to a

degree of 0.5, the smaller of 0.5 and 0.8.

Recall the fuzzy sets A and B in the previous example. A = (0.9, 0.4, 0.5, 0) and B = (0.7, 0.6, 0.3, 0.8). A ∩ B, which is the

intersection of the fuzzy sets A and B, is obtained by taking, in each component, the smaller of the values found in that

component in A and in B.

Thus A ∩ B = (0.7, 0.4, 0.3, 0).

The idea of a universal set is implicit in dealing with traditional sets. For example, if you talk of the set of married persons, the

universal set is the set of all persons. Every other set you consider in that context is a subset of the universal set. We bring up

this matter of universal set because when you make the complement of a traditional set A, you need to put in every element

in the universal set that is not in A. The complement of a fuzzy set,

however, is obtained as follows. In the case of fuzzy sets, if the degree of membership is 0.8 for a member, then that member

is not in that set to a degree of 1.0 – 0.8 = 0.2. So you can set the degree of membership in the complement fuzzy set to the

complement with respect to 1. If we return to the scenario of having a degree of 0.8 in the set of rainy days, then today has to

have 0.2 membership degree in the set of nonrainy or clear days.


Continuing with our example of fuzzy sets A and B, and denoting the complement of A by

A’, we have A’ = (0.1, 0.6, 0.5, 1) and B’ = (0.3, 0.4, 0.7, 0.2).

Note that A’ ∪ B’ = (0.3, 0.6, 0.7, 1), which is also the complement of A ∩ B. You can similarly verify that the

complement of A ∪ B is the same as A’ ∩ B’.
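These identities can be verified on the example fit vectors. The sketch below assumes tuples in the order a, b, c, d; the function names are illustrative, and the complement is rounded only to hide binary floating-point noise such as 1 - 0.9 evaluating to 0.09999999999999998.

```python
def fuzzy_union(a: tuple, b: tuple) -> tuple:
    """Component-wise maximum of the degrees of membership."""
    return tuple(max(x, y) for x, y in zip(a, b))


def fuzzy_intersection(a: tuple, b: tuple) -> tuple:
    """Component-wise minimum of the degrees of membership."""
    return tuple(min(x, y) for x, y in zip(a, b))


def fuzzy_complement(a: tuple) -> tuple:
    """Complement with respect to 1, rounded to suppress floating-point noise."""
    return tuple(round(1.0 - x, 10) for x in a)


A = (0.9, 0.4, 0.5, 0.0)
B = (0.7, 0.6, 0.3, 0.8)

print(fuzzy_intersection(A, B))
print(fuzzy_complement(A), fuzzy_complement(B))

# Both identities from the text, checked on this example.
first = fuzzy_union(fuzzy_complement(A), fuzzy_complement(B)) == fuzzy_complement(fuzzy_intersection(A, B))
second = fuzzy_intersection(fuzzy_complement(A), fuzzy_complement(B)) == fuzzy_complement(fuzzy_union(A, B))
print(first, second)
```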

Applications of Fuzzy Logic

Applications of fuzzy sets and fuzzy logic are found in many fields, including artificial intelligence, engineering, computer

science, operations research, robotics, and pattern recognition. These fields are also ripe for applications for neural networks.

So it seems natural that fuzziness should be introduced in neural networks themselves. Fuzzy sets can find a place in any area

where humans need to make decisions, since the information on which decisions are to be based may not always

be complete and the reliability of the supposed values of the underlying parameters is not always certain.

Examples of Fuzzy Logic

Let us say five tasks have to be performed in a given period of time, and each task requires one person dedicated to it.

Suppose there are six people capable of doing these tasks. As you have more than enough people, there is no problem in

scheduling this work and getting it done. Of course who gets assigned to which task depends on some criterion, such as total

time for completion, on which some optimization can be done. But suppose these six people are not necessarily available during the particular period of time in question.

Suddenly, the problem is seen in less than crisp terms. The availability of the people is fuzzy-valued. Here is an example of an

assignment problem where fuzzy sets can be used.


Commercial Applications

Many commercial uses of fuzzy logic exist today. A few examples are listed here:

• A subway in Sendai, Japan uses a fuzzy controller to control a subway car. This controller has outperformed human and

conventional controllers in giving a smooth ride to passengers in all terrain and external conditions.

• Cameras and camcorders use fuzzy logic to adjust autofocus mechanisms and to cancel the jitter caused by a shaking hand.

• Some automobiles use fuzzy logic for different control applications. Nissan has patents on fuzzy logic braking systems,

transmission controls, and fuel injectors. GM uses a fuzzy transmission system in its Saturn vehicles.

• FuziWare has developed and patented a fuzzy spreadsheet called FuziCalc that allows users to incorporate fuzziness in their

data.

• Software applications to search and match images for certain pixel regions of interest have been developed. Avian Systems

has a software package called FullPixelSearch.

• A stock market charting and research tool called SuperCharts, from Omega Research, uses fuzzy logic in one of its modules to determine whether the market is bullish, bearish, or neutral.


Intelligent Agent: Design and Construction

An agent is anything that can be viewed as

--- perceiving its environment through sensors and

---acting upon that environment through actuators or effectors.
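The sense-then-act cycle in this definition can be sketched as a loop between an agent and its environment. The thermostat agent and room environment below are hypothetical examples invented for illustration; the method names `sense`, `act` and `apply` are assumptions, not a standard API.

```python
class Room:
    """Toy environment: its whole state is one temperature value."""

    def __init__(self, temperature: float):
        self.temperature = temperature

    def sense(self) -> float:
        """What the agent's sensor reports."""
        return self.temperature

    def apply(self, action: str) -> None:
        """How the agent's actuator changes the environment."""
        if action == "heat":
            self.temperature += 1
        elif action == "cool":
            self.temperature -= 1


class Thermostat:
    """Toy agent: maps each percept to an action."""

    def __init__(self, target: float):
        self.target = target

    def act(self, percept: float) -> str:
        if percept < self.target - 1:
            return "heat"
        if percept > self.target + 1:
            return "cool"
        return "idle"


def run(agent, environment, steps: int) -> list:
    """Repeat the cycle: perceive through sensors, then act through actuators."""
    history = []
    for _ in range(steps):
        percept = environment.sense()
        action = agent.act(percept)
        environment.apply(action)
        history.append(action)
    return history


room = Room(temperature=17)
print(run(Thermostat(target=20), room, steps=5))
```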


Specifying the task environment (PEAS)

- Performance Measure

- Environment

- Sensors

- Actuators

In designing an agent, the first step must always be to specify the task environment (PEAS) as fully as possible.

PEAS for an automated taxi driver

- Performance measure: Safe, fast, legal, comfortable trip, maximize profits

- Environment: Roads, other traffic, pedestrians, customers


- Actuators: Steering wheel, accelerator, brake, signal, horn

- Sensors: Cameras, sonar, speedometer, GPS, odometer, engine sensors, keyboard

PEAS for a medical diagnosis system

- Performance measure: Healthy patient, minimize costs, lawsuits

- Environment: Patient, hospital, staff

- Actuators: Screen display (questions, tests, diagnoses, treatments, referrals)

- Sensors: Keyboard (entry of symptoms, findings, patient's answers)

PEAS for Interactive English tutor

- Performance measure: Maximize student's score on test

- Environment: Set of students

- Actuators: Screen display (exercises, suggestions, corrections)

- Sensors: Keyboard
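A PEAS description is simply a four-part record, so it can be written down as a small data structure before any agent code exists. The sketch below assumes a frozen dataclass named `PEAS`; the field contents are copied from the taxi driver and English tutor descriptions above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PEAS:
    """Task environment specification: one tuple of items per PEAS component."""
    performance: tuple
    environment: tuple
    actuators: tuple
    sensors: tuple


taxi_driver = PEAS(
    performance=("safe", "fast", "legal", "comfortable trip", "maximize profits"),
    environment=("roads", "other traffic", "pedestrians", "customers"),
    actuators=("steering wheel", "accelerator", "brake", "signal", "horn"),
    sensors=("cameras", "sonar", "speedometer", "GPS", "odometer",
             "engine sensors", "keyboard"),
)

english_tutor = PEAS(
    performance=("maximize student's score on test",),
    environment=("set of students",),
    actuators=("screen display",),
    sensors=("keyboard",),
)

for name, spec in [("taxi driver", taxi_driver), ("English tutor", english_tutor)]:
    print(name, "has", len(spec.sensors), "sensor(s) and", len(spec.actuators), "actuator(s)")
```

Writing the specification first makes gaps visible early, for example an agent whose performance measure depends on something none of its sensors can observe.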

Environment types

The critical decision an agent faces is determining which action to perform to best satisfy its design objectives.

• Accessible vs. inaccessible

• Deterministic vs. stochastic

• Episodic vs. sequential


• Static vs. dynamic

• Discrete vs. continuous

• Single agent vs. multi agent

Accessible vs. inaccessible:- An environment is fully accessible if an agent's sensors give it access to the complete state of the environment at each point in time. An environment might be inaccessible because of noisy and inaccurate sensors, or because parts of the state are missing from the sensor data.

• Examples: vacuum cleaner with local dirt sensor, taxi driver

Deterministic vs. stochastic:- The environment is deterministic if the next state of the environment is completely determined by the current state and the action executed by the agent; otherwise it is stochastic. If the environment is only partially observable, then it could appear to be stochastic.

• Examples: Vacuum world is deterministic while taxi driver is not

Episodic vs. sequential:- In episodic environments, the agent's experience is divided into atomic "episodes" (each episode consists of the agent perceiving and then performing a single action), and the choice of action in each episode depends only on the episode itself. In sequential environments, the current decision could affect all future decisions.

• Examples: classification tasks are episodic, while chess and taxi driving are sequential

Static vs. dynamic:- A static environment is unchanged while an agent is deliberating, whereas a dynamic environment can change during deliberation and in effect continuously asks the agent what it wants to do.

• Examples: taxi driving is dynamic, crossword puzzles are static.

Discrete vs. continuous:- A discrete environment has a limited number of distinct, clearly defined states, percepts and actions; a continuous environment does not.

• Examples: chess has a discrete set of percepts and actions, while taxi driving has continuous states and actions

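Single agent vs. multi agent:- An agent operating by itself in an environment is in a single agent environment; when other agents are present, the environment is multi agent.

• Examples: a crossword is a single agent environment, while taxi driving is a multi agent environment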

Environment types

| Task Environment          | Observable | Deterministic | Episodic   | Static  | Discrete   | Agents |
|---------------------------|------------|---------------|------------|---------|------------|--------|
| Crossword puzzle          | Fully      | Deterministic | Sequential | Static  | Discrete   | Single |
| Chess with a clock        | Fully      | Strategic     | Sequential | Semi    | Discrete   | Multi  |
| Poker                     | Partially  | Stochastic    | Sequential | Static  | Discrete   | Multi  |
| Backgammon                | Fully      | Stochastic    | Sequential | Static  | Discrete   | Multi  |
| Taxi driving              | Partially  | Stochastic    | Sequential | Dynamic | Continuous | Multi  |
| Medical diagnosis         | Partially  | Stochastic    | Sequential | Dynamic | Continuous | Single |
| Image analysis            | Fully      | Deterministic | Episodic   | Semi    | Continuous | Single |
| Part-picking robot        | Partially  | Stochastic    | Episodic   | Dynamic | Continuous | Single |
| Refinery controller       | Partially  | Stochastic    | Sequential | Dynamic | Continuous | Single |
| Interactive English tutor | Partially  | Stochastic    | Sequential | Dynamic | Discrete   | Multi  |

• The environment type largely determines the agent design

• The real world is (of course) partially observable, stochastic, sequential, dynamic, continuous, multi-agent


Agent types

Four basic types, in order of increasing generality:

- Simple reflex agents

- Model-based reflex agents

- Goal based agents

- Utility based agents

Simple reflex agents
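A simple reflex agent chooses its action from the current percept alone, using condition-action rules and no memory. The sketch below uses the two-square vacuum world mentioned earlier as an example; the rule table, the square names "A" and "B", and the helper `run_vacuum` are assumptions made for illustration.

```python
# Condition-action rules: (location, status) -> action.
RULES = {
    ("A", "dirty"): "suck",
    ("B", "dirty"): "suck",
    ("A", "clean"): "right",
    ("B", "clean"): "left",
}


def simple_reflex_agent(percept: tuple) -> str:
    """Look up the action for the current percept; no history is kept."""
    return RULES[percept]


def run_vacuum(world: dict, location: str, steps: int) -> list:
    """Drive the agent in a two-square world and record the actions taken."""
    actions = []
    for _ in range(steps):
        action = simple_reflex_agent((location, world[location]))
        if action == "suck":
            world[location] = "clean"
        elif action == "right":
            location = "B"
        elif action == "left":
            location = "A"
        actions.append(action)
    return actions


world = {"A": "dirty", "B": "dirty"}
print(run_vacuum(world, "A", steps=4))
print(world)
```

Because the agent has no internal state, it keeps moving back and forth once both squares are clean; remembering what has already been cleaned requires the model-based design that follows.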


Model-based reflex agents


Goal-based agents


Utility-based agents


Learning agents

Agents and Objects

The designers of an object-oriented system work towards a common goal, whereas agents may be built for different organizations, so no such common goal can be assumed.

“Objects invoke, agents request” or, as one researcher put it, “Objects do it for free; agents do it for money”.

Agents and Expert systems

Expert systems are generally not considered agents. They typically do not exist in an environment; they are disembodied. Expert systems do not act on any environment but instead give feedback or advice to a third party. This does not mean that an expert system cannot be an agent: some real-time (typically process control) expert systems are agents.

What Kinds of Things Can Intelligent Agents Do?

- Search for information automatically

- Answer specific questions

- Inform you when an event has occurred.


- Provide custom news to you in a just-in-time format

- Provide intelligent tutoring

- Find you the best prices on nearly any item

- Provide automatic services, such as checking web pages for changes or broken links

Features of an Agent

- Responsive (explicit: programmed, implicit:learn)

- Predictable

- Interactive (accessible)

- Trustworthy

- Expertise

- Skill

- Quick

- Accurate

Other properties

• Mobility: the ability to move around an electronic environment.

• Veracity: an agent will not knowingly communicate false information.

• Benevolence: agents do not have conflicting goals, and every agent will therefore always try to do what is asked of it.

• Rationality: an agent will act in order to achieve its goals insofar as its beliefs permit.

• Learning/adaptation: agents improve performance over time.

Summary

• Agent-based systems technology is a vibrant and rapidly expanding field of academic research and business world applications.

• Agent technology is greatly hyped as a panacea for the current ills of system design and development, but the developer is cautioned to be aware of the pitfalls inherent in any new and untested technology.

• The potential is there but the full benefit is yet to be realized.

• Much work is yet to be done.
