Post on 03-Jan-2016


Expert Systems, 4th NF, Association Rules

CS 157B
Prof. Sin-Min Lee

COMPONENTS OF AN EXPERT SYSTEM

KNOWLEDGE BASE

KNOWLEDGE REPRESENTATION

. PREDICATE CALCULUS

. LISTS

. FRAMES

. SCRIPTS

. SEMANTIC NETWORKS

. PRODUCTION RULES

Figure: Expert system architecture (knowledge base, dynamic database, inference engine)

People and Computers

WHAT COMPUTERS CAN DO BETTER THAN PEOPLE

. Numerical Computation

. Information storage

. Repetitive Operations

. Computers are “Just Machines”

The major application areas of AI

Artificial Intelligence

-GENERAL PROBLEM SOLVING

-EXPERT SYSTEMS

-NATURAL LANGUAGE PROCESSING

-COMPUTER VISION

-ROBOTICS

-COMPUTER AIDED INSTRUCTION

-AUTOMATIC PROGRAMMING

- PLANNING AND DECISION SUPPORT

Heuristics

- Subconscious Heuristic

- Conscious Heuristic

COMPONENTS OF AN EXPERT SYSTEM: KNOWLEDGE BASE - PRODUCTION RULES

ADVANTAGES OF USING A PRODUCTION SYSTEM TO REPRESENT KNOWLEDGE

. Explanation

. Modification

. Understanding

COMPONENTS OF AN EXPERT SYSTEM: INFERENCE ENGINE

BLIND SEARCH TECHNIQUES

. BREADTH FIRST OR DEPTH FIRST SEARCH

. FORWARD OR BACKWARD CHAINING

COMPONENTS OF AN EXPERT SYSTEM: INFERENCE ENGINE

SEARCH TECHNIQUES

. IN PRACTICE A COMBINATION OF THE TWO CHAINING TECHNIQUES IS USED

. BIDIRECTIONAL SEARCH

. PRUNING THE SEARCH TREE

. USE HEURISTIC SEARCH TECHNIQUES

COMPONENTS OF AN EXPERT SYSTEM: INFERENCE ENGINE

HEURISTIC SEARCH TECHNIQUES

LIMIT THE SEARCH PROCESS IN AN EFFORT TO REACH THE SOLUTION FASTER

COMPONENTS OF AN EXPERT SYSTEM: INFERENCE ENGINE

HEURISTIC SEARCH TECHNIQUES

. BACKTRACKING

. MINIMAX

. STATIC EVALUATION

An Expert system is a program which has a wide base of knowledge in a restricted domain, and uses complex inferential reasoning to perform tasks which a human expert could do.

Welbank 1983

BASIC ARCHITECTURE OF EXPERT SYSTEM

Figure: Basic architecture of an expert system (data, facts, knowledge base, interpreter, solution)

PHYSIOLOGY OF EXPERT SYSTEM

Two methods for triggering (i.e., putting into operation) rules

. Forward Chaining (fact-directed reasoning)
A process of examining the left part of each rule in turn and applying the rule whenever the conditions for this part are found to hold; the process ends when it ceases to give any new fact.

. Backward Chaining (goal-directed reasoning)
The goal to be attained is given and the right parts of the rules are examined to find which of these include this goal; this sets up new goals, which are subgoals of the original goal, and so on, until a known fact is established.

FEATURES OF AN EXPERT SYSTEM

THE PROGRAM SHOULD BE

. DEVELOPED TO MEET A SPECIFIC NEED

. EASY TO USE

. EDUCATIONAL, WHEN APPROPRIATE

. ABLE TO EXPLAIN ITS ADVICE

. ABLE TO RESPOND TO SIMPLE QUESTIONS

. ABLE TO LEARN NEW KNOWLEDGE

KNOWLEDGE IN THE PROGRAM SHOULD BE EASILY MODIFIED

. CORRECT ERROR

. ADD NEW INFORMATION

EXAMPLE

R5 IF Z AND L THEN S
R1 IF A AND N THEN E
R3 IF D OR M THEN Z
R2 IF A THEN M
R4 IF Q AND (NOT W) AND (NOT Z) THEN N
R6 IF L AND M THEN E
R7 IF B AND C THEN Q

KNOWN FACTS (FACTS BASE) = (A, L)
GOAL TO BE ESTABLISHED = E

FORWARD CHAINING

Regard the process as a sequence of iterations through the rules

First iteration: R2 and R6 are triggered (A, L, M, E)
Second iteration: R3 is triggered (A, L, M, E, Z)
Third iteration: R5 is triggered (A, L, M, E, Z, S)

The goal E has been reached by this stage; as it happens, no further iteration is possible and the process stops here.
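The iterations above can be traced with a minimal Python sketch. It assumes the rules are scanned in the order they are listed, that a new fact is visible to later rules in the same pass, that each rule fires at most once, and that (NOT W) means "W is not among the known facts". The names RULES and forward_chain are illustrative only.

```python
# Each rule: (name, condition over the current fact set, concluded fact).
# Rules are kept in the order in which the example lists them.
RULES = [
    ("R5", lambda f: "Z" in f and "L" in f, "S"),
    ("R1", lambda f: "A" in f and "N" in f, "E"),
    ("R3", lambda f: "D" in f or "M" in f, "Z"),
    ("R2", lambda f: "A" in f, "M"),
    # Assumption: NOT X is read as "X is not a known fact".
    ("R4", lambda f: "Q" in f and "W" not in f and "Z" not in f, "N"),
    ("R6", lambda f: "L" in f and "M" in f, "E"),
    ("R7", lambda f: "B" in f and "C" in f, "Q"),
]


def forward_chain(rules, initial_facts):
    """Fire rules pass by pass until a pass adds nothing new."""
    facts = set(initial_facts)
    fired = set()      # rules that have already been applied
    trace = []         # names of the rules fired in each iteration
    while True:
        fired_now = []
        for name, condition, conclusion in rules:
            if name not in fired and condition(facts):
                facts.add(conclusion)
                fired.add(name)
                fired_now.append(name)
        if not fired_now:          # no new fact: the process ends
            return facts, trace
        trace.append(fired_now)


facts, trace = forward_chain(RULES, {"A", "L"})
print(trace)           # rules fired per iteration
print("E" in facts)    # whether the goal was established
```

Under these assumptions the trace lists R2 and R6, then R3, then R5, and the fourth pass fires nothing.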

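BACKWARD CHAINING

Figure: goal tree for establishing E by backward chaining

E (GOAL)
  R1: A (TRUE) and N
    N, by R4: Q and NOT W and NOT Z
      Q, by R7: B and C
  R6: L (TRUE) and M
    M: A (TRUE)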
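A goal-directed version of the same example can be sketched in Python as follows. This is only an illustration: it assumes each rule's condition is written as a list of alternative conjunctions (so R3's "D OR M" becomes two alternatives), that a literal prefixed with "~" stands for NOT and succeeds when the fact cannot be proved, and that the names RULES and prove are hypothetical.

```python
# Each rule: (name, alternatives, conclusion); an alternative is a list of
# literals that must all hold. "~X" means NOT X (assumed: X cannot be proved).
RULES = [
    ("R1", [["A", "N"]], "E"),
    ("R2", [["A"]], "M"),
    ("R3", [["D"], ["M"]], "Z"),
    ("R4", [["Q", "~W", "~Z"]], "N"),
    ("R5", [["Z", "L"]], "S"),
    ("R6", [["L", "M"]], "E"),
    ("R7", [["B", "C"]], "Q"),
]


def prove(goal, facts, rules, path=()):
    """Return True if the goal follows from the known facts via the rules."""
    if goal.startswith("~"):
        return not prove(goal[1:], facts, rules, path)
    if goal in facts:              # a known fact ends the search
        return True
    if goal in path:               # guard against circular subgoals
        return False
    for _name, alternatives, conclusion in rules:
        if conclusion != goal:     # only rules whose right part is the goal
            continue
        for conjunction in alternatives:
            # every literal of the conjunction becomes a subgoal
            if all(prove(sub, facts, rules, path + (goal,)) for sub in conjunction):
                return True
    return False


known = {"A", "L"}
print(prove("E", known, RULES))   # E: R1 fails on N, R6 succeeds via M
```

With the known facts (A, L), the attempt through R1 fails because N needs Q, which needs B and C; the attempt through R6 succeeds because L is known and M follows from A.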

Knowledge Representation

1- Semantic Networks
2- Object - Attribute - Value triplets
3- Rules
4- Frames

Object - Attribute - Value (O - A - V) Triplets

-- Static knowledge vs. Instances
-- Objects are ordered and related
-- Handles uncertainty with a certainty factor

Frames

-- a description of an object that contains slots for all the information associated with the object
-- slots may contain default values, pointers to other frames, sets of rules, or procedures by which values may be obtained.

Figure 4.12 Frame for Wilson’s coat

COAT

Slots:               Entries:
Owner                Wilson
Condition            Rumpled
Condition of cuffs   Worn, shiny
Condition of elbow   Worn, shiny
Number of arms       Default: 2
Fabrics              Default: wool
Pockets?             Default: yes
Size                 If needed, find owner’s height and weight, and compare to Table X.
Style                If needed, find out collar; pockets; pockets and length; then look in Table Y
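The three kinds of slot content in the coat frame (stored entries, defaults, and "if needed" procedures) can be sketched in Python as below. The class Frame and the helper size_from_table_x are hypothetical names, and the helper only returns a placeholder string because Table X itself is not given.

```python
class Frame:
    """A frame: named slots with stored entries, defaults, and if-needed procedures."""

    def __init__(self, name, entries=None, defaults=None, if_needed=None):
        self.name = name
        self.entries = dict(entries or {})      # values known for this object
        self.defaults = dict(defaults or {})    # used when no entry is stored
        self.if_needed = dict(if_needed or {})  # procedures run on demand

    def get(self, slot):
        if slot in self.entries:
            return self.entries[slot]
        if slot in self.if_needed:
            # Run the procedure once and remember its result.
            self.entries[slot] = self.if_needed[slot](self)
            return self.entries[slot]
        return self.defaults.get(slot)          # None if nothing is known


def size_from_table_x(frame):
    # Placeholder for "find owner's height and weight, and compare to Table X".
    return f"size for {frame.get('Owner')} per Table X"


coat = Frame(
    "COAT",
    entries={
        "Owner": "Wilson",
        "Condition": "Rumpled",
        "Condition of cuffs": "Worn, shiny",
        "Condition of elbow": "Worn, shiny",
    },
    defaults={"Number of arms": 2, "Fabrics": "wool", "Pockets?": "yes"},
    if_needed={"Size": size_from_table_x},
)

print(coat.get("Owner"), coat.get("Number of arms"), coat.get("Size"))
```

Looking up a stored entry returns it directly, a missing entry falls back to the default, and the Size slot is computed only when it is first requested.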

Inference Engine

Functions:

1- Inference: Examines existing facts and rules, and adds new facts when possible.

2- Control: Decides the order in which inferences are made.

Limitations

-- confined to well-defined problems, unable to reason over a broad field of expertise.

-- cannot reason from axioms or general theory.

-- do not learn; limited to using the specific facts and heuristics that were “taught” by a human expert.

-- lack common sense, cannot reason by analogy

-- performance deteriorates rapidly when problems extend beyond the narrow task they were designed to perform.

Control

Depth-First vs. Breadth-First Search:

-- In a depth-first search, the inference engine takes every opportunity to produce a subgoal, searching for detail first in a depth-first manner.

-- A breadth-first search sweeps across all premises in a rule before digging for greater detail.
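The difference in visiting order can be seen in a small Python sketch. The subgoal tree below is an assumption made for illustration: it follows only rules R1, R4 and R7 of the earlier example (E needs A and N, N needs Q, W and Z, Q needs B and C), and the names SUBGOALS, depth_first and breadth_first are hypothetical.

```python
from collections import deque

# Assumed subgoal tree: goal -> premises that become subgoals.
SUBGOALS = {"E": ["A", "N"], "N": ["Q", "W", "Z"], "Q": ["B", "C"]}


def depth_first(goal):
    """Expand the most recently produced subgoal first (stack)."""
    order, stack = [], [goal]
    while stack:
        node = stack.pop()
        order.append(node)
        # Reverse so the leftmost premise is explored first.
        stack.extend(reversed(SUBGOALS.get(node, [])))
    return order


def breadth_first(goal):
    """Sweep across all premises of a level before going deeper (queue)."""
    order, queue = [], deque([goal])
    while queue:
        node = queue.popleft()
        order.append(node)
        queue.extend(SUBGOALS.get(node, []))
    return order


print(depth_first("E"))    # E, A, N, Q, B, C, W, Z
print(breadth_first("E"))  # E, A, N, Q, W, Z, B, C
```

Depth-first reaches B and C (the detail under Q) before it looks at W and Z, while breadth-first examines Q, W and Z together before descending to B and C.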

Advantages

-- do not display biased judgment

-- do not jump to conclusions and then seek to maintain those conclusions in the face of disconfirming evidence.

-- do not have “bad days”

-- always attend to details, always systematically consider all of the possible alternatives

-- equipped with thousands of heuristic rules, able to perform their specialized task better than a human expert.

Multivalued Dependencies

• There are database schemas in BCNF that do not seem to be sufficiently normalized

• Consider a database classes(course, teacher, book)

such that (c, t, b) ∈ classes means that t is qualified to teach c, and b is a required textbook for c

• The database is supposed to list for each course the set of teachers any one of which can be the course’s instructor, and the set of books, all of which are required for the course (no matter who teaches it).

• There are no non-trivial functional dependencies and therefore the relation is in BCNF

• Insertion anomalies – i.e., if Sara is a new teacher that can teach database, two tuples need to be inserted

(database, Sara, DB Concepts)
(database, Sara, Ullman)

course              teacher     book
database            Avi         DB Concepts
database            Avi         Ullman
database            Hank        DB Concepts
database            Hank        Ullman
database            Sudarshan   DB Concepts
database            Sudarshan   Ullman
operating systems   Avi         OS Concepts
operating systems   Avi         Shaw
operating systems   Jim         OS Concepts
operating systems   Jim         Shaw

classes

Multivalued Dependencies

• Therefore, it is better to decompose classes into:

course              teacher
database            Avi
database            Hank
database            Sudarshan
operating systems   Avi
operating systems   Jim

teaches

course              book
database            DB Concepts
database            Ullman
operating systems   OS Concepts
operating systems   Shaw

text

We shall see that these two relations are in Fourth Normal Form (4NF)
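As a quick check of the decomposition, the following Python sketch projects classes onto teaches and text and joins them back on course. It is only an illustration using sets of tuples; the helper natural_join is a hypothetical name.

```python
classes = {
    ("database", "Avi", "DB Concepts"),
    ("database", "Avi", "Ullman"),
    ("database", "Hank", "DB Concepts"),
    ("database", "Hank", "Ullman"),
    ("database", "Sudarshan", "DB Concepts"),
    ("database", "Sudarshan", "Ullman"),
    ("operating systems", "Avi", "OS Concepts"),
    ("operating systems", "Avi", "Shaw"),
    ("operating systems", "Jim", "OS Concepts"),
    ("operating systems", "Jim", "Shaw"),
}

# Projections of classes onto (course, teacher) and (course, book).
teaches = {(course, teacher) for course, teacher, _book in classes}
text = {(course, book) for course, _teacher, book in classes}


def natural_join(teaches, text):
    """Join the two relations on their shared attribute, course."""
    return {(c, t, b) for (c, t) in teaches for (c2, b) in text if c == c2}


# The join reproduces classes exactly, so no information is lost.
print(natural_join(teaches, text) == classes)

# Adding Sara now takes one tuple in teaches instead of two in classes.
teaches_with_sara = teaches | {("database", "Sara")}
print(sorted(natural_join(teaches_with_sara, text) - classes))
```

The second print shows the two classes tuples for Sara being derived from the single teaches tuple, which is the insertion anomaly disappearing.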


What are Association Rules?

- Techniques used to detect associations or relationships between elements in large data sets.

- Show value conditions that occur frequently in a data set.

- Association Rules are basically if-then rules supported by data

- A well-known application is Market Basket Analysis

What are Association Rules?

• Finding frequent patterns, correlations, or associations among a set of items or objects in transaction databases, relational databases, or other information repositories

Applications of Association Rules

• Market Basket Analysis
• Classification
• Clustering
• Cross-marketing
• Loss-leader Analysis

Characteristics of Association Rules

• Consists of a set of items, the rule body, leading to another item, the rule head

X ⇒ Y

• Association Rules relate the rule body X to the rule head Y

Itemsets

• itemset = a set of items
X = {apple, orange} is an itemset

• k-itemset = a set of k items
X = {apple, orange, banana} = 3-itemset

Rules

• Let I = {i1, i2, …, im} be a set of items

• Let transaction t be a set of items where t ⊆ I

• Let T be the Transaction Database or set of transactions where T = {t1, t2, …, tn}.

Rules

• A transaction t contains an itemset X, where X ⊆ I, if X ⊆ t

• Formal association rule is defined as:

X ⇒ Y, where X, Y ⊆ I, and X ∩ Y = ∅

Support

• Support is the percentage of transactions that support an association rule.

• It is the percentage of transactions that contain product X and product Y

Support = Probability(X and Y).

Confidence

• Confidence is the strength or reliability of an association rule

• It is the probability that if there is X in a transaction, then Y will also be present.

Confidence = Support (X,Y) / Support (X) or

Confidence = Probability (Y | X)

Thresholds

• Minimum Support Threshold and Minimum Confidence Threshold

• An association rule is more valuable if it satisfies both of these minimum values

• The higher the percentages, the more useful the rule is

Uses of Association Rules

• Retail Shopping
• Credit Card transactions
• Online purchases
• Medical patient histories
• Banking services
• Insurance claims

Market Basket Analysis

• Modeling technique based on the theory that if you buy one group of items, you are also likely to buy another group of items

• In consumer behavior, most purchases are made on impulse

• Market Basket Analysis seeks to find a relationship between purchases

Uses of Market Basket Analysis

• Identify unexpected shopping patterns
• Targeted marketing towards specific types of people
• Predicting customer response rates to marketing campaigns
• Distinguish between profitable and unprofitable customers

Maximizing Profitability

- Arrangement of items in a store

- Planning of specific sales during times of the year

- Pricing Policy of certain goods

Example

- A market has 100 transactions
- 15 transactions contain product X
- Of the 15 transactions, 5 transactions contain product Y

Support = 5/100 or 5%
Confidence = 5/15 or 33.3%

Example

• Milk + Bread => Butter

X = Milk + Bread

Y = Butter

“Milk + Bread” is the Rule Body and

“Butter” is the Rule Head

Example

- 5% of all transactions contain the combination of Milk, Bread, and Butter.

- If customers buy Milk and Bread, there is a 33.3% probability that they will also buy Butter

Support = 5%
Confidence = 33.3%
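The two figures can be reproduced with a short Python sketch. The transaction list is synthetic, built only to match the counts in the example (100 transactions, 15 with Milk and Bread, 5 of those also with Butter); the filler item "Eggs" and the function names support and confidence are assumptions for illustration.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item of the itemset."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= t) / len(transactions)


def confidence(transactions, body, head):
    """Support(body and head) / Support(body), i.e. Probability(head | body)."""
    both = set(body) | set(head)
    return support(transactions, both) / support(transactions, body)


# Synthetic data matching the example's counts; "Eggs" is a filler item.
transactions = (
    [{"Milk", "Bread", "Butter"}] * 5
    + [{"Milk", "Bread"}] * 10
    + [{"Eggs"}] * 85
)

body, head = {"Milk", "Bread"}, {"Butter"}
rule_support = support(transactions, body | head)
rule_confidence = confidence(transactions, body, head)
print(f"Support = {rule_support:.1%}, Confidence = {rule_confidence:.1%}")
```

Note that support is measured over all 100 transactions, while confidence is measured only over the 15 transactions that contain the rule body.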

Mining Algorithms

• Analyze a set of data, looking for patterns or trends

• They differ in the strategy and data structure used

• Their efficiency and memory requirements also differ

Apriori Algorithm

• Most classic and well-known algorithm for association rules

• Useful in databases that contain transactions

• Principle: Any subset of a frequent itemset must be frequent

if {A,B} is a frequent itemset, {A} and {B} must also be frequent itemsets

Apriori Algorithm Method

• Count 1-itemsets, then the 2-itemsets, then the 3-itemsets and so on

• When counting k-itemsets, only consider those itemsets where all subsets of length k-1 have been determined to be frequent in the previous step

• Non-frequent itemsets are pruned or discarded
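The level-by-level method can be sketched in Python as follows. This is a minimal, unoptimized illustration rather than a reference implementation: it recounts support by scanning all transactions, and the sample transactions, the 0.6 threshold, and the names apriori and TRANSACTIONS are assumptions made for the example.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {frequent itemset: support}, built one itemset size at a time."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def support(itemset):
        return sum(1 for t in transactions if itemset <= t) / n

    # Level 1: count the 1-itemsets.
    items = {item for t in transactions for item in t}
    current = {frozenset([i]) for i in items
               if support(frozenset([i])) >= min_support}

    frequent = {}
    k = 1
    while current:
        for itemset in current:
            frequent[itemset] = support(itemset)
        k += 1
        # Join: combine frequent (k-1)-itemsets into candidate k-itemsets.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {c for c in candidates
                      if all(frozenset(s) in current
                             for s in combinations(c, k - 1))}
        # Count: keep only candidates that meet the support threshold.
        current = {c for c in candidates if support(c) >= min_support}
    return frequent


# Hypothetical sample data: five transactions over items A-D.
TRANSACTIONS = [
    {"A", "B", "C"},
    {"A", "B"},
    {"A", "C"},
    {"B", "C"},
    {"A", "B", "C", "D"},
]

frequent = apriori(TRANSACTIONS, 0.6)
for itemset, s in sorted(frequent.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), s)
```

On this sample, D is discarded at the first level, so no candidate containing D is ever counted; {A, B, C} survives the prune step but falls below the threshold when counted.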

Apriori Algorithm Diagram

Figure: itemset lattice over items A-E, with pruned supersets marked

null
A B C D E
AB AC AD AE BC BD BE CD CE DE
ABC ABD ABE ACD ACE ADE BCD BCE BDE CDE
ABCD ABCE ABDE ACDE BCDE
ABCDE
