expert systems, 4 th nf association rules cs 157b prof. sin-min lee
TRANSCRIPT
Expert Systems, 4th NFAssociation Rules
CS 157BProf. Sin-Min Lee
COMPONENTS OF AN EXPERT SYSTEM
KNOWLEDGE BASE
KNOWLEDGE REPRESENTATION
. PREDICATE CALCULUS
. LISTS
. FRAMES
. SCRIPTS
. SEMANTIC NETWORKS
. PRODUCTION RULES
Knowledgebase
DynamicData base
Inference engine
Expert system architecture
People and ComputersWHAT COMPUTERS CAN DO BETTER THAN PEOPLE
. Numerical Computation
. Information storage
. Repetitive Operations
. Computers are “Just Machines”
The major application areas of AI
Artificial Intelligence
-GENERAL PROBLEM SOLVING
-EXPERT SYSTEMS
-NATURAL LANGUAGE PROCESSING
-COMPUTER VISION
-ROBOTICS
-COMPUTER AIDED INSTRUCTION
-AUTOMATIC PROGRAMMING
- PLANNING AND DECISION SUPPORT
Heuristics
- Subconscious Heuristic
- Conscious Heuristic
COMPONENTS OF AN EXPERT SYSTEM KNOWLEDGE BASE - PRODUCTION RULES
ADVANTAGES OF USING A PRODUCTION SYSTEM
TO REPRESENT KNOWLEDGE
. Explanation
. Modification
. Understanding
COMPONENTS OF AN EXPERT SYSTEMINFERENCE ENGINE
BLIND SEARCH TECHNIQUES
. BREADTH FIRST OR DEPTH FIRST SEARCH
. FORWARD OR BACKWARD CHAINING
COMPONENTS OF AN EXPERT SYSTEMINFERENCE ENGINE
SEARCH TECHNIQUES. IN PRACTICE A COMBINATION OF THE TWO CHAINNING TECHNIQUES ARE USED . BIDIRECTIONAL SEARCH
. PRUNING THE SEARCH TREE . USE HEURISTIC SEARCH TECHNIQUES
COMPONENTS OF AN EXPERT SYSTEMINFERENCE ENGINE
HEURISTIC SEARCH TECHNIQUES
LIMIT THE SEARCH PROCESS IN AN EFFORT TO REACH THE SOLUTION
FASTER
COMPONENTS OF AN EXPERT SYSTEMINFERENCE ENGINE
HEURISTIC SEARCH TECHNIQUES
. BACKTRACKING
. MINIMAX
. STATIC EVALUATION
An Expert system is a program which has a wide base of knowledge in a restricted domain, and uses complex inferential reasoning to perform tasks which a human expert could do.
Wehbank 1983
SOLUTION
FACTS
INTERPRETER
KNOWLEDGE BASE
DATA
BASIC ARCHITECTURE OF EXPERT SYSTEM
PHYSIOLOGY OF EXPERT SYSTEM
Two methods for triggering ( I.e putting into operation ) rules
. Forward Chaining ( fact-directed reasoning )A process of examining the left part of each rule in turn and applying the rule whenever the conditions for this past are found to hold, the process ends when it ceases to give any new fact.
. Backward Chaining ( goal-directed reasoning )The goal to be attained is given and the right parts of the rule are examined to find which of these include this goal, this sets up new goals which are subgoals for the original goal, and so on, until a known fact is established.
FEATURES OF AN EXPERT SYSTEM
THE PROGRAM SHOULD BE
. DEVELOPED TO MEET A SPECIFIC NEED
. EASY TO USE
. EDUCATIONAL, WHEN APPROPRIATE
. ABLE TO EXPLAIN ITS ADVICE
. ABLE TO RESPOND TO SIMPLE QUESTIONS
. ABLE TO LEARN NEW KNOWLEDGE
KNOWLEDGE IN THE PROGRAM SHOULD BE EASY MODIFIED
. CORRECT ERROR
. ADD NEW INFORMATION
EXAMPLE
R5 IF Z AND L THEN SR1 IF A AND N THEN ER3 IF D OR M THEN ZR2 IF A THEN MR4 IF Q AND (NOT W) AND (NOT Z) THEN NR6 IF L AND M THEN ER7 IF B AND C THEN Q
KNOWN FACTS (FACTS BASE) - (A,L)GOAL TO BE ESTABLISHED = E
FORWARD CHAININGRegard the process as a sequence of iterations through the rules
First iteration R2 and R6 are triggered ( A, L, M)Second iteration R3 is triggered ( A, L, M, E, Z)Third iteration R5 is triggered ( A, L, M, E, Z, S)
The goal E is reached at this stage; as it happens, not further iteration is possible and the process stops here.
BACKWARD CHAINING
B
C
QR7
NOT W
NOT Z
NR4
ATRUE
R1
R6
LTRUE
M
E
GOAL
ATRUE
Knowledge Representation
1- Semantics Networks2- Object- Attribute - Value triplets3- Rules4- Frames
Object - Attribute - ValueTriplets
-- Static knowledge vs. Instances-- Objects are ordered and related-- Handles uncertainty with a certainty factorO - A - V
Frames-- a description of an object that contains slot for all the information associated with the object--slots may contain default values, pointers of other frames, set of rules, or procedures by which values may be obtained.Figure 4.12 Frame for Wilson’s coat
COAT
Slots: Entries:OwnerConditionCondition of cuffsCondition of elbow
Number of armsFabricsPockets?
Size
Style
WilsonRumpledWorm, shinyWorm, shiny
Default: 2Default: woolDefault: yes
If needed, find owner’s height and weight, and compare to Table X.
If needed, find out collar; pockets; pockets and length; the look in table Y
Inference EngineFunctions:
1- Inference:Examines existing facts and rules, and add new facts when possible.
2- Control:Decides the order in which inference are made.
Limitations-- confined to well defined problem, unable to reason over a field of expertise.
-- cannot reason from axioms or general theory.
-- do not learn, limited to use specific facts and heuristics that were “taught” by a human expert.
-- lack common sense, cannot reason by analog
-- performance deteriorates rapidly when problems extend beyond the narrow task they were designed to perform.
ControlDepth-First v.s Breadth-First Search:
-- In a depth-first search, the inference engine takes every opportunity to produce a subgoal, searching for detail first in a depth-first manner.
-- A breadth-first search sweeps across all premises in a rule before digging of greater detail.
Advantages
-- do not display biased judgment-- do not jump to conclusion and then seek to maintain those conclusions in the face if disconfirming evidence.
-- do not have “bad day”
-- always attend to details, always systematically consider all of the possible alternatives
-- equipped with thousands of heuristic rules, able to perform their specialized task better than a human expert.
Multivalued Dependencies
• There are database schemas in BCNF that do not seem to be sufficiently normalized
• Consider a database classes(course, teacher, book)
such that (c,t,b) classes means that t is qualified to teach c, and b is a required textbook for c
• The database is supposed to list for each course the set of teachers any one of which can be the course’s instructor, and the set of books, all of which are required for the course (no matter who teaches it).
• There are no non-trivial functional dependencies and therefore the relation is in BCNF
• Insertion anomalies – i.e., if Sara is a new teacher that can teach database, two tuples need to be inserted
(database, Sara, DB Concepts)(database, Sara, Ullman)
course teacher book
databasedatabasedatabasedatabasedatabasedatabaseoperating systemsoperating systemsoperating systemsoperating systems
AviAviHankHankSudarshanSudarshanAviAvi Jim Jim
DB ConceptsUllmanDB ConceptsUllmanDB ConceptsUllmanOS ConceptsShawOS ConceptsShaw
classes
Multivalued Dependencies
• Therefore, it is better to decompose classes into:
course teacher
databasedatabasedatabaseoperating systemsoperating systems
AviHankSudarshanAvi Jim
teaches
course book
databasedatabaseoperating systemsoperating systems
DB ConceptsUllmanOS ConceptsShaw
text
We shall see that these two relations are in Fourth Normal Form (4NF)
Multivalued Dependencies
What are Association Rules?- Techniques used to detect
associations or relationships between elements in large data sets.
- Show value conditions that occur frequently in a data set.
- Association Rules are basically if-then rules supported by data
- Application of this is called Market Basket Analysis
What are Association Rules?
• Finding frequent patterns, correlations, or associations among a set of items or objects in transaction databases, relational databases, or other information repositories
Applications of Association Rules• Market Basket Analysis• Classification• Clustering• Cross-marketing• Loss-leader Analysis
Characteristics of Association Rules
• Consists of a set of items, the rule body, leading to another item, the rule head
X Y• Association Rules relate the rule
body X to the rule head Y
Itemsets
• itemset = a set of itemsX = {apple, orange} is an itemset
• k-itemset = a set of k itemsX = {apple, orange, banana} = 3-itemset
Rules
• Let I = {i1, i2, …, im} be a set of items
• Let transaction t be a set of items where
t I• Let T be the Transaction Database
or set of transactions where T = {t1, t2, …, tn}.
Rules
• Transaction t contains itemset X which belongs to I, a set of items
• Formal association rule is defined as:
X Y, where X, Y I, and X Y =
Support
• Support is the percentage of transactions that support an association rule.
• It is the percentage of transactions that contain product X and product Y
Support = Probability(X Y).
Confidence• Confidence is the strength or reliability of
an association rule• It is the probability that if there is X in a
transaction, then Y will also be present.
Confidence = Support (X,Y) / Support (X) or
Confidence = Probability (Y | X)
Thresholds
• Minimum Support Threshold and Minimum Confidence Threshold
• The association rule is more valuable if they satisfy these minimum values
• The higher the percentage, the more useful the data is
Uses of Association Rules?
• Retail Shopping• Credit Card transactions• Online purchases• Medical patient histories• Banking services• Insurance claims
Market Basket Analysis• Modeling technique based on theory
that if you buy one group of items, you are also likely to buy another group of items
• In consumer behavior, most purchases are bought on impulse
• Market Basket Analysis seeks to find a relationship between purchases
Uses of Market Basket Analysis• Identify unexpected shopping
patterns• Targeted marketing towards specific
types of people• Predicting customer response rates to
marketing campaigns• Distinguish between profitable and
unprofitable customers
Maximizing Profitability- Arrangement of items
in a store- Planning of specific
sales during times of the year
- Pricing Policy of certain goods
Example-A market has 100 transactions-15 transactions contain product X-Of the 15 transactions, 5
transactions contain product Y
Support = 5/100 or 5%Confidence = 5/15 or 33.3%
Example•Milk + Bread => Butter
X = Milk + Bread
Y = Butter
“Milk + Bread” is the Rule Body and
“Butter” is the Rule Head
Example
- 5% of all transactions will contain combination of Milk, Bread, and Butter.
-If customers buy Milk and Bread, there is a 33.3% possibility that they will buy Butter
Support = 5%Confidence = 33.3%
Mining Algorithms
• Analyze a set of data, looking for patterns or trends
• They differ in the strategy and data structure used
• Their efficiency and memory requirements also differ
Apriori Algorithm
• Most classic and well-known algorithm for association rules
• Useful in databases that contain transactions
• Principle: Any subset of a frequent itemset must be frequent
if {A,B} is a frequent itemset, {A} and {B} must also be frequent itemsets
Apriori Algorithm Method
• Count 1-itemsets, then the 2-itemsets, then the 3-itemsets and so on
• When counting k-itemsets, only consider those itemsets where all subsets of length k have been determined to be frequent in the previous step
• Non frequent itemsets are pruned or discarded
Apriori Algorithm Diagramnull
AB AC AD AE BC BD BE CD CE DE
A B C D E
ABC ABD ABE ACD ACE ADE BCD BCE BDE CDE
ABCD ABCE ABDE ACDE BCDE
ABCDEPruned supersets