Decision Tree Learning (WordPress.com, 2016/10/03)
TRANSCRIPT
Outline
◘ What is Decision Tree Learning?
◘ What is a Decision Tree?
◘ Decision Tree Examples
◘ Decision Trees to Rules
◘ Decision Tree Construction
◘ Decision Tree Algorithms
◘ Decision Tree Overfitting
Paradigms of Machine Learning
◘ Machine Learning paradigms include:
– Neural Networks
– Genetic Algorithms
– Decision Trees
– Bayesian Learning
◘ The decision tree technique is one of these machine learning techniques.
Learning Types
◘ Supervised Learning: Classification, Regression
◘ Unsupervised Learning: Clustering, Association Analysis
◘ Classification methods: Decision Tree Learning, Bayesian Learning, Nearest Neighbour, Neural Networks, Support Vector Machines
◘ Other analysis tasks: Sequence Analysis, Summarization, Descriptive Statistics, Outlier Analysis, Scoring
◘ Decision Tree Learning is a supervised learning method.
Decision Tree Learning
◘ Decision Tree Learning is a method for approximating discrete-
valued target functions, in which the learned function is represented
by a decision tree.
◘ Decision Tree Learning is robust to noisy data and capable of
learning disjunctive expressions.
◘ One of the most widely used methods for inductive inference.
[Example decision trees (House, Hiring): internal nodes test Salary < 1 M, Job = teacher, and Age < 30; leaves are labelled Good or Bad.]
Decision Tree Representation
◘ Decision Trees classify instances by sorting them down the tree from
the root to some leaf node, which provides the classification of the
instance.
◘ Each node in the tree specifies a test of some attribute of the instance
◘ Each branch descending from that node corresponds to one of the possible values for this attribute
Decision Trees
◘ A decision tree is a tree where
– internal nodes are simple decision rules on one or more attributes
– each branch corresponds to an attribute value
– leaf nodes are predicted class labels
◘ Decision trees are used for deciding between several courses of action
age     income  student  credit_rating  buys_computer
<=30    high    no       fair           no
<=30    high    no       excellent      no
31..40  high    no       fair           yes
>40     medium  no       fair           yes
>40     low     yes      fair           yes
>40     low     yes      excellent      no
31..40  low     yes      excellent      yes
<=30    medium  no       fair           no
<=30    low     yes      fair           yes
>40     medium  yes      fair           yes
<=30    medium  yes      excellent      yes
31..40  medium  no       excellent      yes
31..40  high    yes      fair           yes
>40     medium  no       excellent      no
[Resulting decision tree: the root tests age. age <= 30 → test student (no → No, yes → Yes); age 31..40 → Yes; age > 40 → test credit rating (fair → Yes, excellent → No). Each internal node tests an attribute, each branch carries an attribute value, and each leaf gives the classification.]
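In code, classifying an instance is just a walk down this tree. A minimal Python sketch (the nested-dict encoding and the attribute keys are illustrative assumptions, not from the slides):

```python
# Hypothetical encoding of the buys_computer tree: an internal node maps
# the tested attribute to {branch value: subtree}; a leaf is a class label.
tree = {
    "age": {
        "<=30": {"student": {"no": "no", "yes": "yes"}},
        "31..40": "yes",
        ">40": {"credit_rating": {"fair": "yes", "excellent": "no"}},
    }
}

def classify(tree, instance):
    """Start at the root and follow the branch matching the instance's
    value for each tested attribute, until a leaf label is reached."""
    while isinstance(tree, dict):
        attribute = next(iter(tree))  # attribute tested at this node
        tree = tree[attribute][instance[attribute]]
    return tree

print(classify(tree, {"age": "<=30", "student": "yes"}))  # -> yes
```

A young student is routed left and classified yes, matching the table above.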
Decision Tree Applications
◘ Decision trees have been used for:
1. Classification
2. Data Reduction
◘ Initial attribute set: {A1, A2, A3, A4, A5, A6}
◘ Reduced attribute set: {A1, A4, A6}
[Reduced tree: the root tests A4; its subtrees test A1 and A6, and the leaves predict Class 1 or Class 2.]
Decision Tree Example
◘ A credit card company receives thousands of applications for new cards. Each application contains information about an applicant,
– age
– marital status
– annual salary
– outstanding debts
– credit rating
– etc.
◘ Problem: to decide whether an application should be approved, i.e., to classify applications into two categories: approved and not approved.
Decision Tree Example (Cont)
◘ Construct a classification model from the data
◘ Use the model to classify future loan applications into
– Yes (approved) and
– No (not approved)
◘ What is the class for the following case/instance?
Use the Decision Tree (Cont)
[For the instance shown on the slide, the classification is No.]
Once the tree is trained, a new instance is classified by starting at the root and following the path dictated by the test results for this instance.
Decision Tree Example
◘ Problem: decide whether to wait for a table at a restaurant
◘ Attributes:
1. Alternate: is there an alternative restaurant nearby?
2. Bar: is there a comfortable bar area to wait in?
3. Fri/Sat: is today Friday or Saturday?
4. Hungry: are we hungry?
5. Patrons: number of people in the restaurant (None, Some, Full)
6. Price: price range ($, $$, $$$)
7. Raining: is it raining outside?
8. Reservation: have we made a reservation?
9. Type: kind of restaurant (French, Italian, Thai, Burger)
10. WaitEstimate: estimated waiting time (0-10, 10-30, 30-60, >60)
Decision Trees to Rules
◘ It is easy to derive a rule set from a decision tree
◘ Write a rule for each path in the decision tree from the root to a leaf.
◘ Can be represented as if-then rules
Example:
IF (Outlook = Sunny) AND (Humidity = High)
THEN PlayTennis = No
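Writing one rule per root-to-leaf path is mechanical, so it can be automated. A Python sketch (the nested-dict tree encoding and the PlayTennis fragment below are illustrative assumptions):

```python
def tree_to_rules(tree, conditions=()):
    """Emit one IF-THEN rule for every root-to-leaf path."""
    if not isinstance(tree, dict):  # leaf: accumulated conditions -> label
        lhs = " AND ".join(f"({a} = {v})" for a, v in conditions) or "(true)"
        return [f"IF {lhs} THEN PlayTennis = {tree}"]
    attribute = next(iter(tree))  # attribute tested at this node
    rules = []
    for value, subtree in tree[attribute].items():
        rules += tree_to_rules(subtree, conditions + ((attribute, value),))
    return rules

# Assumed fragment of the PlayTennis tree, for illustration:
tree = {"Outlook": {"Sunny": {"Humidity": {"High": "No", "Normal": "Yes"}},
                    "Overcast": "Yes"}}
for rule in tree_to_rules(tree):
    print(rule)
# first rule: IF (Outlook = Sunny) AND (Humidity = High) THEN PlayTennis = No
```

Each recursive call extends the condition list with one attribute-value test, so the emitted rules correspond exactly to the tree's paths.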
Decision Tree
◘ Each node tests some attribute of the instance
◘ Instances are represented by attribute-value pairs
◘ High information gain attributes close to the root
◘ Root: best attribute for classification
Which attribute is the best classifier?
The answer is based on information gain.
Entropy
◘ Entropy specifies the minimum number of bits of information
needed to encode the classification of an arbitrary member of S
◘ In general, for m class labels:
  Entropy(S) = - Σ (i = 1..m) p_i log2(p_i)
◘ Example for two class labels:
  Entropy(S) = - p1 log2(p1) - p2 log2(p2)
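The formula translates directly to code. A minimal Python sketch (the function name and counts-based interface are assumptions for illustration):

```python
from math import log2

def entropy(counts):
    """Entropy(S) = -sum_i p_i * log2(p_i), where p_i is the proportion
    of class i in S, given here as a list of per-class counts."""
    total = sum(counts)
    return -sum((c / total) * log2(c / total) for c in counts if c > 0)

print(round(entropy([9, 5]), 3))  # -> 0.94 (the 9+/5- collection used below)
```

Pure collections (e.g. [4, 0]) have entropy 0, and an even two-class split ([7, 7]) has the maximum entropy of 1 bit.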
Information Gain
◘ Measures the expected reduction in entropy given the value of some
attribute A
Values(A): the set of all possible values for attribute A
S_v: the subset of S for which attribute A has value v

  Gain(S, A) = Entropy(S) - Σ (v ∈ Values(A)) (|S_v| / |S|) · Entropy(S_v)
Decision Tree Example (Cont.)

Entropy(S) = -(9/14) log2(9/14) - (5/14) log2(5/14) = 0.940

Gain(S, Wind) = Entropy(S) - (|S_Weak|/|S|) Entropy(S_Weak) - (|S_Strong|/|S|) Entropy(S_Strong)
             = 0.940 - (8/14)·0.811 - (6/14)·1.0 = 0.048

Gain(S, Humidity) = Entropy(S) - (|S_High|/|S|) Entropy(S_High) - (|S_Normal|/|S|) Entropy(S_Normal)
                  = 0.940 - (7/14)·0.985 - (7/14)·0.592 = 0.151

Gain(S, Outlook) = 0.246
Gain(S, Temperature) = 0.029
Gain(S, Humidity) = 0.151
Gain(S, Wind) = 0.048

Outlook has the highest information gain, so it becomes the root.
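These worked values can be checked in code. A short sketch, assuming the class splits implied by the entropy values above (Weak: 6 positive / 2 negative examples, giving entropy 0.811; Strong: 3/3, giving entropy 1.0):

```python
from math import log2

def entropy(counts):
    total = sum(counts)
    return -sum((c / total) * log2(c / total) for c in counts if c > 0)

def gain(parent_counts, partition):
    """Gain(S, A) = Entropy(S) - sum over values v of |S_v|/|S| * Entropy(S_v);
    partition is the list of per-class count lists, one per attribute value."""
    n = sum(parent_counts)
    return entropy(parent_counts) - sum(
        sum(counts) / n * entropy(counts) for counts in partition
    )

# Wind: Weak = [6+, 2-], Strong = [3+, 3-]  (assumed splits, see lead-in)
print(round(gain([9, 5], [[6, 2], [3, 3]]), 3))  # -> 0.048
```

The result matches Gain(S, Wind) = 0.048 from the slide.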
Decision Tree Construction
◘ Which attribute is next?
Outlook
Sunny → ?    Overcast → Yes    Rain → ?

Gain(S_Sunny, Humidity) = 0.970 - (3/5)·0.0 - (2/5)·0.0 = 0.970
Gain(S_Sunny, Temperature) = 0.970 - (2/5)·0.0 - (2/5)·1.0 - (1/5)·0.0 = 0.570
Gain(S_Sunny, Wind) = 0.970 - (2/5)·1.0 - (3/5)·0.918 = 0.019

Humidity has the highest gain, so it is tested next on the Sunny branch.
Another Example
At the weekend:
- go shopping,
- watch a movie,
- play tennis or
- just stay in.
What you do depends on three things:
- the weather (windy, rainy or sunny);
- how much money you have (rich or poor);
- whether your parents are visiting.
height hair eyes class
short blond blue +
tall blond brown -
tall red blue +
short dark blue -
tall dark blue -
tall blond blue +
tall dark brown -
short blond brown -
I(3+, 5-) = -3/8 log2(3/8) - 5/8 log2(5/8) = 0.954434003

Height: short (1+, 2-), tall (2+, 3-)
Gain(height) = 0.954434003 - 3/8·I(1+,2-) - 5/8·I(2+,3-)
             = 0.954434003 - 3/8(-1/3 log2(1/3) - 2/3 log2(2/3)) - 5/8(-2/5 log2(2/5) - 3/5 log2(3/5)) = 0.003228944

Hair: blond (2+, 2-), red (1+, 0-), dark (0+, 3-)
Gain(hair) = 0.954434003 - 4/8(-2/4 log2(2/4) - 2/4 log2(2/4)) - 1/8·0 - 3/8·0
           = 0.954434003 - 0.5 = 0.454434003

Eyes: blue (3+, 2-), brown (0+, 3-)
Gain(eyes) = 0.954434003 - 5/8(-3/5 log2(3/5) - 2/5 log2(2/5)) - 3/8·0
           = 0.954434003 - 0.606844122 = 0.347589881

“Hair” is the best attribute.
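Since the full table appears above, the three gains can be recomputed directly. A Python sketch (the attribute indexing and function names are my own):

```python
from math import log2

# The eight training instances from the table above.
data = [
    ("short", "blond", "blue",  "+"),
    ("tall",  "blond", "brown", "-"),
    ("tall",  "red",   "blue",  "+"),
    ("short", "dark",  "blue",  "-"),
    ("tall",  "dark",  "blue",  "-"),
    ("tall",  "blond", "blue",  "+"),
    ("tall",  "dark",  "brown", "-"),
    ("short", "blond", "brown", "-"),
]
ATTRS = {"height": 0, "hair": 1, "eyes": 2}

def entropy(labels):
    n = len(labels)
    return -sum((labels.count(c) / n) * log2(labels.count(c) / n)
                for c in set(labels))

def gain(data, attr):
    """Information gain of splitting `data` on `attr`."""
    i = ATTRS[attr]
    total = entropy([row[-1] for row in data])
    for value in set(row[i] for row in data):
        subset = [row[-1] for row in data if row[i] == value]
        total -= len(subset) / len(data) * entropy(subset)
    return total

for attr in ATTRS:
    print(attr, round(gain(data, attr), 3))
# hair has the highest gain (~0.454) and becomes the root
```

The printed values reproduce the hand calculations: height ≈ 0.003, hair ≈ 0.454, eyes ≈ 0.348.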
Another Example (Cont.)
Splitting the eight instances on hair:
– hair = dark: (short, dark, blue: -), (tall, dark, blue: -), (tall, dark, brown: -) → all negative
– hair = red: (tall, red, blue: +) → positive
– hair = blond: (short, blond, blue: +), (tall, blond, brown: -), (tall, blond, blue: +), (short, blond, brown: -) → mixed, so this branch must be split further
Decision Tree Algorithms
◘ ID3
– Quinlan (1981)
– Tries to reduce the expected number of comparisons
◘ C4.5
– Quinlan (1993)
– An extension of ID3
– Just starting to be used in data mining applications
– Also used for rule induction
◘ CART
– Breiman, Friedman, Olshen, and Stone (1984)
– Classification and Regression Trees
◘ CHAID
– Kass (1980)
– Oldest decision tree algorithm
– Well established in database marketing industry
◘ QUEST
– Loh and Shih (1997)
Complexity of Tree Induction
◘ Assume
– m attributes
– n training instances
– tree depth O (log n)
◘ Building a tree O (m n log n)
◘ Total cost: O (m n log n)
Decision Tree Advantages and Disadvantages
Positives (+)
+ Reasonable training time
+ Fast application
+ Easy to interpret
+ Rule extraction from trees
(can be re-represented as if-then-else
rules)
+ Easy to implement
+ Can handle large number of features
+ Does not require any prior knowledge
of data distribution
Negatives (-)
- Cannot handle complicated relationships between features
- Problems with lots of missing data
- Output attribute must be categorical
- Limited to one output attribute
- Difficult to design an optimal decision tree
- Classes may overlap, especially when the number of classes is large