Decision Tree & Techniques


DESCRIPTION

How to make a decision tree in data mining, and the various algorithms used.

TRANSCRIPT

Page 1: Decision Tree & Techniques

1

Decision Tree

Nitin Jain 5VZ07SSZ04

IV Sem Software EngineeringSJCE Mysore

Under the Guidance of

Dr. C.N. Ravi KumarHOD, Dept. of Computer Science

SJCE, Mysore

Page 2: Decision Tree & Techniques

2

Content

• Decision Tree
• The Construction Principle
• The Best Split
• Splitting Indices
• Splitting Criteria
• Decision Tree Construction Algorithms
• CART
• ID3
• C4.5

Page 3: Decision Tree & Techniques

3

Content

• CHAID
• Decision Tree Construction with Presorting
• Rain Forest
• Approximate Methods
• CLOUDS
• BOAT
• Pruning Techniques
• Integration of Pruning and Construction
• Conclusion

Page 4: Decision Tree & Techniques

4

Decision Tree

• It is a classification scheme that generates a tree and a set of rules, representing the model of the different classes, from a given data set.

• Training set: used to derive the classifier.

• Test set: used to measure the accuracy of the classifier.

Page 5: Decision Tree & Techniques

5

Example of a Decision Tree

Page 6: Decision Tree & Techniques

6

Rules to classify

Page 7: Decision Tree & Techniques

7

A Decision Tree

Page 8: Decision Tree & Techniques

8

Rule 1 = 50%
Rule 2 = 50%
Rule 3 = 66%

Page 9: Decision Tree & Techniques

9

Example 2

Page 10: Decision Tree & Techniques

10

Advantages of Decision Tree Classification

• Easy to understand.

• Can handle both numerical and categorical attributes.

• Provides a clear indication of which fields are important for prediction and classification.

Page 11: Decision Tree & Techniques

11

Shortcomings of Decision Tree Classification

• Some decision tree algorithms can deal only with binary-valued attributes.

• The process of growing the tree is computationally expensive.

Page 12: Decision Tree & Techniques

12

Principle of Tree Construction

• Construction phase: the initial decision tree is constructed in this phase.

• Pruning phase: in this stage, lower branches are removed to improve performance.

• Processing phase: the pruned tree is processed to further improve performance.

Page 13: Decision Tree & Techniques

13

OVERFIT

• A tree T is said to overfit the training data if there exists some other tree T’ which is a simplification of T, such that T has a smaller error than T’ over the training set but T’ has a smaller error than T over the entire distribution of instances.

Page 14: Decision Tree & Techniques

14

OVERFIT Leads to

• Incorrect models
• Increased space requirements and more computational resources
• Unnecessary collection of features
• Models that are difficult to comprehend

• Overfitting leads to difficulties when there is noise in the training data, or when the number of training examples is too small.

Page 15: Decision Tree & Techniques

15

The Best Split

• Evaluation of the splits for each attribute and selection of the best split; determination of the splitting attribute.

• Determination of the splitting condition on the selected splitting attribute, and

• Partitioning of the data using the best split.

Page 16: Decision Tree & Techniques

16

The Best Split

• The splitting depends on whether the domain of the attribute is numerical or categorical.

• The best split is the one that does the best job of separating the records into groups in which a single class predominates.

• But how do we determine the BEST one?

Page 17: Decision Tree & Techniques

17

Splitting Indices

• ENTROPY

• INFORMATION FOR PARTITION

• GAIN

• GINI INDEX

Page 18: Decision Tree & Techniques

18

ENTROPY

• Entropy gives an idea of how good a split on an attribute is.

• The classes are ‘yes’ or ‘no’ in our example.

• Entropy provides an information-theoretic approach to measure the goodness of a split.

• Entropy(P) = -[p1 log(p1) + p2 log(p2) + … + pn log(pn)]
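As a minimal illustration (a sketch added here, not part of the original slides; the function name and the two-class example are assumptions), the entropy of a class distribution can be computed in C from the class counts:

#include <math.h>
#include <stdio.h>

/* Entropy of a class distribution given the number of records in each class:
   Entropy(P) = -[p1 log2(p1) + ... + pn log2(pn)], ignoring empty classes. */
double entropy(const int counts[], int num_classes)
{
    int total = 0;
    for (int i = 0; i < num_classes; i++)
        total += counts[i];
    if (total == 0)
        return 0.0;

    double h = 0.0;
    for (int i = 0; i < num_classes; i++) {
        if (counts[i] == 0)
            continue;
        double p = (double)counts[i] / total;
        h -= p * log2(p);
    }
    return h;
}

int main(void)
{
    int counts[2] = {9, 5};   /* e.g. 9 'yes' records and 5 'no' records */
    printf("Entropy = %.3f bits\n", entropy(counts, 2));   /* about 0.940 */
    return 0;
}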

Page 19: Decision Tree & Techniques

19

Example Entropy

Page 20: Decision Tree & Techniques

20

INFORMATION FOR PARTITION

• If T is partitioned, based on the value of a non-class attribute X, into sets T1, T2, …, Tn, then the information needed to identify the class of an element of T becomes the weighted average of the information needed to identify the class of an element of Ti, i.e. the weighted average over the partitions.
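In the notation of these slides, this weighted average can be written as (a reconstruction consistent with the Gain formula on the next slide):

Info(X, T) = (|T1|/|T|) * Info(T1) + (|T2|/|T|) * Info(T2) + … + (|Tn|/|T|) * Info(Tn)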

Page 21: Decision Tree & Techniques

21

GAIN

• The information Gain represents the difference between the information needed to identify an element of T and the information needed to identify an element of T after the value of attribute X is obtained.

Gain(X, T) = Info(T) - Info(X, T)
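As an illustration, assuming the classic 14-record play/no-play weather data set used later in these slides: Info(T) ≈ 0.940 bits for 9 ‘play’ and 5 ‘no play’ records, the weighted average after splitting on Outlook is Info(Outlook, T) ≈ 0.694 bits, and therefore Gain(Outlook, T) ≈ 0.246 bits.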

Page 22: Decision Tree & Techniques

22

GAIN RATIO

GINI INDEX
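For reference, the standard definitions of these two indices are:

GainRatio(X, T) = Gain(X, T) / SplitInfo(X, T), where SplitInfo(X, T) = -[(|T1|/|T|) log2(|T1|/|T|) + … + (|Tn|/|T|) log2(|Tn|/|T|)]

Gini(T) = 1 - [p1^2 + p2^2 + … + pn^2], where pj is the relative frequency of class j in T; for a split of T into T1 and T2, GiniSplit(T) = (|T1|/|T|) Gini(T1) + (|T2|/|T|) Gini(T2)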

Page 23: Decision Tree & Techniques

23

CART (Classification And Regression Tree)

• CART (Classification And Regression Tree) is one of the popular methods of building decision trees in the machine learning community.

• CART builds a binary decision tree by splitting the records at each node, according to a function of a single attribute.

• CART uses the gini index for determining the best split.

• The initial split produces two nodes, each of which we now attempt to split in the same manner as the root node.
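A hedged sketch in C (not from the original slides; the function names and the two-class example are illustrative assumptions) of how the gini index of a candidate binary split can be evaluated from the class counts on each side:

#include <stdio.h>

/* Gini index of a set from its class counts: 1 - sum(p_j^2). */
double gini(const int counts[], int num_classes)
{
    int total = 0;
    for (int i = 0; i < num_classes; i++)
        total += counts[i];
    if (total == 0)
        return 0.0;

    double sum_sq = 0.0;
    for (int i = 0; i < num_classes; i++) {
        double p = (double)counts[i] / total;
        sum_sq += p * p;
    }
    return 1.0 - sum_sq;
}

/* Gini index of a binary split: the gini of each side weighted by its size. */
double gini_split(const int left[], const int right[], int num_classes)
{
    int n_left = 0, n_right = 0;
    for (int i = 0; i < num_classes; i++) {
        n_left  += left[i];
        n_right += right[i];
    }
    int n = n_left + n_right;
    return ((double)n_left / n) * gini(left, num_classes)
         + ((double)n_right / n) * gini(right, num_classes);
}

int main(void)
{
    /* Example: left partition has 4 'play' / 0 'no play' records,
       right partition has 5 'play' / 5 'no play' records. */
    int left[2]  = {4, 0};
    int right[2] = {5, 5};
    printf("gini_split = %.3f\n", gini_split(left, right, 2));  /* about 0.357 */
    return 0;
}

The split with the lowest gini_split value is chosen as the best split at the node.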

Page 24: Decision Tree & Techniques

24

CART

• At the end of the tree-growing process, every record of the training set has been assigned to some leaf of the full decision tree. Each leaf can now be assigned a class and an error rate. The error rate of a leaf node is the percentage of incorrect classifications at that node. The error rate of an entire decision tree is a weighted sum of the error rates of all the leaves. Each leaf’s contribution to the total is the error rate at that leaf multiplied by the probability that a record will end up there.
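In formula form (a reconstruction of the sentence above, not taken verbatim from the slides): E(tree) = sum over all leaves l of e(l) × P(l), where e(l) is the error rate at leaf l and P(l) is the probability that a record ends up at l.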

Page 25: Decision Tree & Techniques

25

ID3

• Quinlan introduced the ID3, Iterative Dichotomizer 3, for constructing the decision trees from data.

• In ID3, each node corresponds to a splitting attribute and each arc is a possible value of that attribute. At each node the splitting attribute is selected to be the most informative among the attributes not yet considered in the path from the root.

• Entropy is used to measure how informative a node is. This algorithm uses the criterion of information gain to determine the goodness of a split. The attribute with the greatest information gain is taken as the splitting attribute, and the data set is split for all distinct values of the attribute.
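A self-contained C sketch of the ID3 splitting criterion (added for illustration; the count-matrix layout, function names and the two-class play/no-play example are assumptions, not part of the original slides):

#include <math.h>
#include <stdio.h>

#define NUM_CLASSES 2   /* assumed: e.g. 'play' and 'no play' */

/* Entropy of a class distribution given class counts. */
static double entropy(const int counts[])
{
    int total = 0;
    for (int c = 0; c < NUM_CLASSES; c++)
        total += counts[c];
    if (total == 0)
        return 0.0;
    double h = 0.0;
    for (int c = 0; c < NUM_CLASSES; c++) {
        if (counts[c] == 0)
            continue;
        double p = (double)counts[c] / total;
        h -= p * log2(p);
    }
    return h;
}

/* Information gain of splitting T on an attribute, given a count matrix:
   counts[v][c] = number of records with attribute value v and class c.
   Gain = Info(T) - weighted average of Info(Tv) over the values v. */
double information_gain(int counts[][NUM_CLASSES], int num_values)
{
    int total_counts[NUM_CLASSES] = {0};
    int total = 0;
    for (int v = 0; v < num_values; v++)
        for (int c = 0; c < NUM_CLASSES; c++) {
            total_counts[c] += counts[v][c];
            total += counts[v][c];
        }

    double weighted = 0.0;
    for (int v = 0; v < num_values; v++) {
        int n_v = 0;
        for (int c = 0; c < NUM_CLASSES; c++)
            n_v += counts[v][c];
        weighted += ((double)n_v / total) * entropy(counts[v]);
    }
    return entropy(total_counts) - weighted;
}

int main(void)
{
    /* Example count matrix for Outlook (sunny, overcast, rain) vs play/no play. */
    int outlook[3][NUM_CLASSES] = { {2, 3}, {4, 0}, {3, 2} };
    printf("Gain(Outlook) = %.3f bits\n", information_gain(outlook, 3));  /* ~0.246 */
    return 0;
}

At each node, ID3 would evaluate this gain for every attribute not yet used on the path from the root and split on the attribute with the largest value.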

Page 26: Decision Tree & Techniques

26

C4.5

• C4.5 is an extension of ID3 that accounts for unavailable values, continuous attribute value ranges, pruning of decision trees and rule derivation.

• In building a decision tree, we can deal with training sets that have records with unknown attribute values by evaluating the gain, or the gain ratio, for an attribute by considering only those records where those attribute values are available.
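Concretely, C4.5 computes the gain on the subset T_known of records whose value of X is known and then scales it by the fraction F = |T_known| / |T| of such records (this formula is a summary of Quinlan's treatment, not reproduced from the slides):

Gain(X, T) = F × [Info(T_known) - Info(X, T_known)]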

Page 27: Decision Tree & Techniques

27

CHAID

• CHAID, proposed by Kass in 1980, is a derivative of AID (Automatic Interaction Detection), proposed by Hartigan in 1975. CHAID attempts to stop growing the tree before overfitting occurs, whereas the algorithms above generate a fully grown tree and then carry out pruning as a post-processing step. In that sense, CHAID avoids the pruning phase.

Page 28: Decision Tree & Techniques

28

Rain Forest

• Rainforest, like the other algorithms, is a top-down decision tree induction scheme. It is similar to SPRINT, but instead of attribute lists it uses a different structure called the AVC-set. The attribute list has one tuple corresponding to each record of the training data set, whereas the AVC-set has one entry for each distinct value of an attribute. Thus, the size of the AVC-set for any attribute is much smaller.
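An illustrative C sketch of an AVC-set (type and field names are assumptions added here; only the idea of one entry per distinct attribute value comes from the text above):

#include <stdio.h>

#define NUM_CLASSES 2   /* assumed: 'play' and 'no play' */

/* One entry of an AVC-set (Attribute-Value, Class label): for a single
   attribute, the class counts aggregated over all records sharing this value. */
struct avc_entry {
    int attribute_value;           /* a distinct value of the attribute */
    int class_count[NUM_CLASSES];  /* number of records in each class   */
};

int main(void)
{
    /* AVC-set for an 'outlook' attribute with 3 distinct values:
       one entry per value, regardless of how many training records exist. */
    struct avc_entry outlook_avc[3] = {
        {1, {2, 3}},   /* sunny:    2 play, 3 no play */
        {2, {4, 0}},   /* overcast: 4 play, 0 no play */
        {3, {3, 2}},   /* rain:     3 play, 2 no play */
    };

    for (int i = 0; i < 3; i++)
        printf("value %d: play=%d, no play=%d\n",
               outlook_avc[i].attribute_value,
               outlook_avc[i].class_count[0],
               outlook_avc[i].class_count[1]);
    return 0;
}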

Page 29: Decision Tree & Techniques

29

CLOUDS (Classification of Large or Out-of-core Data Sets)

• CLOUDS is a kind of approximate version of the SPRINT method. It also uses a breadth-first strategy to build the decision tree. CLOUDS uses the gini index for evaluating the split index of the attributes. There are different criteria for splitting the categorical and numerical attributes. The method of finding the gini index for the categorical attributes is the same as that used in the SPRINT algorithm: categorical attributes do not require any sorting, since the method refers to the count matrix. We discuss the steps in which CLOUDS differs from the SPRINT method.

Page 30: Decision Tree & Techniques

30

BOAT (Bootstrap Optimistic Algorithm for Tree Construction)

• BOAT (Bootstrap Optimistic Algorithm for Tree Construction) is another approximate algorithm based on sampling.

• As the name suggests, it is based on a well-known statistical principle called bootstrapping.

• We take a very large sample D from the training data set T, so that D fits into the main memory.

• Now, using the bootstrapping technique, we take many small samples with replacement from D as D1, D2, …, Dk and construct, by any of the known methods, decision trees DT1, DT2, …, DTk, respectively, for each of these samples.
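A minimal sketch of the sampling-with-replacement step in C (added for illustration; this shows only the bootstrapping idea, not the BOAT algorithm itself, and records are represented simply as indices):

#include <stdio.h>
#include <stdlib.h>
#include <time.h>

/* Draw one bootstrap sample of size m from a data set D of size n:
   each element is chosen uniformly at random, with replacement, so the
   same record may appear more than once. */
void bootstrap_sample(const int D[], int n, int sample[], int m)
{
    for (int i = 0; i < m; i++)
        sample[i] = D[rand() % n];
}

int main(void)
{
    int D[10] = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9};   /* record indices of the sample D */
    int Di[10];

    srand((unsigned)time(NULL));
    /* BOAT would repeat this k times to obtain D1, ..., Dk and build one
       bootstrap tree per sample. */
    bootstrap_sample(D, 10, Di, 10);

    for (int i = 0; i < 10; i++)
        printf("%d ", Di[i]);
    printf("\n");
    return 0;
}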

Page 31: Decision Tree & Techniques

31

BOAT

• From these trees, called bootstrap trees, we construct a single decision tree for the sample data set D. We call this tree the sample tree.

Page 32: Decision Tree & Techniques

32

Pruning Technique

• The decision tree built using the training set deals correctly with most of the records in the training set. This is inherent in the way it was built.

• Overfitting is one of the unavoidable situations that may arise from such construction. Moreover, if the tree becomes very deep, lopsided or bushy, the rules yielded by the tree become unmanageable and difficult to comprehend. Therefore, a pruning phase is invoked after the construction to arrive at a (sort of) optimal decision tree.

Page 33: Decision Tree & Techniques

33

COST OF ENCODING TREE

• The structure of the tree can be encoded by using a single bit per node, in order to specify whether a node of the tree is an internal node or a leaf node. A non-leaf node can be represented by 1 and a leaf node by 0.
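For example (an illustration assuming a preorder traversal of the nodes, which the transcript does not spell out): a tree whose root has a leaf as its left child and an internal node as its right child, the latter having two leaf children, is encoded as 1 0 1 0 0, i.e. five bits for five nodes.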

Page 34: Decision Tree & Techniques

34

Now that we have formulated the cost of a tree, we next turn our attention to computing the minimum-cost subtree of the tree constructed in the building phase. The main idea is that if the minimum cost at a node is the cost of the node itself, then splitting the node would result in a higher cost, and hence the subtree at this node is removed.
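Written as a recurrence (a reconstruction consistent with the paragraph above):

mincost(leaf) = cost of encoding the records at the leaf
mincost(internal node N) = min( cost of N treated as a leaf, cost of encoding the split at N + mincost(left child of N) + mincost(right child of N) )

If the first term of the minimum is the smaller one, the subtree rooted at N is pruned.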

Page 35: Decision Tree & Techniques

35

Integration of Pruning and Construction

• A natural question that arises is whether we can incorporate the pruning step within the construction process of the decision tree. In other words, instead of pruning being a post-processing step after the tree is fully built, is it possible to identify, right at the time of construction, a node that need not be expanded because it is likely to be pruned eventually?

Page 36: Decision Tree & Techniques

36

SUMMARY: AN IDEAL ALGORITHM

• After studying different strategies for building a decision tree for a large database, let us try to build an ideal decision tree algorithm based on these principles. We are given a training data set.

• First of all, we must discover the concept hierarchy of each of the attributes. This technique is essentially similar to dimension modeling.

• This study helps in removing certain attributes from the database.

Page 37: Decision Tree & Techniques

37

AN IDEAL ALGORITHM

• As we move up this hierarchy, some of the numerical attributes may turn into categorical ones.

• Then we identify the numerical and categorical attributes. Just having a numerical data type does not necessarily imply that the attribute is numerical (for example, pincode).

• We take a large sample of the training set so that the sample fits into memory. Sampling involves select operations (tuple sampling) and project operations (attribute removal). We then construct the attribute list for each attribute, and a generalized attribute list for each attribute based on the dimension hierarchy.

Page 38: Decision Tree & Techniques

38

#include <stdio.h>

/* Classify a day as PLAY / NO PLAY with the decision tree:
   outlook -> (sunny: test humidity, overcast: play, rain: test wind). */
int main(void)
{
    int h, t, windy, out;

    printf("1 for Sunny\n2 for Overcast\n3 for Rain\n");
    scanf("%d", &out);
    printf("ENTER THE TEMP, HUMIDITY: ");
    scanf("%d%d", &t, &h);          /* temperature is read but not used by this tree */
    printf("ENTER 1 for WINDY, 0 for not windy: ");
    scanf("%d", &windy);

    if (out == 1 && h <= 75)
        printf("PLAY\n");
    else if (out == 1 && h > 75)
        printf("NO PLAY\n");
    else if (out == 2)
        printf("PLAY\n");
    else if (out == 3 && windy == 0)
        printf("PLAY\n");
    else if (out == 3 && windy == 1)
        printf("NO PLAY\n");

    return 0;
}

Page 39: Decision Tree & Techniques

39

#include <stdio.h>

/* The same PLAY / NO PLAY decision tree, written with nested branches. */
int main(void)
{
    int humi, windy;
    int out;

    printf("Enter 1 for Sunny\n2 for Overcast\n3 for Rain\n");
    scanf("%d", &out);

    if (out == 1) {
        printf("ENTER THE HUMIDITY: ");
        scanf("%d", &humi);
        if (humi > 75)
            printf("no play\n");
        else
            printf("play\n");
    }

    if (out == 2) {
        printf("play\n");
    }

    if (out == 3) {
        printf("enter wind condition: 1 for windy, 0 for no wind: ");
        scanf("%d", &windy);
        if (windy == 0)
            printf("play\n");
        else
            printf("no play\n");
    }

    return 0;
}

Page 40: Decision Tree & Techniques

40

BIBLIOGRAPHY

• Data Mining & Warehouse by Arun K. Pujari.

• SPRINT: A Scalable Parallel Classifier for Data Mining, John Shafer, Rakesh Agrawal and Manish Mehta, IBM Almaden Research Center.

• Data Mining Classification: Basic Concepts, Decision Trees, and Model Evaluation, from Introduction to Data Mining by Tan, Steinbach and Kumar.

• Advanced Data Mining Techniques by David L. Olson and Dursun Delen.

• Data Mining: Concepts and Techniques by Jiawei Han and Micheline Kamber.

• Web resources: Google, Wikipedia.

Page 41: Decision Tree & Techniques

41

THANK YOU