data mining assignment
Course Number: COMP7650
Name: Xu Yichang
Student ID: 14402513
Assignment I
Part 1: Binary decision tree (Gini Index)
My Data:
There is one missing value here. I handle it by imputation (estimating the missing value): 1/2 of the Benz records have Hands = 0, so I assume the missing value is 0.
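The imputation step can be sketched roughly as follows: fill a missing value with the most common value among records of the same class. This is only an illustration. The function name `impute_by_class_mode`, the second class label "Other", and the records themselves are hypothetical and are not the assignment's table.

```python
from collections import Counter

def impute_by_class_mode(records, target, class_attr):
    """Return copies of records where a missing `target` (None) is replaced
    by the most common `target` value among records of the same class."""
    filled = []
    for rec in records:
        if rec[target] is None:
            peers = [r[target] for r in records
                     if r[class_attr] == rec[class_attr] and r[target] is not None]
            if peers:  # leave the gap as-is if no peer has a known value
                rec = {**rec, target: Counter(peers).most_common(1)[0][0]}
        filled.append(rec)
    return filled

# Hypothetical records (assumption: NOT the original data table).
cars = [
    {"Brand": "Benz", "Hands": 0},
    {"Brand": "Benz", "Hands": 0},
    {"Brand": "Benz", "Hands": 2},
    {"Brand": "Benz", "Hands": None},   # the missing value
    {"Brand": "Other", "Hands": 1},
]
imputed = impute_by_class_mode(cars, "Hands", "Brand")
print(imputed[3]["Hands"])
```

Restricting the estimate to records of the same class (here, the same Brand) mirrors the reasoning above; using the most common value is one simple choice among several possible estimators.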
Here is my new data:
Discretization:
According to the ranges given by the teacher:
After the discretization, my data become:
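As a rough sketch of how such a discretization could be applied, the snippet below maps a numeric value to a label using a list of upper bounds. The actual ranges given by the teacher are not reproduced in this text, so the bin edges in `CC_BINS` are purely hypothetical placeholders, as is the helper name `discretize`.

```python
def discretize(value, bins):
    """Return the label of the first bin whose upper bound is >= value.

    `bins` is a list of (upper_bound, label) pairs sorted by upper_bound.
    """
    for upper, label in bins:
        if value <= upper:
            return label
    raise ValueError(f"value {value} is above every bin")

# Hypothetical ranges (assumption: NOT the ranges used in the assignment).
CC_BINS = [(1500, "Low"), (2500, "Medium"), (float("inf"), "High")]

for cc in (1200, 2000, 3000):
    print(cc, discretize(cc, CC_BINS))
```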
First of all, we ignore the color attribute in all tables below:
Choose the Root Node:
By Gini Index:
Class attribute: Brand
Gini(root)=0.5
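For reference, the Gini index of a node is 1 minus the sum of the squared class proportions. A minimal sketch is shown below. The label list is hypothetical (two classes "A" and "B" in equal proportion), chosen only because a balanced two-class node gives the value 0.5.

```python
from collections import Counter

def gini(labels):
    """Gini index of a node: 1 - sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

# Hypothetical labels (assumption): two classes, equally frequent.
labels = ["A"] * 10 + ["B"] * 10
print(gini(labels))  # 0.5
```

A node containing a single class has Gini index 0, which is the lowest possible value.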
Gini split:
(1) Seats:
Gini(Seat)=0.450
(2) CC:
Case1: (Low, Medium), (High): Gini(CC1)=0.5
Case2: (Low, High), (Medium): Gini(CC2)=0.440
Case3: (Medium, High), (Low): Gini(CC3)=0.440
(3) Transmission:
Gini(Transmission)=0.474
(4) Year:
Case1: (New, Medium), (Old): Gini(Year1)=0.440
Case2: (New, Old), (Medium): Gini(Year2)=0.495
Case3: (Medium, Old), (New): Gini(Year3)=0.444
(5) Hands:
Case1: (Low, Medium), (High): Gini(Hands1)=0.490
Case2: (Low, High), (Medium): Gini(Hands2)=0.469
Case3: (Medium, High), (Low): Gini(Hands3)=0.495
(6) Price:
Case1: (Expensive, Medium), (Cheap): Gini(Price1)=0.490
Case2: (Expensive, Cheap), (Medium): Gini(Price2)=0.444
Case3: (Medium, Cheap), (Expensive): Gini(Price3)=0.440
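The three cases per attribute come from the fact that a three-valued attribute can be split into two non-empty groups in exactly three ways. The sketch below enumerates those groupings and scores each one by the weighted average of the child Gini indices. The helper names and the six records are hypothetical and serve only to illustrate the computation. They are not the assignment's data.

```python
from collections import Counter
from itertools import combinations

def gini(labels):
    """Gini index: 1 - sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def binary_partitions(values):
    """Yield (left, right) for each split of `values` into two non-empty groups."""
    values = sorted(values)
    first, rest = values[0], values[1:]
    # Pin the first value to the left group so mirrored splits appear only once.
    for k in range(len(rest)):
        for extra in combinations(rest, k):
            left = {first, *extra}
            yield left, set(values) - left

def gini_split(rows, attr, left_values, class_attr):
    """Weighted average of the Gini indices of the two child nodes."""
    left = [r[class_attr] for r in rows if r[attr] in left_values]
    right = [r[class_attr] for r in rows if r[attr] not in left_values]
    n = len(rows)
    return len(left) / n * gini(left) + len(right) / n * gini(right)

# Hypothetical records (assumption: NOT the original table).
rows = [
    {"Price": "Cheap", "Brand": "A"}, {"Price": "Cheap", "Brand": "A"},
    {"Price": "Medium", "Brand": "A"}, {"Price": "Medium", "Brand": "B"},
    {"Price": "Expensive", "Brand": "B"}, {"Price": "Expensive", "Brand": "B"},
]
cases = list(binary_partitions({r["Price"] for r in rows}))
scores = [gini_split(rows, "Price", left, "Brand") for left, _ in cases]
for (left, right), score in zip(cases, scores):
    print(sorted(left), sorted(right), round(score, 3))
```

The split with the smallest weighted Gini index is preferred. When two splits tie, an additional rule (for example, taking the first one found) is needed to pick between them.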
Choose the attribute Price with the split (Medium, Cheap), (Expensive). It is one of the splits with the minimum Gini index (0.440).
The data set is divided by Price:
Price=(Medium, Cheap):
Gini=0.32
Price=(Expensive):
Gini=0.48
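As a consistency check, the Gini index of a split is the weighted average of the two child values, with weights equal to each child's share of the records. The sketch below uses only the reported figures (0.32, 0.48, and the split value 0.440 for Price) to recover the share of records that the (Medium, Cheap) branch would need to hold. The function names are hypothetical, and because the reported values are rounded, the result is approximate.

```python
def weighted_gini(g_left, g_right, w_left):
    """Gini of a binary split: weighted average of the two child Gini values."""
    return w_left * g_left + (1.0 - w_left) * g_right

def implied_left_weight(g_left, g_right, g_split):
    """Solve g_split = w * g_left + (1 - w) * g_right for w."""
    return (g_right - g_split) / (g_right - g_left)

# Reported values: Gini 0.32 for (Medium, Cheap), 0.48 for (Expensive),
# and 0.440 for the split as a whole.
w = implied_left_weight(0.32, 0.48, 0.440)
print(round(w, 3))
print(round(weighted_gini(0.32, 0.48, w), 3))
```

Under these figures, roughly one quarter of the records would fall in the (Medium, Cheap) branch and three quarters in the (Expensive) branch.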
Price=(Medium, Cheap):
For each attribute, the Gini value shown below is the smallest one among its different cases.
Gini split:
Attribute: CC: Gini=0.2
Attribute: Year: Gini=0.3
Attribute: Hands: Gini=0
The attribute Hands with the splitting subset (Low, Medium) gives the minimum Gini index overall (i.e., 0).
Hands=(Low, Medium):
Stop expanding the node, because all the records belong to the same class.
Hands=(High):
Stop expanding the node, because all the records belong to the same class.
Price=(Expensive):
For each attribute, the Gini value shown below is the smallest one among its different cases.
Gini split:
Attribute: Transmission: Gini=0.457
Attribute: Seats: Gini=0.457
Attribute: CC: Gini=0.4
Attribute: Year: Gini=0.429
Attribute: Hands: Gini=0.440
The attribute CC with the splitting subset (Medium, High) gives the minimum Gini index overall (i.e., 0.4).
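The overall procedure used so far (pick the split with the smallest Gini index, divide the data, and stop when a node contains a single class) can be sketched as a short recursive function. This is a minimal illustration under stated assumptions, not the exact procedure behind the tables: ties are broken by taking the first split found, and the four records and all function names are hypothetical.

```python
from collections import Counter
from itertools import combinations

def gini(labels):
    """Gini index: 1 - sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def binary_partitions(values):
    """Yield the left group of each split of `values` into two non-empty groups."""
    values = sorted(values)
    first, rest = values[0], values[1:]
    for k in range(len(rest)):  # k < len(rest) keeps the right group non-empty
        for extra in combinations(rest, k):
            yield {first, *extra}

def best_split(rows, attrs, class_attr):
    """Return (score, attr, left_values, left_rows, right_rows) or None."""
    best = None
    for attr in attrs:
        for left_values in binary_partitions({r[attr] for r in rows}):
            left = [r for r in rows if r[attr] in left_values]
            right = [r for r in rows if r[attr] not in left_values]
            score = (len(left) * gini([r[class_attr] for r in left])
                     + len(right) * gini([r[class_attr] for r in right])) / len(rows)
            if best is None or score < best[0]:  # first split wins ties
                best = (score, attr, left_values, left, right)
    return best

def build_tree(rows, attrs, class_attr):
    """Grow a binary tree; a leaf is returned as a plain class label."""
    labels = [r[class_attr] for r in rows]
    # Stop expanding when all records belong to the same class.
    split = None if len(set(labels)) == 1 else best_split(rows, attrs, class_attr)
    if split is None:  # pure node, or no attribute left that can separate records
        return Counter(labels).most_common(1)[0][0]
    _, attr, left_values, left, right = split
    return {"attr": attr, "left_values": sorted(left_values),
            "left": build_tree(left, attrs, class_attr),
            "right": build_tree(right, attrs, class_attr)}

# Hypothetical records (assumption: NOT the original table).
rows = [
    {"Price": "Cheap", "Hands": "Low", "Brand": "A"},
    {"Price": "Cheap", "Hands": "High", "Brand": "B"},
    {"Price": "Expensive", "Hands": "Low", "Brand": "A"},
    {"Price": "Expensive", "Hands": "High", "Brand": "B"},
]
tree = build_tree(rows, ["Price", "Hands"], "Brand")
print(tree)
```

In this toy data, Hands separates the two classes perfectly (Gini index 0), so it is chosen at the root and both children become leaves immediately.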