data mining assignment


Uploaded by jackxu on 04-Oct-2015.


Keywords: data mining, decision tree, classifier.


Course Number: COMP7650
Name: Xu Yichang
Student ID: 14402513

Assignment I

Part 1: Binary decision tree (Gini index)

My data:

The data contain one missing value, which I fill in by imputation (estimating the missing value). Half of the Benz records have Hands = 0, so I assume the missing value is 0.
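One way to formalize this rule is class-conditional mode imputation: fill the gap with the value that records of the same brand most often take. This is an interpretation of the reasoning above, not necessarily the exact method intended. The records, the field names `brand` and `hands`, and the helper `impute_by_class_mode` are hypothetical, because the original table is not reproduced here.

```python
from collections import Counter

def impute_by_class_mode(records, target, class_key):
    """Return a copy of `records` where None values of `target` are replaced by
    the most frequent known `target` value among records of the same class.
    Assumes every class has at least one known value (hypothetical helper)."""
    filled = []
    for rec in records:
        if rec[target] is None:
            known = [r[target] for r in records
                     if r[class_key] == rec[class_key] and r[target] is not None]
            mode = Counter(known).most_common(1)[0][0]
            rec = {**rec, target: mode}  # new dict; the input record is not mutated
        filled.append(rec)
    return filled

# Hypothetical records: half of the known Benz rows have hands == 0.
cars = [
    {"brand": "Benz", "hands": 0},
    {"brand": "Benz", "hands": 0},
    {"brand": "Benz", "hands": 1},
    {"brand": "Benz", "hands": 2},
    {"brand": "Benz", "hands": None},
]
print(impute_by_class_mode(cars, "hands", "brand")[-1])  # {'brand': 'Benz', 'hands': 0}
```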

Here is my new data:

Discretization:

Using the value ranges given by the teacher:

After discretization, my data become:
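The teacher's value ranges are not reproduced in this transcript, so the cut points below are placeholders. The sketch only shows the mechanics of mapping a numeric value to a range label; `discretize`, `CC_CUTS`, and `CC_LABELS` are hypothetical names.

```python
from bisect import bisect_right

def discretize(value, cut_points, labels):
    """Map a numeric value to a range label.
    `cut_points` must be sorted ascending and `labels` must have one more
    entry than `cut_points`; a value equal to a cut point goes to the higher bin."""
    return labels[bisect_right(cut_points, value)]

# Hypothetical engine-size ranges (cc): < 1600 Low, 1600 to < 2500 Medium, >= 2500 High.
CC_CUTS = [1600, 2500]
CC_LABELS = ["Low", "Medium", "High"]

print([discretize(cc, CC_CUTS, CC_LABELS) for cc in (1400, 1600, 2000, 3000)])
# ['Low', 'Medium', 'Medium', 'High']
```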

First, we ignore the Color attribute in all the tables below.

Choose the Root Node:

By Gini index (class attribute: Brand):

Gini(root) = 0.5
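For a node whose records fall into classes with proportions p_i, the Gini index is 1 - Σ p_i². If Brand takes two values, Gini(root) = 0.5 means the two brands are equally frequent at the root. A minimal sketch; the counts in the usage lines are illustrative, not the assignment's data.

```python
def gini(counts):
    """Gini index 1 - sum(p_i^2) of a node, given its per-class record counts."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return 1.0 - sum((c / total) ** 2 for c in counts)

print(gini([10, 10]))  # 0.5  (two equally frequent classes)
print(gini([5, 0]))    # 0.0  (pure node)
```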

Gini split:

(1) Seats: Gini(Seats) = 0.450

(2) CC:
    Case 1: (Low, Medium), (High): Gini(CC1) = 0.5
    Case 2: (Low, High), (Medium): Gini(CC2) = 0.440
    Case 3: (Medium, High), (Low): Gini(CC3) = 0.440

(3) Transmission: Gini(Transmission) = 0.474

(4) Year:
    Case 1: (New, Medium), (Old): Gini(Year1) = 0.440
    Case 2: (New, Old), (Medium): Gini(Year2) = 0.495
    Case 3: (Medium, Old), (New): Gini(Year3) = 0.444

(5) Hands:
    Case 1: (Low, Medium), (High): Gini(Hands1) = 0.490
    Case 2: (Low, High), (Medium): Gini(Hands2) = 0.469
    Case 3: (Medium, High), (Low): Gini(Hands3) = 0.495

(6) Price:
    Case 1: (Expensive, Medium), (Cheap): Gini(Price1) = 0.490
    Case 2: (Expensive, Cheap), (Medium): Gini(Price2) = 0.444
    Case 3: (Medium, Cheap), (Expensive): Gini(Price3) = 0.440
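Each "Case" above is one way of grouping a three-valued attribute into two subsets, and its Gini split is the record-weighted average of the two children's Gini indices: Gini_split = (n1/n) Gini(D1) + (n2/n) Gini(D2). The sketch below scores all three groupings of one attribute. The per-value class counts in `hands` are hypothetical, since the data table is not reproduced.

```python
def gini(counts):
    """Gini index 1 - sum(p_i^2) of a node, given per-class record counts."""
    total = sum(counts)
    return 1.0 - sum((c / total) ** 2 for c in counts) if total else 0.0

def gini_split(groups):
    """Record-weighted average Gini of the child nodes in `groups`."""
    n = sum(sum(g) for g in groups)
    return sum(sum(g) / n * gini(g) for g in groups)

def binary_cases(value_counts):
    """Score the three binary groupings of a three-valued attribute
    (two values vs. the remaining one). `value_counts` maps each value to its
    per-class counts; returns a list of (pair, single, gini_split)."""
    values = list(value_counts)
    n_classes = len(next(iter(value_counts.values())))
    cases = []
    for single in values:
        pair = tuple(v for v in values if v != single)
        merged = [sum(value_counts[v][k] for v in pair) for k in range(n_classes)]
        cases.append((pair, single, gini_split([merged, value_counts[single]])))
    return cases

# Hypothetical per-value class counts [class A, class B] for one attribute.
hands = {"Low": [3, 1], "Medium": [1, 3], "High": [0, 4]}
for pair, single, g in binary_cases(hands):
    print(f"{pair}, ({single},): Gini = {g:.3f}")
```

The case with the smallest Gini split is the candidate for that attribute; the attribute whose best case is smallest overall is chosen for the node.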

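Choose attribute Price with the split (Medium, Cheap), (Expensive). Its Gini split, 0.440, is the lowest value found, although CC (cases 2 and 3) and Year (case 1) reach the same value to three decimal places.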

The data set is divided by Price:

Price = (Medium, Cheap): Gini = 0.32

Price = (Expensive): Gini = 0.48
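As a sanity check, the Gini split of the chosen Price case should equal the weighted average of the two child values, 0.32 and 0.48. The tables are not reproduced, so the class counts below are an assumption: 20 records split 10:10 by brand, with (Medium, Cheap) holding 1:4 and (Expensive) holding 9:6. This is one distribution consistent with all of the reported values (0.5 at the root, 0.32 and 0.48 for the children, 0.440 for the split).

```python
def gini(counts):
    """Gini index 1 - sum(p_i^2) of a node, given per-class record counts."""
    total = sum(counts)
    return 1.0 - sum((c / total) ** 2 for c in counts) if total else 0.0

# ASSUMED class counts [brand A, brand B]; the original tables are not shown.
medium_cheap = [1, 4]
expensive = [9, 6]
root = [a + b for a, b in zip(medium_cheap, expensive)]  # [10, 10]

n = sum(root)
split = sum(medium_cheap) / n * gini(medium_cheap) + sum(expensive) / n * gini(expensive)
print(round(gini(root), 3), round(gini(medium_cheap), 3), round(gini(expensive), 3), round(split, 3))
# 0.5 0.32 0.48 0.44
```

In other words, 5/20 × 0.32 + 15/20 × 0.48 = 0.08 + 0.36 = 0.44.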

Price = (Medium, Cheap): each Gini value shown below is the smallest among the cases of that attribute.

Gini split:

Attribute CC: Gini = 0.2
Attribute Year: Gini = 0.3
Attribute Hands: Gini = 0

The attribute Hands with the split (Low, Medium), (High) gives the minimum overall Gini index (0).

Hands=(Low, Medium):

Stop expanding the node, because all the records belong to the same class.

Hands=(High):

Stop expanding the node, because all the records belong to the same class.
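The procedure so far (pick the lowest Gini split, divide the records, stop when a node is pure) can be sketched as a recursive function. This is a generic CART-style sketch, not the assignment's own code. Two assumptions are labeled: an attribute is not reused once it has split a node (matching the walkthrough, where Price is not re-examined below the root), and ties keep the first split found. The records and the class label "Other" are hypothetical.

```python
from collections import Counter
from itertools import combinations

def gini(labels):
    """Gini index of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values()) if n else 0.0

def binary_splits(values):
    """Yield each two-way grouping of `values` exactly once as (left, right)."""
    values = sorted(values)
    first, rest = values[0], values[1:]
    # Keeping `first` on the left avoids emitting a grouping and its mirror.
    for r in range(len(rest)):
        for extra in combinations(rest, r):
            left = {first, *extra}
            yield left, set(values) - left

def build(records, attrs, target):
    """Grow a binary tree top-down using the lowest weighted-Gini split."""
    labels = [r[target] for r in records]
    majority = Counter(labels).most_common(1)[0][0]
    # Stop expanding when all records belong to the same class.
    if len(set(labels)) == 1 or not attrs:
        return majority
    best = None
    for a in attrs:
        for left, right in binary_splits({r[a] for r in records}):
            lrecs = [r for r in records if r[a] in left]
            rrecs = [r for r in records if r[a] in right]
            score = (len(lrecs) * gini([r[target] for r in lrecs])
                     + len(rrecs) * gini([r[target] for r in rrecs])) / len(records)
            if best is None or score < best[0]:  # strict <: first split wins ties
                best = (score, a, left, right, lrecs, rrecs)
    if best is None:  # every remaining attribute is constant in this node
        return majority
    _, a, left, right, lrecs, rrecs = best
    rest = [x for x in attrs if x != a]  # assumption: no reuse of a split attribute
    return {"attr": a,
            "left": (sorted(left), build(lrecs, rest, target)),
            "right": (sorted(right), build(rrecs, rest, target))}

# Hypothetical records for illustration only.
data = [
    {"Price": "Cheap", "Hands": "Low", "Brand": "Other"},
    {"Price": "Cheap", "Hands": "Medium", "Brand": "Other"},
    {"Price": "Medium", "Hands": "High", "Brand": "Benz"},
    {"Price": "Expensive", "Hands": "Low", "Brand": "Benz"},
    {"Price": "Expensive", "Hands": "High", "Brand": "Benz"},
]
print(build(data, ["Price", "Hands"], "Brand"))
```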

Price = (Expensive): each Gini value shown below is the smallest among the cases of that attribute.

Gini split:

Attribute Transmission: Gini = 0.457
Attribute Seats: Gini = 0.457
Attribute CC: Gini = 0.4
Attribute Year: Gini = 0.429
Attribute Hands: Gini = 0.440

The attribute CC with the split (Medium, High), (Low) gives the minimum overall Gini index (0.4).
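Both branch decisions apply the same rule: among the best cases of the attributes, take the one with the lowest Gini split. Using the values reported for the two Price branches, a minimal check (`best_attribute` is a hypothetical helper name):

```python
# Best (smallest) Gini split per attribute, as reported for each Price branch.
medium_cheap_branch = {"CC": 0.2, "Year": 0.3, "Hands": 0.0}
expensive_branch = {"Transmission": 0.457, "Seats": 0.457, "CC": 0.4,
                    "Year": 0.429, "Hands": 0.440}

def best_attribute(scores):
    """Return the attribute with the lowest Gini split (the first one wins a tie)."""
    return min(scores, key=scores.get)

print(best_attribute(medium_cheap_branch))  # Hands
print(best_attribute(expensive_branch))     # CC
```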