
Marketing Campaign Effectiveness Classification and Decision Tree Classifier

CIS 435 Francisco E. Figueroa

I. Introduction

Classification is a data mining task that assigns objects to one of several predefined categories or classes. Classification models cover a diverse range of applications, such as labeling loan applicants as low, medium, or high credit risk, or detecting spam email messages based on the message header. The classification model sits in the middle of the process: an input attribute set (x) goes through the model to produce an output class label (y). The classification task begins with a data set in which the class assignments are known. Class labels are discrete and do not imply any kind of order; if the target is a continuous attribute, a regression model is used instead as the predictive model. The simplest type of classification problem is binary, where only two class values are possible. When there are more than two values, the problem is multiclass. (Tan, 2006)
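As a minimal illustration of this input-to-output view, a classification model can be seen as a function mapping an attribute set x to a class label y. The rules and thresholds below are hypothetical, not from the paper; they echo its loan-applicant example:

```python
# A classification model as a function: attribute set x -> class label y.
# The rules and thresholds here are invented purely for illustration.
def classify_applicant(x):
    """Assign a loan applicant to one of three predefined credit classes."""
    if x["income"] >= 80_000 and x["defaults"] == 0:
        return "high"
    if x["income"] >= 40_000:
        return "medium"
    return "low"

print(classify_applicant({"income": 55_000, "defaults": 1}))  # medium
```

Note that the output classes ("low", "medium", "high") are discrete labels with no ordering assumed by the classifier, exactly as described above.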

When building the classification model, after preparing the data, the training process is key: the classification algorithm finds the relationships between the values of the predictors and the values of the target. Descriptive modeling supports the training process because it serves as an explanatory tool to distinguish between objects of different classes. Predictive modeling, in contrast, is used to predict the class label of unknown records. It is important to point out that classification techniques are best suited for predicting or describing data sets with binary or nominal categories. (SAS, 2016)

In general, a classification technique requires a learning algorithm to identify the model that best fits the relationship between the attribute set and the class label of the input data. The objective of the algorithm is to build models with good generalization capability. To solve classification problems we learn a model from a training set and then apply it to a test set, which consists of records with unknown class labels. The performance of the classification model is evaluated using a confusion matrix.
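A minimal sketch of this evaluation step, assuming simple string class labels: tally the four confusion-matrix counts by comparing a classifier's predictions against the known labels of a test set.

```python
def confusion_matrix(y_true, y_pred, positive="yes"):
    """Count TP, TN, FP, FN for a binary classifier on a labeled test set."""
    counts = {"TP": 0, "TN": 0, "FP": 0, "FN": 0}
    for t, p in zip(y_true, y_pred):
        if p == positive:
            counts["TP" if t == positive else "FP"] += 1
        else:
            counts["FN" if t == positive else "TN"] += 1
    return counts

# Tiny made-up test set for illustration.
y_true = ["yes", "no", "yes", "no", "no"]
y_pred = ["yes", "no", "no", "yes", "no"]
print(confusion_matrix(y_true, y_pred))
```

Every later metric in this paper (accuracy, precision, recall) is derived from these four counts.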

The classification model has many applications in customer segmentation, business modeling, marketing, and credit analysis, among others.

II. Overview of Decision Tree

The decision tree is a classifier and a powerful way to perform multiple-variable analysis. Decision trees are produced by algorithms that identify various ways of splitting a data set into branch-like segments. Multiple-variable analysis allows us to predict, explain, describe, or classify an outcome (or target). An example is estimating the probability of a sale, or the likelihood of responding to a marketing campaign, as a result of the combined effects of multiple input variables, factors, or dimensions. This capability enables decision trees to go beyond simple one-cause, one-effect relationships and to discover and describe things in the context of multiple influences. (SAS, 2016)


A decision tree is created from a series of questions and their possible answers, organized in a hierarchical structure consisting of nodes and directed edges. The tree has three types of nodes: a) the root node, which has no incoming edges and zero or more outgoing edges; b) internal nodes, each of which has exactly one incoming edge and two or more outgoing edges; and c) leaf or terminal nodes, each of which has exactly one incoming edge and no outgoing edges.
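The node-and-edge structure just described can be sketched as a small data structure. This is an illustrative layout (the "housing" attribute and labels are hypothetical), not the representation any particular library uses:

```python
from dataclasses import dataclass, field
from typing import Dict, Optional

@dataclass
class Node:
    """One decision tree node (illustrative structure)."""
    test_attribute: Optional[str] = None   # attribute tested at internal/root nodes
    children: Dict[str, "Node"] = field(default_factory=dict)  # outcome -> child
    label: Optional[str] = None            # class label stored at leaf nodes

    def is_leaf(self):
        # Leaf (terminal) nodes have no outgoing edges, i.e. no children.
        return not self.children

def classify(node, record):
    """Follow directed edges from the root until a leaf is reached."""
    while not node.is_leaf():
        node = node.children[record[node.test_attribute]]
    return node.label

# A one-split tree: the root tests a hypothetical "housing" attribute,
# and each answer leads directly to a leaf holding a class label.
tree = Node(test_attribute="housing",
            children={"yes": Node(label="no"), "no": Node(label="yes")})
print(classify(tree, {"housing": "yes"}))  # no
```

Classifying a record is simply a walk from the root, choosing the outgoing edge that matches the record's attribute value at each internal node.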

Efficient algorithms have been developed to induce reasonably accurate decision trees. These algorithms usually employ a greedy strategy that grows a decision tree by making a series of locally optimal decisions about which attribute to use for partitioning the data. Hunt's algorithm is the basis of many existing decision tree induction algorithms.
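The locally optimal split decisions are scored with impurity measures. A sketch of the two standard measures discussed in this paper, written directly from their definitions over a node's class counts:

```python
from math import log2

def entropy(counts):
    """Entropy(t) = -sum_i p_i * log2(p_i) over the classes at node t."""
    total = sum(counts)
    # Zero-count classes contribute nothing (lim p->0 of p*log p is 0).
    return -sum((c / total) * log2(c / total) for c in counts if c)

def gini(counts):
    """Gini(t) = 1 - sum_i p_i^2 over the classes at node t."""
    total = sum(counts)
    return 1.0 - sum((c / total) ** 2 for c in counts)

print(entropy([5, 5]))   # equally divided classes: maximal disorder, 1.0
print(gini([5, 5]))      # 0.5
print(gini([10, 0]))     # pure node: 0.0
```

A pure node (all records in one class) scores zero under both measures; an even class split scores the maximum, which is why a greedy inducer prefers splits whose children have low impurity.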

One of the biggest questions is how to split the training records and when to stop splitting. The decision tree induction algorithm must provide a method for expressing an attribute test condition and its corresponding outcomes for different attribute types. Several measures can be used to determine the best way to split the records. These measures are defined in terms of the class distribution of the records before and after the split, and are often based on the degree of impurity of the child nodes. Examples of impurity measures include Gini(t) and Entropy(t). (Tan, 2006) Entropy is a quantitative measure of disorder, calculated to assess the homogeneity of the records at a node when dividing the data set into classes. When a node contains records of only one class, its entropy is zero; when disorder is high and the classes are equally divided, entropy is maximal. These values guide the splitting decision at each stage. (Gulati, 2016) The information gain ratio reduces the bias of information gain. The Gini index, used by CART, is an impurity measure that serves as an alternative to information gain. Entropy and Gini are the primary measures of data impurity for classification; entropy is often preferred for categorical attributes and Gini for numeric and continuous attributes.

III. Parameters Used for Model Accuracy

The evaluation metrics available for binary classification models are accuracy, precision, recall, and AUC. The evaluation module outputs a confusion matrix showing the number of true positives, false negatives, false positives, and true negatives, as well as ROC, precision/recall, and lift curves. Accuracy, the proportion of correctly classified instances, is usually the first metric used to evaluate a classifier. However, when the data is imbalanced (most of the instances belong to one class), or when you are more interested in performance on one particular class, accuracy does not really capture the effectiveness of a classifier.

The precision of the model tells us what proportion of the records classified as positive are truly positive: TP/(TP+FP). Recall tells us how many of the actual positive records the classifier identified: TP/(TP+FN). Notably, there is a trade-off between precision and recall. Another valuable check on model accuracy is inspecting the true positive rate vs. the false positive rate in the Receiver Operating Characteristic (ROC) curve and the corresponding Area Under the Curve (AUC) value. The closer this curve is


to the upper left corner, the better the classifier's performance (that is, maximizing the true positive rate while minimizing the false positive rate). (Azure, 2016)

IV. Weka Exercises

According to the exercise, we are trying to predict whether a client will subscribe to a term deposit. When we apply the training set with all the attributes, we obtain the following results:

Correctly Classified Instances: 4023 (88.9847 %)
Incorrectly Classified Instances: 498 (11.0153 %)

             Predicted No   Predicted Yes
Actual No    3838 (TN)      162 (FP)
Actual Yes   336 (FN)       185 (TP)

The accuracy = (TP + TN) / (P + N) = (185 + 3,838) / 4,521 = 0.890. The decision tree has 104 leaves and the size of the tree is 146. When eliminating the contact, day, month, and duration attributes, we obtain the following:

Correctly Classified Instances: 4025 (89.029 %)
Incorrectly Classified Instances: 496 (10.971 %)

             Predicted No   Predicted Yes
Actual No    3961 (TN)      39 (FP)
Actual Yes   457 (FN)       64 (TP)

The accuracy = (TP + TN) / (P + N) = (64 + 3,961) / 4,521 = 0.890. The decision tree has 30 leaves and the size of the tree is 42. In summary, eliminating the contact, day, month, and duration attributes makes the training data slightly more effective in terms of accuracy and yields a much less complex decision tree, although the number of true positives drops sharply (64 vs. 185), so recall on the "yes" class suffers.
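The metrics from Section III can be computed directly from the confusion-matrix counts reported above; this sketch reproduces the accuracy figures and makes the precision/recall trade-off between the two models explicit:

```python
def metrics(tp, tn, fp, fn):
    """Accuracy, precision, and recall from confusion-matrix counts."""
    total = tp + tn + fp + fn
    return {"accuracy": (tp + tn) / total,
            "precision": tp / (tp + fp),
            "recall": tp / (tp + fn)}

# Counts from the two Weka runs reported above.
full = metrics(tp=185, tn=3838, fp=162, fn=336)   # all attributes
reduced = metrics(tp=64, tn=3961, fp=39, fn=457)  # contact/day/month/duration removed
print(full)     # accuracy ~0.890, recall ~0.355
print(reduced)  # accuracy ~0.890, recall ~0.123
```

Both models have nearly identical accuracy, which illustrates the earlier caveat about accuracy on imbalanced data: the "yes" class is rare (521 of 4,521 records), so accuracy alone hides the reduced model's much weaker recall.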

V. Use Cases

The decision tree is one of the successful data mining techniques used in the diagnosis of heart disease, although its accuracy is not perfect. Most research applies the J4.8 decision tree, which is based on gain ratio and binary discretization. (Shouman, 2011) Another application is in marketing, where a marketing manager needs to predict whether a customer with a given profile will buy a new product.


References

Gulati, P., Sharma, A., & Gupta, M. (2016). Theoretical Study of Decision Tree Algorithms to Identify Pivotal Factors for Performance Improvement: A Review. International Journal of Computer Applications, 141(14).

Magee, J. Decision Trees for Decision Making. Harvard Business Review.

Microsoft Azure. How to Evaluate Model Performance in Azure Machine Learning. Retrieved from https://azure.microsoft.com/en-us/documentation/articles/machine-learning-evaluate-model-performance/

SAS. Decision Trees - What Are They? Retrieved from http://support.sas.com/publishing/pubcat/chaps/57587.pdf

Shouman, M., Turner, T., & Stocker, R. Using Decision Tree for Diagnosing Heart Disease Patients. Retrieved from http://crpit.com/confpapers/CRPITV121Shouman.pdf

Tan, P.-N., Steinbach, M., & Kumar, V. (2006). Introduction to Data Mining. Addison-Wesley.