
Page 1: Classification Based Machine Learning Algorithms

Classification Based Machine Learning Algorithms
Md Main Uddin Rony, Software Engineer

Page 2: Classification Based Machine Learning Algorithms

What is Classification?

Classification is the data mining task of predicting the value of a categorical variable (the target or class).

This is done by building a model based on one or more numerical and/or categorical variables (predictors, attributes, or features).

Classification is considered an instance of supervised learning.

The corresponding unsupervised procedure is known as clustering.

Page 3: Classification Based Machine Learning Algorithms

Classification Based Algorithms

Four main groups of classification algorithms are:

● Frequency Table
- ZeroR
- OneR
- Naive Bayesian
- Decision Tree

● Covariance Matrix
- Linear Discriminant Analysis
- Logistic Regression

● Similarity Functions
- K Nearest Neighbours

● Others
- Artificial Neural Network
- Support Vector Machine

Page 4: Classification Based Machine Learning Algorithms

Naive Bayes Classifier

● Works based on Bayes' theorem
● Why is it called naive?
- Because it assumes that the presence of a particular feature in a class is unrelated to the presence of any other feature
● Easy to build
● Useful for very large data sets

Page 5: Classification Based Machine Learning Algorithms

Bayes’ Theorem

The theorem can be stated mathematically as follows:

P(A | B) = P(B | A) · P(A) / P(B)

P(A) and P(B) are the probabilities of observing A and B without regard to each other, also known as the prior probabilities.

P(A | B), a conditional (posterior) probability, is the probability of observing event A given that B is true.

P(B | A) is the conditional probability of observing event B given that A is true (the likelihood).

So, how does the Naive Bayes classifier work based on this?

Page 6: Classification Based Machine Learning Algorithms

How Naive Bayes works?

● Let D be a training set of tuples, each represented by an n-dimensional attribute vector X = (x1, x2, ..., xn)
● Suppose that there are m classes, C1, C2, ..., Cm. Given a tuple X, the classifier will predict that X belongs to the class having the highest posterior probability, conditioned on X. That is, the Naive Bayesian classifier predicts that tuple X belongs to the class Ci if and only if

P(Ci | X) > P(Cj | X) for 1 ≤ j ≤ m, j ≠ i

● By Bayes' theorem,

P(Ci | X) = P(X | Ci) · P(Ci) / P(X)

● Since P(X) is constant for all classes, only P(X | Ci) · P(Ci) needs to be maximized

Page 7: Classification Based Machine Learning Algorithms

How Naive Bayes works? (Contd.)

● To reduce computation in evaluating P(X | Ci), the naive assumption of class-conditional independence is made. This presumes that the attributes' values are conditionally independent of one another, given the class label of the tuple (i.e., that there are no dependence relationships among the attributes).

● Thus,

P(X | Ci) = P(x1 | Ci) · P(x2 | Ci) · ... · P(xn | Ci)

Page 8: Classification Based Machine Learning Algorithms

How Naive Bayes Works? (Hands-on Calculation)

Given all the previous patients' symptoms and diagnoses:

chills   runny nose   headache   fever   flu?
Y        N            Mild       Y       N
Y        Y            No         N       Y
Y        N            Strong     Y       Y
N        Y            Mild       Y       Y
N        N            No         N       N
N        Y            Strong     Y       Y
N        Y            Strong     N       N
Y        Y            Mild       Y       Y

Does the patient with the following symptoms have the flu?

chills   runny nose   headache   fever   flu?
Y        N            Mild       N       ?

Page 9: Classification Based Machine Learning Algorithms

How Naive Bayes Works? (Hands-on Calculation, contd.)

First, we compute all possible individual probabilities conditioned on the target attribute (flu):

P(flu=Y) = 0.625                        P(flu=N) = 0.375
P(chills=Y | flu=Y) = 0.6               P(chills=Y | flu=N) = 0.333
P(chills=N | flu=Y) = 0.4               P(chills=N | flu=N) = 0.666
P(runny nose=Y | flu=Y) = 0.8           P(runny nose=Y | flu=N) = 0.333
P(runny nose=N | flu=Y) = 0.2           P(runny nose=N | flu=N) = 0.666
P(headache=Mild | flu=Y) = 0.4          P(headache=Mild | flu=N) = 0.333
P(headache=No | flu=Y) = 0.2            P(headache=No | flu=N) = 0.333
P(headache=Strong | flu=Y) = 0.4        P(headache=Strong | flu=N) = 0.333
P(fever=Y | flu=Y) = 0.8                P(fever=Y | flu=N) = 0.333
P(fever=N | flu=Y) = 0.2                P(fever=N | flu=N) = 0.666

Page 10: Classification Based Machine Learning Algorithms

How Naive Bayes Works? (Hands-on Calculation, contd.)

And then decide:

P(flu=Y | X) ∝ P(chills=Y | flu=Y) · P(runny nose=N | flu=Y) · P(headache=Mild | flu=Y) · P(fever=N | flu=Y) · P(flu=Y)
= 0.6 × 0.2 × 0.4 × 0.2 × 0.625 = 0.006

vs.

P(flu=N | X) ∝ P(chills=Y | flu=N) · P(runny nose=N | flu=N) · P(headache=Mild | flu=N) · P(fever=N | flu=N) · P(flu=N)
= 0.333 × 0.666 × 0.333 × 0.666 × 0.375 = 0.0184

Since 0.0184 > 0.006, the Naive Bayes classifier predicts that the patient doesn't have the flu.

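A minimal Python sketch of the same hand calculation (no smoothing; the data and probabilities are exactly the table above, and the function name is just illustrative):

```python
# Naive Bayes by hand: each class C is scored as P(x1|C)...P(xn|C) * P(C).
data = [
    # (chills, runny nose, headache, fever) -> flu?
    ("Y", "N", "Mild",   "Y", "N"),
    ("Y", "Y", "No",     "N", "Y"),
    ("Y", "N", "Strong", "Y", "Y"),
    ("N", "Y", "Mild",   "Y", "Y"),
    ("N", "N", "No",     "N", "N"),
    ("N", "Y", "Strong", "Y", "Y"),
    ("N", "Y", "Strong", "N", "N"),
    ("Y", "Y", "Mild",   "Y", "Y"),
]

def predict(x):
    scores = {}
    for label in {row[-1] for row in data}:
        rows = [row for row in data if row[-1] == label]
        score = len(rows) / len(data)  # prior, e.g. P(flu=Y) = 0.625
        for i, value in enumerate(x):
            # class-conditional P(xi | C), counted from the frequency table
            score *= sum(r[i] == value for r in rows) / len(rows)
        scores[label] = score
    return max(scores, key=scores.get), scores

# The query patient: chills=Y, runny nose=N, headache=Mild, fever=N
print(predict(("Y", "N", "Mild", "N")))
# -> ('N', {'Y': 0.006, 'N': 0.0184...}): predicts no flu, as above
```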

Page 11: Classification Based Machine Learning Algorithms

The Decision Tree Classifier


Page 12: Classification Based Machine Learning Algorithms

Decision Tree

● A decision tree builds classification or regression models in the form of a tree structure
● It breaks down a dataset into smaller and smaller subsets while, at the same time, an associated decision tree is incrementally developed
● The final result is a tree with decision nodes and leaf nodes
- A decision node has two or more branches
- A leaf node represents a classification or decision
● The topmost decision node in a tree, which corresponds to the best predictor, is called the root node
● Decision trees can handle both categorical and numerical data

Page 13: Classification Based Machine Learning Algorithms

Example Set we will work on...

Outlook    Temp   Humidity   Windy   Play Golf
Rainy      Hot    High       False   No
Rainy      Hot    High       True    No
Overcast   Hot    High       False   Yes
Sunny      Mild   High       False   Yes
Sunny      Cool   Normal     False   Yes
Sunny      Cool   Normal     True    No
Overcast   Cool   Normal     True    Yes
Rainy      Mild   High       False   No
Rainy      Cool   Normal     False   Yes
Sunny      Mild   Normal     False   Yes
Rainy      Mild   Normal     True    Yes
Overcast   Mild   High       True    Yes
Overcast   Hot    Normal     False   Yes
Sunny      Mild   High       True    No

Page 14: Classification Based Machine Learning Algorithms

So, our tree looks like this...

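Rendered as text (a reconstruction from the dataset above; Outlook gives the largest information gain, as computed in the steps that follow):

Outlook
|-- Rainy -> Humidity
|     |-- High   -> No
|     `-- Normal -> Yes
|-- Overcast -> Yes
`-- Sunny -> Windy
      |-- False -> Yes
      `-- True  -> No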

Page 15: Classification Based Machine Learning Algorithms

How it works

● The core algorithm for building decision trees, called ID3, was developed by J. R. Quinlan

● ID3 uses Entropy and Information Gain to construct a decision tree

Page 16: Classification Based Machine Learning Algorithms

Entropy

● A decision tree is built top-down from a root node and involves partitioning the data into subsets that contain instances with similar values (homogeneous)

● The ID3 algorithm uses entropy to calculate the homogeneity of a sample

● If the sample is completely homogeneous, the entropy is zero; if the sample is equally divided, the entropy is one

Page 17: Classification Based Machine Learning Algorithms

Compute Two Types of Entropy

● To build a decision tree, we need to calculate two types of entropy using frequency tables, as follows:

● a) Entropy using the frequency table of one attribute (entropy of the target):

E(S) = Σi −pi · log2(pi), summed over the classes i

Page 18: Classification Based Machine Learning Algorithms

● b) Entropy using the frequency table of two attributes:

E(T, X) = Σc P(c) · E(c), summed over the values c of the splitting attribute X

Page 19: Classification Based Machine Learning Algorithms

Information Gain

● The information gain is based on the decrease in entropy after a dataset is split on an attribute:

Gain(T, X) = Entropy(T) − Entropy(T, X)

● Constructing a decision tree is all about finding the attribute that returns the highest information gain (i.e., the most homogeneous branches)

Page 20: Classification Based Machine Learning Algorithms

Example

● Step 1: Calculate entropy of the target

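For the Play Golf table above (9 Yes, 5 No out of 14 rows), this works out to:

E(Play Golf) = −(9/14) log2(9/14) − (5/14) log2(5/14) ≈ 0.940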

Page 21: Classification Based Machine Learning Algorithms

Example

● Step 2: The dataset is then split on the different attributes, and the entropy for each branch is calculated
● The branch entropies are then added proportionally to get the total entropy for the split
● The resulting entropy is subtracted from the entropy before the split
● The result is the Information Gain, or decrease in entropy (see the sketch below)

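A minimal Python sketch of Steps 1 and 2 on the Play Golf table above (the function names are illustrative):

```python
import math
from collections import Counter

# Play Golf dataset from the table above: (Outlook, Temp, Humidity, Windy, Play Golf)
rows = [
    ("Rainy","Hot","High","False","No"),     ("Rainy","Hot","High","True","No"),
    ("Overcast","Hot","High","False","Yes"), ("Sunny","Mild","High","False","Yes"),
    ("Sunny","Cool","Normal","False","Yes"), ("Sunny","Cool","Normal","True","No"),
    ("Overcast","Cool","Normal","True","Yes"),("Rainy","Mild","High","False","No"),
    ("Rainy","Cool","Normal","False","Yes"), ("Sunny","Mild","Normal","False","Yes"),
    ("Rainy","Mild","Normal","True","Yes"),  ("Overcast","Mild","High","True","Yes"),
    ("Overcast","Hot","Normal","False","Yes"),("Sunny","Mild","High","True","No"),
]

def entropy(labels):
    """E(S) = -sum(p_i * log2(p_i)) over the class frequencies."""
    counts = Counter(labels)
    total = len(labels)
    return -sum(c / total * math.log2(c / total) for c in counts.values())

def info_gain(rows, attr_index):
    """Gain(T, X) = E(T) - sum over values v of X of P(v) * E(branch_v)."""
    target = [r[-1] for r in rows]
    gain = entropy(target)
    for value in set(r[attr_index] for r in rows):
        branch = [r[-1] for r in rows if r[attr_index] == value]
        gain -= len(branch) / len(rows) * entropy(branch)
    return gain

for i, name in enumerate(["Outlook", "Temp", "Humidity", "Windy"]):
    print(f"Gain({name}) = {info_gain(rows, i):.3f}")
# Gain(Outlook) = 0.247, Gain(Temp) = 0.029,
# Gain(Humidity) = 0.152, Gain(Windy) = 0.048
```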

Page 22: Classification Based Machine Learning Algorithms

Example

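Worked out on the table above (these values match the output of the sketch under Step 2):

Gain(Outlook)  = 0.247
Gain(Temp)     = 0.029
Gain(Humidity) = 0.152
Gain(Windy)    = 0.048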

Page 23: Classification Based Machine Learning Algorithms

Example

● Step 3: Choose the attribute with the largest information gain as the decision node (here, Outlook, with a gain of 0.247)

Page 24: Classification Based Machine Learning Algorithms

Example

● Step 4a: A branch with entropy of 0 is a leaf node (e.g., Outlook = Overcast, where every case is Yes)

Page 25: Classification Based Machine Learning Algorithms

Example

● Step 4b: A branch with entropy more than 0 needs further splitting

Page 26: Classification Based Machine Learning Algorithms

Example

● Step 5: The ID3 algorithm is run recursively on the non-leaf branches, until all data is classified

Page 27: Classification Based Machine Learning Algorithms

Decision Tree to Decision Rules

● A decision tree can easily be transformed into a set of rules by mapping the paths from the root node to the leaf nodes one by one, for example:
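For the tree reconstructed earlier, mapping each root-to-leaf path gives:

IF Outlook = Overcast THEN Play Golf = Yes
IF Outlook = Rainy AND Humidity = High THEN Play Golf = No
IF Outlook = Rainy AND Humidity = Normal THEN Play Golf = Yes
IF Outlook = Sunny AND Windy = False THEN Play Golf = Yes
IF Outlook = Sunny AND Windy = True THEN Play Golf = No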


Page 28: Classification Based Machine Learning Algorithms

Any idea about Random Forest??

After all, Forests are made of trees….


Page 29: Classification Based Machine Learning Algorithms

K Nearest Neighbors Classification


Page 30: Classification Based Machine Learning Algorithms

k-NN Algorithm

● K nearest neighbors is a simple algorithm that stores all available cases and classifies new cases based on a similarity measure (e.g., distance functions)

● KNN has been used in statistical estimation and pattern recognition since the early 1970s

● A case is classified by a majority vote of its neighbors, being assigned to the class most common among its K nearest neighbors as measured by a distance function

● If K = 1, then what will it do?

Page 31: Classification Based Machine Learning Algorithms

Diagram


Page 32: Classification Based Machine Learning Algorithms

Distance measures for continuous variables

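For two points x = (x1, ..., xn) and y = (y1, ..., yn), the standard choices are:

Euclidean: D(x, y) = sqrt( Σ (xi − yi)^2 )
Manhattan: D(x, y) = Σ |xi − yi|
Minkowski: D(x, y) = ( Σ |xi − yi|^q )^(1/q)

Minkowski generalizes the other two: q = 2 gives Euclidean and q = 1 gives Manhattan.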

Page 33: Classification Based Machine Learning Algorithms

How many neighbors?

● Choosing the optimal value for K is best done by first inspecting the data

● In general, a larger K value is more precise, as it reduces the overall noise, but there is no guarantee

● Cross-validation is another way to retrospectively determine a good K value, using an independent dataset to validate it

● Historically, the optimal K for most datasets has been between 3 and 10, which produces much better results than 1-NN

Page 34: Classification Based Machine Learning Algorithms

Example

● Consider the following data concerning credit default. Age and Loan are two numerical variables (predictors) and Default is the target

Page 35: Classification Based Machine Learning Algorithms

Example

● We can now use the training set to classify an unknown case (Age = 48 and Loan = $142,000) using Euclidean distance

● If K = 1, then the nearest neighbor is the last case in the training set, with Default = Y:

D = Sqrt[(48 − 33)^2 + (142000 − 150000)^2] = 8000.01 → Default = Y

● With K = 3, there are two Default = Y and one Default = N out of the three closest neighbors, so the prediction for the unknown case is again Default = Y (see the sketch below)

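A minimal Python sketch of this K-NN vote. Only the Age = 33, Loan = $150,000, Default = Y case is named above; the other two rows here are made-up placeholders, and the helper name is illustrative:

```python
import math

# Training cases: (age, loan, default). Only the first row is given in the
# text above; the other two are placeholders for illustration.
train = [
    (33, 150_000, "Y"),
    (35, 120_000, "N"),  # placeholder
    (60, 100_000, "Y"),  # placeholder
]

def knn_predict(query, train, k):
    """Majority vote among the k nearest neighbors by Euclidean distance."""
    by_dist = sorted(train, key=lambda row: math.dist(query, row[:2]))
    labels = [row[2] for row in by_dist[:k]]
    return max(set(labels), key=labels.count)

query = (48, 142_000)
print(knn_predict(query, train, k=1))  # the single nearest case decides: 'Y'
```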

Page 36: Classification Based Machine Learning Algorithms

Standardized Distance

● One major drawback of calculating distance measures directly from the training set arises when variables have different measurement scales, or when there is a mixture of numerical and categorical variables.

● For example, if one variable is based on annual income in dollars and the other is based on age in years, then income will have a much higher influence on the calculated distance.

● One solution is to standardize the training set

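One common way to standardize is min-max scaling (a sketch of one standard choice; the slide does not specify which formula it uses):

Xs = (X − min(X)) / (max(X) − min(X))

With this rescaling, Age and Loan both fall in [0, 1], so the loan amount no longer dominates the Euclidean distance.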

Page 37: Classification Based Machine Learning Algorithms

Standardized Distance

Using the standardized distance on the same training set, the unknown case returns a different nearest neighbor, which is not a good sign of robustness.


Page 38: Classification Based Machine Learning Algorithms

Some Confusions...

What will happen if K is a multiple of the number of categories (so the vote can tie)?

What will happen if K = 1?

What will happen if we take K equal to the dataset size?


Page 39: Classification Based Machine Learning Algorithms

Acknowledgements... Contents are borrowed from:

1. Data Mining: Concepts and Techniques by Jiawei Han, Micheline Kamber, and Jian Pei

2. Naive Bayes Example (YouTube video) by Francisco Iacobelli (https://www.youtube.com/watch?v=ZAfarappAO0)

3. Predicting the Future: Classification, presented by Dr. Noureddin Sadawi (https://github.com/nsadawi/DataMiningSlides/blob/master/Slides.pdf)


Page 40: Classification Based Machine Learning Algorithms

Questions??


Page 41: Classification Based Machine Learning Algorithms

Then Thanks...
