Machine Learning [email protected] Winter 2012


DESCRIPTION

Overview of common classifiers in machine learning.

TRANSCRIPT

Page 1: Machine Learning

Machine Learning

[email protected] 2012

Page 2: Machine Learning

Machine Learning

Classification: Predicting discrete values
Regression: Predicting continuous values
Clustering: Detecting similar groups
Optimization: Finding input that maximizes output

Page 3: Machine Learning

Machine Learning

Classification and Regression

Imagine an omniscient oracle who answers questions:

oracle( question ) = answer

Goal: From previous questions and answers, create a function that approximates the oracle

f( question ) -> oracle( question ) as examples -> ∞
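A minimal sketch of this idea in Python (the sine "oracle" and scikit-learn's KNeighborsRegressor are stand-ins, not from the slides):

# Sketch: approximate an unknown oracle from example question/answer pairs.
import numpy as np
from sklearn.neighbors import KNeighborsRegressor

def oracle(question):
    return np.sin(question)            # pretend we cannot see inside this

questions = np.random.uniform(0, 10, size=(1000, 1))
answers = oracle(questions).ravel()    # previous questions and their answers

f = KNeighborsRegressor(n_neighbors=5).fit(questions, answers)

# With more examples, f(question) approaches oracle(question).
print(f.predict([[2.5]]), oracle(2.5))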

Page 4: Machine Learning

Classification: Predicting discrete values

ExampleGiven:

Shape     Color   Width(in)  Weight(g)  Calories  Taste   Type
Round     Red     4.2        205        73        Sweet   Apple
Round     Green   3.7        145        52        Sour    Apple
Round     Orange  3.2        131        62        Sweet   Orange
Round     Orange  5.7        181        75        Bitter  Grapefruit
Cylinder  Yellow  1.5        140        123       Sweet   Banana
Oval      Yellow  2.2        58         17        Sour    Lemon
Round     Purple  0.7        2.4        2         Sweet   Grape
Round     Green   2.0        65         45        Tart    Kiwi
Round     Green   8.0        4518       1366      Sweet   Watermelon

Predict Type:

Shape     Color   Width(in)  Weight(g)  Calories  Taste   Type
Round     Red     5.2        193        78        Bitter  ?
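As an illustration, one way to fit a classifier to this table with scikit-learn (a decision tree and one-hot encoding are just one reasonable choice for the discrete columns):

# Sketch: predict the unknown fruit from the table above.
import pandas as pd
from sklearn.tree import DecisionTreeClassifier

cols = ["Shape", "Color", "Width", "Weight", "Calories", "Taste"]
rows = [
    ("Round",    "Red",    4.2,  205,   73, "Sweet"),   # Apple
    ("Round",    "Green",  3.7,  145,   52, "Sour"),    # Apple
    ("Round",    "Orange", 3.2,  131,   62, "Sweet"),   # Orange
    ("Round",    "Orange", 5.7,  181,   75, "Bitter"),  # Grapefruit
    ("Cylinder", "Yellow", 1.5,  140,  123, "Sweet"),   # Banana
    ("Oval",     "Yellow", 2.2,   58,   17, "Sour"),    # Lemon
    ("Round",    "Purple", 0.7,  2.4,    2, "Sweet"),   # Grape
    ("Round",    "Green",  2.0,   65,   45, "Tart"),    # Kiwi
    ("Round",    "Green",  8.0, 4518, 1366, "Sweet"),   # Watermelon
]
labels = ["Apple", "Apple", "Orange", "Grapefruit", "Banana",
          "Lemon", "Grape", "Kiwi", "Watermelon"]

X = pd.get_dummies(pd.DataFrame(rows, columns=cols))   # one-hot the discrete columns
tree = DecisionTreeClassifier().fit(X, labels)

query = pd.DataFrame([("Round", "Red", 5.2, 193, 78, "Bitter")], columns=cols)
query = pd.get_dummies(query).reindex(columns=X.columns, fill_value=0)
print(tree.predict(query))   # the tree's guess for the mystery fruit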

Page 5: Machine Learning

Classification: Predicting discrete values

Decision Trees
  Inputs: Discrete and Continuous
  Labels: n
  Rule: Which leaf of a binary tree do I belong to?

Support Vector Machines
  Inputs: Continuous
  Labels: 2
  Rule: Which side of a hyperplane am I on?

Nearest Neighbors
  Inputs: Continuous
  Labels: n
  Rule: Who am I closest to?

Naïve Bayes
  Inputs: Discrete and Continuous
  Labels: n
  Rule: What am I most likely to be?

Neural Networks
  Inputs: Continuous
  Labels: n
  Rule: Which node do I map to after moving through a weighted network?

Page 6: Machine Learning

Classification: Predicting discrete values

Decision Trees
  Inputs: Discrete and Continuous
  Labels: n
  Rule: Which leaf of a binary tree do I belong to?

Nearest Neighbors
  Inputs: Continuous
  Labels: n
  Rule: Who am I closest to?

Naïve Bayes
  Inputs: Discrete and Continuous
  Labels: n
  Rule: What am I most likely to be?

Neural Networks
  Inputs: Continuous
  Labels: n
  Rule: Which node do I map to after moving through a weighted network?

Page 7: Machine Learning

Classification: Predicting discrete values

Support Vector Machines
  Inputs: Continuous
  Labels: 2
  Rule: Which side of a hyperplane am I on?

Page 8: Machine Learning

Classification: Predicting discrete values

Support Vector Machines
  Inputs: Continuous
  Labels: 2
  Rule: Which side of a hyperplane am I on?

Page 9: Machine Learning

Classification: Predicting discrete values

Support Vector Machines
  Inputs: Continuous
  Labels: 2
  Rule: Which side of a hyperplane am I on?

Kernel Trick: You can compute distances in higher dimensions, even infinite, without actually moving there. (Mercer’s condition)

Page 10: Machine Learning

Classification: Predicting discrete values

Support Vector Machines
  Inputs: Continuous
  Labels: 2
  Rule: Which side of a hyperplane am I on?

Common Kernels:
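For reference, the kernels most often used with SVMs are the linear, polynomial, and RBF (Gaussian) kernels; a plain-Python sketch of each (NumPy assumed, parameter values arbitrary):

# Standard SVM kernels written out for reference.
import numpy as np

def linear_kernel(x1, x2):
    return np.dot(x1, x2)

def polynomial_kernel(x1, x2, c=1.0, d=3):
    return (np.dot(x1, x2) + c) ** d

def rbf_kernel(x1, x2, gamma=0.5):
    return np.exp(-gamma * np.dot(x1 - x2, x1 - x2))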

Page 11: Machine Learning

Classification: Predicting discrete values

Support Vector Machines
  Inputs: Continuous
  Labels: 2
  Rule: Which side of a hyperplane am I on?

Kernel: Homogeneous Polynomial k(x1, x2) = (x1 ∙ x2)**2
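A quick numeric check of the kernel trick for this kernel: in 2-D, (x1 ∙ x2)**2 equals an ordinary dot product after the explicit map phi(a, b) = (a², √2·ab, b²), so the higher-dimensional space never has to be built explicitly:

# Verify that the degree-2 homogeneous polynomial kernel equals a dot
# product in an explicitly expanded feature space (2-D input assumed).
import numpy as np

def phi(x):
    a, b = x
    return np.array([a * a, np.sqrt(2) * a * b, b * b])

def k(x1, x2):
    return np.dot(x1, x2) ** 2

x1 = np.array([1.0, 2.0])
x2 = np.array([3.0, -1.0])

print(k(x1, x2))                  # kernel computed in the original space
print(np.dot(phi(x1), phi(x2)))   # same value, computed the long way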

Page 12: Machine Learning

Classification: Predicting discrete values

Naïve Bayes
  Inputs: Discrete and Continuous
  Labels: n
  Rule: What am I most likely to be?

Naïve Assumption:
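The naïve assumption, stated in its usual form, is that the features are conditionally independent given the class:

P(x1, …, xn | C) = P(x1 | C) · P(x2 | C) · … · P(xn | C)

so the predicted label is the class C that maximizes P(C) · P(x1 | C) · … · P(xn | C).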

Page 13: Machine Learning

Classification: Predicting discrete values

How the classifiers see the same data:

[Figure: Decision Trees, Support Vector Machines, Naïve Bayes]
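A minimal sketch of reproducing this kind of comparison with scikit-learn (the moons dataset and the RBF SVM are arbitrary choices, not from the slide):

# Fit the three classifiers on the same 2-D data and plot their boundaries.
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import make_moons
from sklearn.tree import DecisionTreeClassifier
from sklearn.svm import SVC
from sklearn.naive_bayes import GaussianNB

X, y = make_moons(n_samples=200, noise=0.25, random_state=0)
models = {"Decision Tree": DecisionTreeClassifier(),
          "SVM (RBF)": SVC(kernel="rbf"),
          "Naive Bayes": GaussianNB()}

xx, yy = np.meshgrid(np.linspace(X[:, 0].min() - 1, X[:, 0].max() + 1, 200),
                     np.linspace(X[:, 1].min() - 1, X[:, 1].max() + 1, 200))
fig, axes = plt.subplots(1, 3, figsize=(12, 4))
for ax, (name, model) in zip(axes, models.items()):
    model.fit(X, y)
    zz = model.predict(np.c_[xx.ravel(), yy.ravel()]).reshape(xx.shape)
    ax.contourf(xx, yy, zz, alpha=0.3)      # each model carves the space differently
    ax.scatter(X[:, 0], X[:, 1], c=y, s=10)
    ax.set_title(name)
plt.show()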

Page 14: Machine Learning

Classifier Ensembles

An ensemble is a collection of classifiers, each trained on a different subset of the training data. At prediction time, the classifiers vote on the correct label. The result is a probability for each label, based on the proportion of classifiers that voted for it; one typically takes the label with the highest probability.

Voting strategies:

Bagging: Multiple classifiers vote on the correct label; all votes are counted equally.

Boosting: Multiple classifiers vote, but each classifier's vote is weighted by its error rate on a reserved test set.
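A minimal sketch of bagging (hand-rolled bootstrap sampling and an unweighted majority vote; scikit-learn decision trees assumed as the base classifier, X and y assumed to be NumPy arrays):

# Bagging by hand: each tree trains on a bootstrap sample,
# then all trees get an equal vote on the label.
import numpy as np
from collections import Counter
from sklearn.tree import DecisionTreeClassifier

def train_bagged(X, y, n_classifiers=25):
    ensemble = []
    for _ in range(n_classifiers):
        idx = np.random.randint(0, len(X), size=len(X))   # sample with replacement
        ensemble.append(DecisionTreeClassifier().fit(X[idx], y[idx]))
    return ensemble

def predict_vote(ensemble, x):
    votes = Counter(clf.predict([x])[0] for clf in ensemble)
    label, count = votes.most_common(1)[0]
    return label, count / len(ensemble)   # winning label plus its vote share

# Boosting differs only in that each vote would be weighted by the
# classifier's measured error rate instead of being counted equally.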

Page 15: Machine Learning

Random Forests

A Random Forest is an ensemble of decision trees. Under suitable conditions, Random Forests are consistent: as the number of training examples grows, their predictions converge to the oracle's.

Gerard Biau (Ecole Normale Superieure), "Analysis of a Random Forests Model", Journal of Machine Learning Research (2012)

“Despite growing interest and practical use, there has been little exploration of the statistical properties of random forests, and little is known about the mathematical forces driving the algorithm.

In this paper, we […] show that the procedure is consistent and adapts to sparsity, in the sense that [the] rate of convergence depends only on the number of strong features and not on how many noise variables are present.”
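In code, the whole forest is typically driven through a single class; a minimal scikit-learn sketch (the synthetic dataset is an arbitrary stand-in):

# A Random Forest is bagged decision trees with extra per-split randomness.
from sklearn.ensemble import RandomForestClassifier
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=500, n_features=20, n_informative=4,
                           random_state=0)   # a few strong features, the rest noise
forest = RandomForestClassifier(n_estimators=100).fit(X, y)
print(forest.predict(X[:5]), forest.predict_proba(X[:5])[0])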

Page 16: Machine Learning

Naïve Bayes

Naïve Bayes classifiers appear to give good results in practice even though they are based on a potentially unrealistic assumption. Recently, researchers have been attempting to explain the conditions that lead to successful Naïve Bayes classifiers.

Harry Zhang (University of New Brunswick), "The Optimality of Naïve Bayes", American Association for Artificial Intelligence (2004)

“In a given dataset, two attributes may depend on each other, but the dependence may distribute evenly in each class. Clearly, in this case, the conditional independence assumption is violated, but Naïve Bayes is still the optimal classifier.

Furthermore […] if we look at two attributes, there may exist strong dependence between them that affects the classification. When the dependencies among all attributes work together, however, they may cancel each other out and no longer affect the classification.”

Page 17: Machine Learning

Choosing a Classifier

Random Forests
  Pros:
    Highly convergent
    Ignores noise
  Cons:
    Whole forest must be retrained (batch)
    Time to classify depends on number of trees

Naïve Bayes
  Pros:
    Fast to train
    Fast to classify
    Incremental updates (stream)
    Evaluates in O(1) time
    Seems to work in practice
  Cons:
    Does not capture feature covariance
    Continuous inputs need distribution estimation
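The "incremental updates (stream)" point can be seen directly in scikit-learn, whose GaussianNB exposes partial_fit; a minimal sketch (the synthetic batches are arbitrary stand-ins):

# Streaming updates with Gaussian Naive Bayes: earlier data is never revisited,
# each new batch just updates the per-class statistics.
import numpy as np
from sklearn.naive_bayes import GaussianNB

nb = GaussianNB()
classes = np.array([0, 1])
for _ in range(10):                                    # pretend batches arrive over time
    X_batch = np.random.randn(50, 3)
    y_batch = (X_batch[:, 0] > 0).astype(int)
    nb.partial_fit(X_batch, y_batch, classes=classes)  # incremental update

print(nb.predict([[1.0, 0.0, 0.0]]))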

Page 18: Machine Learning

Links

Naïve Bayes in 50 lines of Python
http://ebiquity.umbc.edu/blogger/2010/12/07/naive-bayes-classifier-in-50-lines/

NIST Special Database 19 - Handwriting Samples
http://gorillamatrix.com/files/nist-sd19.rar

Analysis of a Random Forests Model
http://jmlr.csail.mit.edu/papers/volume13/biau12a/biau12a.pdf

The Optimality of Naïve Bayes
http://courses.ischool.berkeley.edu/i290-dm/s11/SECURE/Optimality_of_Naive_Bayes.pdf

Apache Mahout
http://mahout.apache.org/

Programming Collective Intelligence
http://shop.oreilly.com/product/9780596529321.do

Page 19: Machine Learning

Thank you

Page 20: Machine Learning

“Vision without action is a daydream. Action without vision is a nightmare.”

- Japanese Proverb