Transcript
Page 1: Machine Learning : Foundations

Machine Learning: Foundations

Yishay Mansour, Tel-Aviv University

Page 2: Machine Learning : Foundations

Typical Applications

• Classification/Clustering problems:
  – Data mining
  – Information Retrieval
  – Web search

• Self-customization
  – news
  – mail

Page 3: Machine Learning : Foundations

Typical Applications

• Control
  – Robots
  – Dialog systems

• Complicated software
  – driving a car
  – playing a game (backgammon)

Page 4: Machine Learning : Foundations

Why Now?

• Technology ready:
  – Algorithms and theory.

• Information abundant:
  – Flood of data (online).

• Computational power:
  – Sophisticated techniques.

• Industry and consumer needs.

Page 5: Machine Learning : Foundations

Example 1: Credit Risk Analysis

• Typical customer: a bank.
• Database:
  – Current clients' data, including:
    – basic profile (income, house ownership, delinquent accounts, etc.)
    – basic classification.

• Goal: predict/decide whether to grant credit.

Page 6: Machine Learning : Foundations

Example 1: Credit Risk Analysis

• Rules learned from data:
  IF Other-Delinquent-Accounts > 2 AND Number-Delinquent-Billing-Cycles > 1
  THEN DENY CREDIT

  IF Other-Delinquent-Accounts = 0 AND Income > $30k
  THEN GRANT CREDIT
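The two rules above form a tiny decision list. A minimal sketch, in Python, of how such learned rules could be applied to a new applicant; the attribute names and thresholds come from the slide, while the record format and the function itself are hypothetical.

# Hedged sketch: apply the two learned credit rules from the slide.
# The dict keys mirror the slide's attribute names; the record format is illustrative.

def credit_decision(applicant: dict) -> str:
    """Return 'DENY', 'GRANT', or 'UNDECIDED' for a single applicant record."""
    if (applicant["other_delinquent_accounts"] > 2
            and applicant["delinquent_billing_cycles"] > 1):
        return "DENY"
    if (applicant["other_delinquent_accounts"] == 0
            and applicant["income"] > 30_000):
        return "GRANT"
    return "UNDECIDED"   # the slide's two rules do not cover every case

print(credit_decision({"other_delinquent_accounts": 0,
                       "delinquent_billing_cycles": 0,
                       "income": 45_000}))   # -> GRANT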

Page 7: Machine Learning : Foundations

Example 2: Clustering news

• Data: Reuters news / Web data
• Goal: Basic category classification:
  – Business, sports, politics, etc.
  – Classify into subcategories (unspecified).
• Methodology:
  – Consider “typical words” for each category.
  – Classify using a “distance” measure.
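A minimal sketch of the methodology above: represent each category by a profile of its “typical words” and assign a document to the category whose profile is closest. The toy categories, word profiles, and the plain Euclidean distance are illustrative assumptions, not the actual Reuters setup.

# Hedged sketch: classify a document by distance to per-category "typical word" profiles.
from collections import Counter
import math

CATEGORY_WORDS = {   # toy profiles; in practice learned from labeled news data
    "business": Counter({"market": 3, "stock": 2, "profit": 2}),
    "sports":   Counter({"game": 3, "team": 2, "score": 2}),
    "politics": Counter({"election": 3, "party": 2, "vote": 2}),
}

def distance(doc_counts: Counter, profile: Counter) -> float:
    """Euclidean distance between two bag-of-words vectors."""
    words = set(doc_counts) | set(profile)
    return math.sqrt(sum((doc_counts[w] - profile[w]) ** 2 for w in words))

def classify(text: str) -> str:
    doc = Counter(text.lower().split())
    return min(CATEGORY_WORDS, key=lambda c: distance(doc, CATEGORY_WORDS[c]))

print(classify("the team won the game with a late score"))   # -> sports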

Page 8: Machine Learning : Foundations

Example 3: Robot control

• Goal: Control a robot in an unknown environment.

• Needs both
  – to explore (new places and actions)
  – to use acquired knowledge to gain benefits.

• The learning task “controls” what it observes!

Page 9: Machine Learning : Foundations
Page 10: Machine Learning : Foundations

A Glimpse into the Future

• Today's status:
  – First-generation algorithms:
    – Neural nets, decision trees, etc.
  – Well-formed databases.

• Future:
  – Many more problems:
    – networking, control, software.
  – Main advantage is flexibility!

Page 11: Machine Learning : Foundations

Relevant Disciplines

• Artificial intelligence
• Statistics
• Computational learning theory
• Control theory
• Information theory
• Philosophy
• Psychology and neurobiology

Page 12: Machine Learning : Foundations

Types of Models

• Supervised learning
  – Given access to classified data.

• Unsupervised learning
  – Given access to data, but no classification.

• Control learning
  – Selects actions and observes consequences.
  – Maximizes long-term cumulative return.

Page 13: Machine Learning : Foundations

Learning: Complete Information

• Probability distribution D1 over one class (“smiley”) and distribution D2 over the other.
• The two classes are equally likely.

• Compute the probability of “smiley” given a point (x,y).

• Use Bayes' formula.
• Let p be this probability.
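Concretely, with equal class priors of 1/2 and densities D1 and D2, the Bayes computation the slide refers to is:

p \;=\; \Pr[\text{smiley} \mid (x,y)]
  \;=\; \frac{\tfrac{1}{2}\, D_1(x,y)}{\tfrac{1}{2}\, D_1(x,y) + \tfrac{1}{2}\, D_2(x,y)}
  \;=\; \frac{D_1(x,y)}{D_1(x,y) + D_2(x,y)} .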

Page 14: Machine Learning : Foundations

Predictions and Loss Model

• Boolean error
  – Predict a Boolean value.
  – On each error we lose 1 (no error, no loss).
  – Compare the probability p to 1/2.
  – Predict deterministically the value with the higher probability.
  – Optimal prediction (for this loss).

• Cannot recover the probabilities!

Page 15: Machine Learning : Foundations

Predictions and Loss Model

• Quadratic loss
  – Predict a “real number” q for outcome 1.
  – Loss (q-p)² for outcome 1.
  – Loss ([1-q]-[1-p])² for outcome 0.
  – Expected loss: (p-q)².
  – Minimized for q=p (optimal prediction).

• Recovers the probabilities.
• Needs to know p to compute the loss!

Page 16: Machine Learning : Foundations

Predictions and Loss Model

• Logarithmic loss
  – Predict a “real number” q for outcome 1.
  – Loss log 1/q for outcome 1.
  – Loss log 1/(1-q) for outcome 0.
  – Expected loss: -p log q - (1-p) log (1-q).
  – Minimized for q=p (optimal prediction).

• Recovers the probabilities.
• The loss does not depend on p!
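A one-line check, not on the slide, that the expected logarithmic loss is indeed minimized at q = p: set the derivative with respect to q to zero.

\frac{d}{dq}\Bigl[-p\log q-(1-p)\log(1-q)\Bigr]
 \;=\; -\frac{p}{q}+\frac{1-p}{1-q} \;=\; 0
 \quad\Longleftrightarrow\quad q = p .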

Page 17: Machine Learning : Foundations

The basic PAC Model

Unknown target function f(x)

Distribution D over domain X

Goal: find h(x) such that h(x) approx. f(x)

Given H, find h ∈ H that minimizes Pr_D[h(x) ≠ f(x)].

Page 18: Machine Learning : Foundations

Basic PAC Notions

S - a sample of m examples drawn i.i.d. using D

True error: ε(h) = Pr_D[h(x) ≠ f(x)]

Observed error: ε'(h) = (1/m) |{ x ∈ S : h(x) ≠ f(x) }|

Example: a pair (x, f(x))

Basic question: how close is ε(h) to ε'(h)?
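A standard answer, used throughout the PAC analysis although not stated on this slide, is the Chernoff-Hoeffding bound: for a fixed hypothesis h and an i.i.d. sample of size m,

\Pr_{S \sim D^m}\bigl[\,|\epsilon(h) - \epsilon'(h)| > \lambda\,\bigr] \;\le\; 2\,e^{-2\lambda^{2} m} .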

Page 19: Machine Learning : Foundations

Bayesian Theory

Prior distribution over H

Given a sample S compute a posterior distribution:

Maximum Likelihood (ML): maximize Pr[S|h]
Maximum A Posteriori (MAP): maximize Pr[h|S]
Bayesian Predictor: Σ_h h(x) Pr[h|S]

Pr[h|S] = Pr[S|h] Pr[h] / Pr[S]
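A minimal sketch of ML and MAP on an invented two-hypothesis class (coin biases with a prior); the class, prior, and sample are purely illustrative, but the posterior computation follows the formula above.

# Hedged sketch: ML vs. MAP over a tiny finite hypothesis class.
# Each hypothesis is a coin bias; the sample S is a sequence of coin flips.

HYPOTHESES = {"fair": 0.5, "biased": 0.8}          # illustrative class H
PRIOR      = {"fair": 0.9, "biased": 0.1}          # illustrative prior Pr[h]

def likelihood(bias: float, sample: list[int]) -> float:
    """Pr[S|h] for i.i.d. Bernoulli flips (1 = heads)."""
    p = 1.0
    for flip in sample:
        p *= bias if flip == 1 else (1 - bias)
    return p

S = [1, 1, 1, 0, 1]                                # observed sample

unnormalized = {h: likelihood(b, S) * PRIOR[h] for h, b in HYPOTHESES.items()}
posterior = {h: v / sum(unnormalized.values()) for h, v in unnormalized.items()}

ml  = max(HYPOTHESES, key=lambda h: likelihood(HYPOTHESES[h], S))   # argmax Pr[S|h]
map_ = max(posterior, key=posterior.get)                            # argmax Pr[h|S]
print(ml, map_, posterior)   # a strong prior can make MAP differ from ML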

Page 20: Machine Learning : Foundations

Nearest Neighbor Methods

Classify using nearby examples.

Assume a “structured space” and a “metric”

[Figure: labeled “+” and “-” examples in the plane; a new point “?” is classified by its nearest labeled neighbors.]
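A minimal 1-nearest-neighbor sketch under the slide's assumptions (a structured space with a metric, here the plane with Euclidean distance); the toy labeled points are invented.

# Hedged sketch: 1-nearest-neighbor classification in the plane.
import math

TRAIN = [((1.0, 1.0), "+"), ((1.2, 0.8), "+"),     # toy labeled examples
         ((4.0, 4.0), "-"), ((4.5, 3.8), "-")]

def nearest_neighbor(query: tuple[float, float]) -> str:
    """Return the label of the training point closest to the query."""
    return min(TRAIN, key=lambda ex: math.dist(query, ex[0]))[1]

print(nearest_neighbor((1.1, 1.1)))   # -> "+"
print(nearest_neighbor((4.2, 4.1)))   # -> "-"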

Page 21: Machine Learning : Foundations

Computational Methods

How to find an h ∈ H with low observed error.

Heuristic algorithms for specific classes.

In most cases the computational task is provably hard.

Page 22: Machine Learning : Foundations

Separating Hyperplane

Perceptron: sign( Σᵢ xᵢwᵢ ). Find w₁ .... wₙ.

Limited representation.

[Figure: a perceptron with inputs x1 ... xn, weights w1 ... wn, and a sign output unit.]
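A minimal perceptron sketch: the prediction is the slide's sign(Σ xᵢwᵢ), and the weights are found with the classical mistake-driven update, which is standard but not spelled out on the slide; the toy data set is invented.

# Hedged sketch: perceptron prediction and the classical mistake-driven update.

def sign(a: float) -> int:
    return 1 if a >= 0 else -1

def predict(w: list[float], x: list[float]) -> int:
    return sign(sum(wi * xi for wi, xi in zip(w, x)))

def train(data: list[tuple[list[float], int]], epochs: int = 20) -> list[float]:
    """Find w1..wn by adding y*x whenever an example is misclassified."""
    w = [0.0] * len(data[0][0])
    for _ in range(epochs):
        for x, y in data:
            if predict(w, x) != y:
                w = [wi + y * xi for wi, xi in zip(w, x)]
    return w

# Toy linearly separable data; the constant last coordinate acts as a bias input.
data = [([2.0, 1.0, 1.0], 1), ([1.0, 3.0, 1.0], 1),
        ([-1.0, -2.0, 1.0], -1), ([-2.0, -1.0, 1.0], -1)]
w = train(data)
print(w, [predict(w, x) for x, _ in data])   # all predictions match the labels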

Page 23: Machine Learning : Foundations

Neural Networks

Sigmoidal gates: a = Σᵢ xᵢwᵢ and output = 1/(1 + e^(-a))

Back Propagation

[Figure: a feed-forward network over inputs x1 ... xn.]
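For completeness, a minimal sketch of a single sigmoidal gate as defined above; the derivative noted in the comment is the standard quantity back-propagation uses.

# Hedged sketch: one sigmoidal gate, a = sum_i x_i w_i, output = 1 / (1 + e^(-a)).
import math

def sigmoid_gate(x: list[float], w: list[float]) -> float:
    a = sum(xi * wi for xi, wi in zip(x, w))
    return 1.0 / (1.0 + math.exp(-a))

out = sigmoid_gate([1.0, 2.0, -1.0], [0.5, -0.25, 0.1])
# Back-propagation uses the standard derivative d(out)/da = out * (1 - out).
grad_a = out * (1 - out)
print(out, grad_a)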

Page 24: Machine Learning : Foundations

Decision Trees

[Figure: a small decision tree. The root tests x1 > 5, an internal node tests x6 > 2, and the leaves are labeled +1 and -1.]

Page 25: Machine Learning : Foundations

Decision Trees

Limited Representation

Efficient Algorithms.

Aim: Find a small decision tree with low observed error.

Page 26: Machine Learning : Foundations

Decision Trees

PHASE I: Construct the tree greedily, using a local index function.
  Gini index: G(x) = x(1-x), entropy H(x), ...

PHASE II: Prune the decision tree while maintaining low observed error.

Good experimental results
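A minimal sketch of the two local index functions named in Phase I; x is the fraction of positive examples reaching a node, and the binary entropy formula is the standard one (the slide only names H(x)).

# Hedged sketch: local index functions used to pick decision-tree splits greedily.
import math

def gini(x: float) -> float:
    """Gini index G(x) = x(1-x) for a node with a fraction x of positives."""
    return x * (1.0 - x)

def entropy(x: float) -> float:
    """Binary entropy H(x); the standard formula, only named on the slide."""
    if x in (0.0, 1.0):
        return 0.0
    return -x * math.log2(x) - (1.0 - x) * math.log2(1.0 - x)

# Both indices are maximal at x = 1/2 (an impure node) and zero at a pure node.
print(gini(0.5), entropy(0.5), gini(1.0), entropy(1.0))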

Page 27: Machine Learning : Foundations

Complexity versus Generalization

Hypothesis complexity versus observed error.

More complex hypotheses have lower observed error, but might have higher true error.

Page 28: Machine Learning : Foundations

Basic criteria for Model Selection

Minimum Description Length

ε'(h) + |code length of h|

Structural Risk Minimization:

ε'(h) + sqrt( log|H| / m )
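The square-root term comes from uniform convergence; a standard form of the bound behind it (a known PAC fact, not spelled out on the slide) is: with probability at least 1-δ over the sample, simultaneously for all h in H,

\epsilon(h) \;\le\; \epsilon'(h) + \sqrt{\frac{\ln|H| + \ln(1/\delta)}{2m}} .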

Page 29: Machine Learning : Foundations

Genetic Programming

A search method.

• Local mutation operations
• Cross-over operations
• Keeps the “best” candidates.

Example: decision trees
• Local mutation: change a node in a tree.
• Cross-over: replace a subtree by another tree.
• Selection: keep trees with low observed error.

Page 30: Machine Learning : Foundations

General PAC Methodology

Minimize the observed error.

Search for a small size classifier

Hand-tailored search method for specific classes.

Page 31: Machine Learning : Foundations

Weak Learning

Small class of predicates H

Weak Learning: Assume that for any distribution D, there is some predicate h ∈ H that predicts better than 1/2 + γ.

Weak Learning ⇒ Strong Learning

Page 32: Machine Learning : Foundations

Boosting Algorithms

Functions: Weighted majority of the predicates.

Methodology: Change the distribution to target “hard” examples.

Weight of an example is exponential in the number of incorrect classifications.

Extremely good experimental results and efficient algorithms.
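A minimal sketch of the reweighting idea: an example's weight grows exponentially with the number of times the predicates chosen so far have misclassified it. The rate parameter beta is a placeholder rather than the round-dependent rate of a specific published boosting algorithm, and the weak learner itself is left abstract.

# Hedged sketch: reweight examples so that weight is exponential in the number of
# mistakes made on them so far; "hard" examples get more attention next round.
import math

def reweight(mistake_counts: list[int], beta: float = 1.0) -> list[float]:
    """Distribution over examples; many past mistakes -> larger weight."""
    raw = [math.exp(beta * k) for k in mistake_counts]
    total = sum(raw)
    return [r / total for r in raw]

# Example: 4 examples; the third has been misclassified twice by previous predicates.
mistakes = [0, 1, 2, 0]
print(reweight(mistakes))   # the third example gets the largest weight in the next round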

Page 33: Machine Learning : Foundations

Support Vector Machine

n dimensions → m dimensions

Page 34: Machine Learning : Foundations

Support Vector Machine

Use a hyperplane in the LARGE space.

Choose a hyperplane with a large MARGIN.

[Figure: a separating hyperplane with a large margin between the “+” and “-” examples.]

Project data to a high dimensional space.
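The large-margin choice can be written as the standard hard-margin program (a well-known formulation, not given on the slide): for examples (x_i, y_i) with y_i in {-1, +1}, already mapped into the large space,

\min_{w,b} \ \tfrac{1}{2}\|w\|^2
\quad\text{subject to}\quad
y_i\,(\langle w, x_i\rangle + b) \;\ge\; 1 \quad \text{for all } i .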

Page 35: Machine Learning : Foundations

Other Models

Membership Queries

[Figure: the learner queries a point x and receives its label f(x).]

Page 36: Machine Learning : Foundations

Fourier Transform

f(x) = Σ_z α_z χ_z(x),  where χ_z(x) = (-1)^⟨x,z⟩

Many simple classes are well approximated using only the large coefficients.

Efficient algorithms for finding large coefficients.
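A minimal sketch of the expansion above on the Boolean cube: the coefficient α_z is the average of f(x)·χ_z(x). The brute-force enumeration below is only for illustration; the efficient algorithms the slide refers to find the large coefficients without enumerating all 2^n of them.

# Hedged sketch: brute-force Fourier coefficients of a Boolean function on {0,1}^n.
from itertools import product

def chi(z: tuple[int, ...], x: tuple[int, ...]) -> int:
    """Basis function chi_z(x) = (-1)^<x,z> over the Boolean cube."""
    return -1 if sum(zi * xi for zi, xi in zip(z, x)) % 2 else 1

def fourier_coefficients(f, n: int) -> dict:
    """alpha_z = average over x of f(x) * chi_z(x); f maps {0,1}^n to {-1,+1}."""
    cube = list(product([0, 1], repeat=n))
    return {z: sum(f(x) * chi(z, x) for x in cube) / len(cube) for z in cube}

# Example: parity of the first two bits has a single large coefficient, at z = (1,1,0).
parity12 = lambda x: -1 if (x[0] + x[1]) % 2 else 1
coeffs = fourier_coefficients(parity12, 3)
print(max(coeffs, key=lambda z: abs(coeffs[z])))   # -> (1, 1, 0)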

Page 37: Machine Learning : Foundations

Reinforcement Learning

Control Problems.

Changing the parameters changes the behavior.

Search for optimal policies.

Page 38: Machine Learning : Foundations

Clustering: Unsupervised learning

Page 39: Machine Learning : Foundations

Unsupervised learning: Clustering

Page 40: Machine Learning : Foundations

Basic Concepts in Probability

• For a single hypothesis h:
  – Given the observed error,
  – bound the true error.

• Markov's inequality
• Chebyshev's inequality
• Chernoff's inequality
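For reference, the three inequalities in their standard forms (well-known facts; the slide only names them), for a non-negative random variable X, a random variable X with variance σ², and an average of m i.i.d. {0,1} variables with mean μ, respectively:

\Pr[X \ge a] \;\le\; \frac{\mathbb{E}[X]}{a}, \qquad
\Pr\bigl[\,|X - \mathbb{E}[X]| \ge a\,\bigr] \;\le\; \frac{\sigma^2}{a^2}, \qquad
\Pr\Bigl[\,\bigl|\tfrac{1}{m}\textstyle\sum_{i=1}^{m} X_i - \mu\bigr| \ge \lambda\,\Bigr] \;\le\; 2\,e^{-2\lambda^{2} m} .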

Page 41: Machine Learning : Foundations

Basic Concepts in Probability

• Switching from h1 to h2:
  – Given the observed errors,
  – predict whether h2 is better.

• Total error rate
• Cases where h1(x) ≠ h2(x)
  – More refined.

