dual strategy active learning presenter: pinar donmez 1 joint work with jaime g. carbonell 1 &...

DUAL STRATEGY ACTIVE LEARNING

Presenter: Pinar Donmez¹

Joint work with Jaime G. Carbonell¹ & Paul N. Bennett²

¹ Language Technologies Institute, Carnegie Mellon University
² Microsoft Research

Active Learning (Pool-based)

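[Figure: pool-based active learning loop with the components Data Source, Learning Mechanism, and Expert; arrows labeled "unlabeled data", "label request", "labeled data", "learn a new model", and "output".]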

Two different trends in Active Learning

Uncertainty Sampling:
• selects the example with the lowest certainty
• i.e. closest to the boundary, maximum entropy, ...

Density-based Sampling:
• considers the underlying data distribution
• selects representatives of large clusters
• aims to cover the input space quickly
• e.g. representative sampling, active learning using pre-clustering, etc.
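The uncertainty-sampling rule above can be illustrated with a minimal sketch, assuming the learner already exposes a predicted class distribution for every pool example. The helper names `entropy` and `most_uncertain` and the toy posteriors are hypothetical, not taken from the slides.

```python
import math

def entropy(probs):
    """Shannon entropy (in nats) of a discrete class distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def most_uncertain(posteriors):
    """Index of the pool example with maximum predictive entropy
    (hypothetical helper for illustration)."""
    return max(range(len(posteriors)), key=lambda i: entropy(posteriors[i]))

# Toy pool: each row is an assumed predicted distribution over two classes.
pool_posteriors = [
    [0.95, 0.05],
    [0.55, 0.45],
    [0.80, 0.20],
]
picked = most_uncertain(pool_posteriors)
```

For a binary problem the maximum-entropy example is also the one closest to the decision boundary, which is why the two descriptions are used interchangeably.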

Goal of this Work

• Find an active learning method that works well everywhere
    • Some work best when very few instances are sampled (e.g. density-based sampling)
    • Some work best after substantial sampling (e.g. uncertainty sampling)
• Combine the best of both worlds for superior performance

Main Features of DUAL

DUAL
• is dynamic rather than static
• is context-sensitive
• builds upon the work titled “Active Learning with Pre-Clustering” (Nguyen & Smeulders, 2004)
• proposes a mixture model of density and uncertainty

DUAL’s primary focus is to
• outperform static strategies over a large operating range
• improve learning for the later iterations rather than concentrating on the initial data labeling

Related Work

                        DUAL    AL with Pre-Clustering    Representative Sampling
Clustering              Yes     Yes                       Yes                        No
Uncertainty + Density   Yes     Yes                       Yes                        No
Dynamic                 Yes     No                        No                         Yes

Active Learning with Pre-Clustering

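We call it Density-Weighted Uncertainty Sampling (DWUS for short). Why? Its selection criterion multiplies an uncertainty score by a density score.

• assumes a hidden clustering structure of the data
• calculates the posterior P(y | x) as

$$P(y \mid x) = \sum_{k=1}^{K} P(y, k \mid x) = \sum_{k=1}^{K} P(y \mid k, x)\, P(k \mid x)$$

• x and y are conditionally independent given k, since points in one cluster are assumed to share the same label:

$$P(y \mid x) = \sum_{k=1}^{K} P(y, k \mid x) = \sum_{k=1}^{K} P(y \mid k)\, P(k \mid x)$$

• selection criterion, where $I_u$ is the index set of the unlabeled instances:

$$s = \arg\max_{i \in I_u} \; \underbrace{E[(\hat{y}_i - y_i)^2 \mid x_i]}_{\text{uncertainty score}} \; \underbrace{p(x_i)}_{\text{density score}}$$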
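A minimal sketch of a density-weighted selection rule of this kind (uncertainty score times density score) may make the trade-off concrete. It assumes binary labels in {-1, +1} and takes the prediction ŷ to be the hard label, so the expected squared error reduces to four times the probability of the less likely class; that choice, the function names, and the toy numbers are illustrative assumptions only.

```python
def expected_error(p_pos):
    """E[(y_hat - y)^2 | x] for y in {-1, +1}.

    Assumption: y_hat is the hard prediction (+1 if p_pos >= 0.5, else -1),
    where p_pos = P(y = +1 | x).
    """
    y_hat = 1.0 if p_pos >= 0.5 else -1.0
    return (y_hat - 1.0) ** 2 * p_pos + (y_hat + 1.0) ** 2 * (1.0 - p_pos)

def dwus_select(p_pos, density, unlabeled):
    """Pick the unlabeled index maximizing uncertainty score * density score."""
    return max(unlabeled, key=lambda i: expected_error(p_pos[i]) * density[i])

# Toy pool: P(y = +1 | x_i) and density p(x_i) are made-up values.
p_pos = [0.50, 0.60, 0.90, 0.52]
density = [0.01, 0.30, 0.40, 0.05]
unlabeled = [0, 1, 2, 3]

chosen = dwus_select(p_pos, density, unlabeled)
```

In this toy pool, plain uncertainty sampling would pick index 0 (the point exactly on the boundary), while the density weighting prefers index 1, which is slightly less uncertain but lies in a much denser region.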

Outline of DWUS

1. Cluster the data using the K-medoid algorithm to find the cluster centroids $c_k$
2. Estimate P(k|x) by a standard EM procedure
3. Model P(y|k) as a logistic regression classifier:

$$P(y \mid k) = \frac{1}{1 + \exp(-y\,(c_k \cdot a + b))}$$

4. Estimate P(y|x) using

$$P(y \mid x) = \sum_{k=1}^{K} P(y, k \mid x) = \sum_{k=1}^{K} P(y \mid k)\, P(k \mid x)$$

5. Select an unlabeled instance using Eq. 1 (the selection criterion)
6. Update the parameters of the logistic regression model (hence update P(y|k))
7. Repeat steps 3-5 until the stopping criterion is met
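Steps 3 and 4 of the outline can be sketched as follows, assuming labels in {-1, +1}, a logistic model evaluated at each cluster centroid, and cluster memberships P(k|x) that have already been estimated. The function names and all numeric values are hypothetical.

```python
import math

def p_y_given_k(y, centroid, a, b):
    """Logistic model P(y | k) = 1 / (1 + exp(-y * (c_k . a + b))), y in {-1, +1}."""
    margin = sum(c * w for c, w in zip(centroid, a)) + b
    return 1.0 / (1.0 + math.exp(-y * margin))

def p_y_given_x(y, p_k_given_x, centroids, a, b):
    """Posterior P(y | x) = sum_k P(y | k) * P(k | x)."""
    return sum(pk * p_y_given_k(y, c, a, b)
               for pk, c in zip(p_k_given_x, centroids))

# Two assumed clusters on either side of the boundary x1 = 0.
centroids = [[2.0, 0.0], [-2.0, 0.0]]
a, b = [1.0, 0.0], 0.0            # assumed logistic parameters
memberships = [0.7, 0.3]          # assumed P(k | x) for one point x

pos = p_y_given_x(+1, memberships, centroids, a, b)
neg = p_y_given_x(-1, memberships, centroids, a, b)
```

Because the class model depends on x only through the cluster memberships, updating `a` and `b` after each new label changes P(y|x) for every point in the pool at once.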

Notes on DWUS

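Posterior class distribution:

$$P(y \mid x) = \sum_{k=1}^{K} P(y, k \mid x) = \sum_{k=1}^{K} P(y \mid k)\, P(k \mid x)$$

P(y | k) is calculated via

$$P(y \mid k) = \frac{1}{1 + \exp(-y\,(c_k \cdot a + b))}$$

P(k|x) is estimated using an EM procedure after the clustering

p(x | k) is a multivariate Gaussian with the same σ for all clusters:

$$p(x \mid k) = (2\pi)^{-d/2}\, \sigma^{-d} \exp\left\{ -\frac{\lVert x - c_k \rVert^2}{2\sigma^2} \right\}$$

$$p(x) = \sum_{k=1}^{K} p(x \mid k)\, P(k)$$

The parameters a and b of the logistic regression model are estimated from the log-likelihood over the labeled set $I_l$:

$$L = \sum_{i \in I_l} \ln P(y_i \mid x_i; a, b)$$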
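The density model in these notes can be sketched in a few lines, assuming isotropic Gaussians with a shared σ centered at the cluster centroids. The sketch shows only the Bayes-rule step that turns p(x|k) and P(k) into P(k|x) and p(x); the surrounding EM loop is omitted, and the function names and toy values are assumptions.

```python
import math

def gaussian_pdf(x, centroid, sigma):
    """p(x | k) = (2*pi)^(-d/2) * sigma^(-d) * exp(-||x - c_k||^2 / (2 sigma^2))."""
    d = len(x)
    sq_dist = sum((xi - ci) ** 2 for xi, ci in zip(x, centroid))
    norm = (2.0 * math.pi) ** (-d / 2.0) * sigma ** (-d)
    return norm * math.exp(-sq_dist / (2.0 * sigma ** 2))

def density_and_responsibilities(x, centroids, priors, sigma):
    """Return p(x) = sum_k p(x | k) P(k) and the list of P(k | x)."""
    joint = [gaussian_pdf(x, c, sigma) * pk for c, pk in zip(centroids, priors)]
    p_x = sum(joint)
    return p_x, [j / p_x for j in joint]

# Toy setup: two assumed clusters with equal priors and sigma = 1.
centroids = [[0.0, 0.0], [4.0, 0.0]]
priors = [0.5, 0.5]
p_x, resp = density_and_responsibilities([1.0, 0.0], centroids, priors, 1.0)
```

The same `p_x` value is what a density-weighted criterion would use as the density score for that point.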

Motivation for DUAL

Strength of DWUS:
• favors higher-density samples close to the decision boundary
• fast decrease in error

But DWUS exhibits diminishing returns. Why?

• Early iterations -> many points are highly uncertain
• Later iterations -> points with high uncertainty are no longer in dense regions
• DWUS wastes time picking instances with no direct effect on the error

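How does DUAL do better?

• Runs DWUS until it estimates a cross-over
• Monitors the change in expected error at each iteration to detect when DWUS is stuck in a local minimum. The monitored quantity is the expected error of DWUS averaged over the $n_t$ currently unlabeled instances, and the cross-over is signaled when its change is approximately 0:

$$\hat{\epsilon}(\mathrm{DWUS}) = \frac{1}{n_t} \sum_{i \in I_u} E[(\hat{y}_i - y_i)^2 \mid x_i]$$

• DUAL uses a mixture model after the cross-over (saturation) point:

$$x_s^* = \arg\max_{i \in I_u} \; \pi_1 \cdot E[(\hat{y}_i - y_i)^2 \mid x_i] + (1 - \pi_1) \cdot p(x_i)$$

• Our goal should be to minimize the expected future error
    • If we knew the future error of Uncertainty Sampling (US) to be zero, then we would force $\pi_1 = 1$
    • But in practice, we do not know it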
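The two ingredients described here, detecting the cross-over and then selecting with a weighted mixture of the uncertainty and density scores, can be sketched as below. This is an illustration under stated assumptions: the threshold `delta`, the weight name `pi1`, the helper names, and all numbers are hypothetical, and how the weight is actually estimated is not shown.

```python
def crossed_over(avg_errors, delta=1e-3):
    """True once the latest change in average expected error is below delta.

    avg_errors: history of the average expected error on the unlabeled pool,
    one value per iteration. delta is an assumed threshold.
    """
    return len(avg_errors) >= 2 and abs(avg_errors[-1] - avg_errors[-2]) < delta

def dual_select(exp_err, density, unlabeled, pi1):
    """Pick the unlabeled index maximizing
    pi1 * expected_error + (1 - pi1) * density."""
    return max(unlabeled,
               key=lambda i: pi1 * exp_err[i] + (1.0 - pi1) * density[i])

# Toy pool with made-up expected errors and densities.
exp_err = [1.9, 0.4, 1.0]
density = [0.05, 0.90, 0.50]
unlabeled = [0, 1, 2]

mostly_uncertainty = dual_select(exp_err, density, unlabeled, pi1=0.9)
mostly_density = dual_select(exp_err, density, unlabeled, pi1=0.1)
stalled = crossed_over([0.50, 0.40, 0.3995])
```

With a weight near 1 the rule behaves like uncertainty sampling and picks the most uncertain point; with a weight near 0 it falls back to the densest point. Unlike the product form used by DWUS, the additive mixture lets either score dominate.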
