Active Learning and the Importance of Feedback in Sampling

Rui Castro, Rebecca Willett and Robert Nowak

Motivation – “twenty questions”

Goal: Accurately “learn” a concept, as fast as possible, by strategically focusing on regions of interest

Learning by asking carefully chosen questions, constructed using information gleaned from previous observations

Active Sampling in Regression

Sample locations are chosen a priori, before any observations are made

Passive Sampling

Sample locations are chosen as a function of previous observations

Active Sampling

Problem Formulation

Passive vs. Active

Passive Sampling:

Active Sampling:

Estimation and Sampling Strategies

Goal:

The estimator :

The sampling strategy :

Classical Smoothness Spaces

Functions with homogeneous complexity over the entire domain

- Hölder smooth function class

Smooth Functions – minimax lower bound

Theorem (Castro, RW, Nowak ’05)

For smooth functions, the performance achievable with active learning is the same as that achievable with passive learning!
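The theorem statement itself did not survive in this text. As a reference point, the classical form of the result can be written as below. The notation ($\Sigma(L,\alpha)$ for the Hölder class, $\hat f_n$ for the estimator, $S_n$ for the sampling strategy, squared $L_2$ risk) is assumed here and is not taken from the slides.

```latex
% Hölder class, written for 0 < \alpha \le 1
% (larger \alpha replaces f(z) by a Taylor polynomial of f around z)
\[
\Sigma(L,\alpha) = \bigl\{ f : [0,1]^d \to \mathbb{R} \;:\;
   |f(x) - f(z)| \le L \, \|x - z\|^{\alpha} \ \ \forall\, x, z \in [0,1]^d \bigr\}
\]

% Minimax lower bound: the infimum runs over estimators and over
% sampling strategies, so it covers active as well as passive sampling
\[
\inf_{\hat f_n,\, S_n} \ \sup_{f \in \Sigma(L,\alpha)}
   \mathbb{E}\bigl[ \| \hat f_n - f \|_{L_2}^2 \bigr]
   \;\ge\; c \, n^{-2\alpha/(2\alpha + d)}
\]
```

Since passive estimators already attain the rate $n^{-2\alpha/(2\alpha+d)}$, feedback cannot improve the exponent for this class.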

Inhomogeneous Functions

Homogeneous functions: spread-out complexity

Inhomogeneous functions: localized complexity

The relevant features of inhomogeneous functions are very localized in space, making active sampling promising

Piecewise Constant Functions – d ≥ 2

best possible rate

Passive Learning in the PC Class

Estimation using Recursive Dyadic Partitions (RDP)

• Distribute sample points uniformly over [0,1]^d

• Recursively divide the domain into hypercubes

• Prune the partition, adapting to the data

• Decorate each partition set with a constant

RDP-based Algorithm

Choose an RDP that fits the data well but is not overly complicated

Empirical risk (measures fit to the data) + complexity penalty

This estimator can be computed efficiently using a tree-pruning algorithm.
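The tree-pruning idea can be sketched as a bottom-up recursion: each cell is either kept as a leaf or split, whichever gives the smaller penalized cost. This is a minimal illustration in two dimensions, not the authors' implementation. The function name `fit_rdp`, the squared-error risk, and the penalty of a fixed amount per leaf are assumptions made for the example.

```python
import numpy as np


def fit_rdp(x, y, max_depth, penalty, lo=(0.0, 0.0), size=1.0, depth=0):
    """Best pruned dyadic partition of one square cell (hypothetical helper).

    Returns (cost, leaves) where
      cost   = sum of squared residuals + penalty * (number of leaves)
      leaves = list of (lower_corner, side_length, fitted_constant)
    """
    # Option 1: keep this cell as a single leaf with the sample mean.
    const = float(y.mean()) if y.size else 0.0
    leaf_cost = float(((y - const) ** 2).sum()) + penalty
    leaf = (leaf_cost, [(lo, size, const)])
    if depth == max_depth or y.size <= 1:
        return leaf

    # Option 2: split into four quadrants and prune each one recursively.
    half = size / 2.0
    right = x[:, 0] >= lo[0] + half
    upper = x[:, 1] >= lo[1] + half
    split_cost, split_leaves = 0.0, []
    for i in (0, 1):
        for j in (0, 1):
            mask = (right == bool(i)) & (upper == bool(j))
            corner = (lo[0] + i * half, lo[1] + j * half)
            cost, leaves = fit_rdp(x[mask], y[mask], max_depth, penalty,
                                   corner, half, depth + 1)
            split_cost += cost
            split_leaves += leaves

    # Keep the split only if it lowers the penalized cost.
    return (split_cost, split_leaves) if split_cost < leaf_cost else leaf


# Usage on a noisy step across x1 = 0.5 (synthetic data, assumed noise level).
rng = np.random.default_rng(0)
x = rng.random((2000, 2))
y = (x[:, 0] >= 0.5).astype(float) + 0.1 * rng.standard_normal(2000)
cost, leaves = fit_rdp(x, y, max_depth=5, penalty=0.2)
print(len(leaves), "leaves, penalized cost", round(cost, 2))
```

Because the cost is additive over cells, each cell's decision depends only on its own subtree, which is what makes a single bottom-up pass sufficient.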

Error Bounds

Oracle bounding techniques, akin to the work of Barron ’91, can be used to upper bound the performance of our estimator

approximation error + complexity penalty

balancing the two terms
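The formulas on this slide were lost in extraction. A back-of-the-envelope version of the balancing step is given below. It assumes that the boundary has box-counting dimension $d-1$, that the pruned partition uses cells of side $2^{-J}$ only near the boundary, and that the penalty is proportional to the number of cells times $\log n / n$. Constants are ignored.

```latex
% About 2^{J(d-1)} cells of volume 2^{-Jd} touch the boundary
\[
\text{approximation error} \;\asymp\; 2^{J(d-1)} \cdot 2^{-Jd} \;=\; 2^{-J},
\qquad
\text{complexity penalty} \;\asymp\; 2^{J(d-1)} \, \frac{\log n}{n}
\]

% Setting the two terms equal fixes the resolution and the rate
\[
2^{-J} \asymp 2^{J(d-1)} \, \frac{\log n}{n}
\;\Longrightarrow\;
2^{Jd} \asymp \frac{n}{\log n}
\;\Longrightarrow\;
\text{error} \;\asymp\; \Bigl( \frac{\log n}{n} \Bigr)^{1/d}
\]
```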

Active Sampling in the PC Class

Active Sampling Key: learn the location of the boundary

Use Recursive Dyadic Partitions to find the boundary

Active Sampling in the PC Class

Stage 1: “Oversample” at coarse resolution

• n/2 samples uniformly distributed

• Limit the resolution: many more samples than cells

• biased, but very low variance result (high approximation error, but low estimation error)

“boundary zone” is reliably detected

Active Sampling in the PC Class

Stage 2: Critically sample in boundary zone

• n/2 samples uniformly distributed within boundary zone

• construct fine partition around boundary

• prune partition according to standard multiscale methods

high resolution estimate of boundary
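The two stages can be sketched in a few lines for d = 2. This is a simplified illustration rather than the method analyzed in the talk: plain cell averages stand in for pruned RDPs, and the boundary zone is taken to be the coarse cells whose average differs from a neighbor's by more than a threshold. The test function `f_true`, the Gaussian noise, and all parameter names are assumptions.

```python
import numpy as np


def f_true(x):
    """Hypothetical test function: 1 inside a disc, 0 outside."""
    return ((x[:, 0] - 0.5) ** 2 + (x[:, 1] - 0.5) ** 2 < 0.09).astype(float)


def sample(x, sigma, rng):
    """Noisy observations of f_true at the rows of x."""
    return f_true(x) + sigma * rng.standard_normal(len(x))


def cell_means(x, y, k):
    """Average y over a k-by-k regular grid on [0,1]^2 (NaN for empty cells)."""
    idx = np.minimum((x * k).astype(int), k - 1)
    flat = idx[:, 0] * k + idx[:, 1]
    sums = np.bincount(flat, weights=y, minlength=k * k)
    counts = np.bincount(flat, minlength=k * k)
    means = np.full(k * k, np.nan)
    filled = counts > 0
    means[filled] = sums[filled] / counts[filled]
    return means.reshape(k, k)


def upsample(a, m):
    """Repeat every entry of a 2-D array m times along both axes."""
    return np.repeat(np.repeat(a, m, axis=0), m, axis=1)


def two_stage(n, k, m, sigma, rng, thresh=0.25):
    # Stage 1: n/2 uniform samples, coarse k-by-k grid (many samples per cell).
    x1 = rng.random((n // 2, 2))
    coarse = cell_means(x1, sample(x1, sigma, rng), k)

    # Flag cells whose average differs from a horizontal or vertical neighbor.
    zone = np.zeros((k, k), dtype=bool)
    with np.errstate(invalid="ignore"):
        dv = np.abs(np.diff(coarse, axis=0)) > thresh
        dh = np.abs(np.diff(coarse, axis=1)) > thresh
    zone[:-1, :] |= dv
    zone[1:, :] |= dv
    zone[:, :-1] |= dh
    zone[:, 1:] |= dh

    est = upsample(coarse, m)
    cells = np.argwhere(zone)
    if len(cells) == 0:          # nothing flagged: keep the coarse estimate
        return est, zone

    # Stage 2: n/2 samples uniform over the flagged cells, grid refined by m.
    pick = cells[rng.integers(len(cells), size=n // 2)]
    x2 = (pick + rng.random((n // 2, 2))) / k
    fine = cell_means(x2, sample(x2, sigma, rng), k * m)
    use = upsample(zone, m) & ~np.isnan(fine)
    est[use] = fine[use]
    return est, zone


rng = np.random.default_rng(0)
est, zone = two_stage(n=20000, k=16, m=4, sigma=0.1, rng=rng)
print("flagged coarse cells:", int(zone.sum()), "of", zone.size)
```

The second stage spends its whole budget on the flagged cells, so the sampling density there is higher than a uniform design with the same n would give.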

Main Theorem

Main Theorem (Castro ’05):

* Cusp-free boundaries cannot behave like the graph of |x|^(1/2) at the origin, but milder “kinks” like |x| at 0 are allowable.

Sketch of the Proof - Approach

Controlling the Bias

Not a problem after shift

Potential Problem Area

Cells intersecting the boundary may be pruned if ‘aligned’ with cell edge

Solution:

• Repeat Stage 1 d times, using d slightly offset partitions

• Small cells remaining in any of the d+1 partitions are passed on to Stage 2

Iterating the approach yields an L-step method

Compare with minimax lower bound:

Multi-Stage Approach

Passive Sampling:

Learning PC Functions - Summary

Active Sampling:

These rates are nearly achieved using RDP-based estimators, which are easy to implement and have low computational complexity.

Spatially adaptive estimators based on “sparse” model selection (e.g., wavelet thresholding) may provide automatic mechanisms for guiding active learning processes

Instead of choosing “where-to-sample” one can also choose “where-to-compute” to actively reduce computation.

Can active learning provably work in even more realistic situations and under little or no prior assumptions?

Spatial Adaptivity and Active Learning

Piecewise Constant Functions – d = 1

Consider first the simplest inhomogeneous function class

step function

This is a parametric class

Passive Sampling

Distribute sample points uniformly over [0,1] and use a maximum likelihood estimator

Active Sampling

Learning Rates – d = 1

Passive Sampling:

Active Sampling:

(Burnashev & Zigangirov ’74)
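A discretized Bayesian bisection, in the spirit of the Burnashev and Zigangirov scheme, illustrates how feedback locates a step in d = 1. This is a sketch under assumptions, not their exact algorithm. The step location `theta` is restricted to a grid of `m` bins, each observation of the step is flipped with a known probability `p` < 1/2, and every query is placed at the bin edge closest to the posterior median. All names are hypothetical.

```python
import numpy as np


def noisy_step(x, theta, p, rng):
    """Observe 1{x >= theta}, flipped with probability p (assumed noise model)."""
    label = int(x >= theta)
    flip = int(rng.random() < p)
    return label ^ flip


def probabilistic_bisection(theta, n, p, rng, m=1024):
    """Estimate the step location theta from n adaptively chosen queries."""
    edges = np.linspace(0.0, 1.0, m + 1)
    bins = np.arange(m)
    post = np.full(m, 1.0 / m)   # posterior mass of "theta lies in bin i"
    for _ in range(n):
        cdf = np.cumsum(post)
        j = min(int(np.searchsorted(cdf, 0.5)), m - 1)  # median bin
        below = cdf[j - 1] if j > 0 else 0.0
        # Query the edge of the median bin that splits the mass most evenly;
        # k is the number of bins lying to the left of the query point.
        k = j if abs(below - 0.5) < abs(cdf[j] - 0.5) else j + 1
        y = noisy_step(edges[k], theta, p, rng)
        # Bins left of the query predict y = 1, bins right of it predict y = 0.
        predicts_one = bins < k
        like = np.where(predicts_one == bool(y), 1.0 - p, p)
        post *= like
        post /= post.sum()
    j_map = int(np.argmax(post))
    return 0.5 * (edges[j_map] + edges[j_map + 1])


rng = np.random.default_rng(1)
estimate = probabilistic_bisection(theta=0.3141, n=200, p=0.1, rng=rng)
print("estimated step location:", estimate)
```

Each query is placed where the current posterior is most uncertain, so the posterior mass on the correct bin grows geometrically. A passive design that spreads the same n queries over a fixed grid cannot resolve the step more finely than its grid spacing, which is of order 1/n.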

Sketch of the Proof - Stage 1

Intuition tells us that this should be the error we experience away from the boundary

Error due to approximation of the boundary regions + estimation error


Key: Limit the resolution of the RDPs

1/k

This is the performance away from the boundary

1/k


Are we finding more than the boundary?

Lemma:

At least we are not detecting too many areas outside the boundary.


n/2 more samples distributed uniformly over the boundary

Total error contribution from boundary zone:

Sketch of the Proof – Overall Error

Error away from the boundary

Balancing the two errors yields

Error in the boundary region
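The expressions on this slide were lost. A heuristic reconstruction of the balancing argument is given below, ignoring constants and logarithmic factors. It assumes that Stage 1 uses cells of side $1/k$, that the flagged boundary zone consists of about $k^{d-1}$ such cells (total volume about $1/k$), and that Stage 2 behaves like passive estimation at the increased sampling density. It should be checked against the original paper before being relied on.

```latex
% Stage 1: about k^{d-1} cells survive pruning, each costing about 1/n
\[
\text{error away from the boundary} \;\asymp\; \frac{k^{d-1}}{n}
\]

% Stage 2: n/2 samples in a zone of volume about 1/k give density about nk,
% and the passive rate (density)^{-1/d} then applies inside the zone
\[
\text{error in the boundary region} \;\asymp\; (n k)^{-1/d}
\]

% Balancing the two errors
\[
\frac{k^{d-1}}{n} \asymp (n k)^{-1/d}
\;\Longrightarrow\;
k \asymp n^{\frac{d-1}{d(d-1)+1}}
\;\Longrightarrow\;
\text{overall error} \;\asymp\; n^{-\frac{1}{\,d-1+1/d\,}}
\]
```

For d = 2 this gives an exponent of 2/3, compared with 1/2 for passive sampling.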
