Modern Topics in Multivariate Methods for Data Analysis

Upload: amberly-doyle

Post on 02-Jan-2016


• Semi-Supervised Learning

• Transfer Learning

• Active Learning

• Summary


Semi-Supervised Learning

This is an extension of supervised learning. We have two sets of data: a small labeled set and a large unlabeled set.

Motivation: labeled data is often hard or expensive to obtain, while unlabeled data is plentiful.

Figure obtained from X. Zhu. Semi-Supervised Learning Tutorial. ICML 2007

An example from Mars Data Analysis

Figure: digital elevation map of a Martian landscape (left) and a manually drawn geomorphic map of the same landscape (right).

Geomorphic map shows landforms chosen and defined by a domain expert.

Segmentation

Segmentation results, displayed on an elevation background: 2631 segments, each homogeneous in slope, curvature, and flood.

Classification: Labeling.

A representative subset of segments is labeled with one of six classes: plain, crater floor, convex crater walls, concave crater walls, convex ridges, and concave ridges.

Labeled segments.

Figure obtained from X. Zhu. Semi-Supervised Learning Tutorial. ICML 2007

How do we approach semi-supervised learning?

Figure obtained from X. Zhu. Semi-Supervised Learning Tutorial. ICML 2007
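One simple way to approach semi-supervised learning is self-training: train on the labeled set, assign pseudo-labels to the unlabeled examples the model is most confident about, add them to the training set, and retrain. The sketch below is a minimal illustration under stated assumptions, not the method behind the tutorial figures: the nearest-centroid base learner, the softmax confidence, and the 0.9 `threshold` are all hypothetical choices.

```python
import numpy as np

def fit_centroids(X, y):
    """Hypothetical base learner: one mean vector (centroid) per class."""
    classes = np.unique(y)
    return classes, np.array([X[y == c].mean(axis=0) for c in classes])

def predict_proba(model, X):
    """Softmax over negative squared distances to the class centroids."""
    _, centroids = model
    logits = -((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    logits -= logits.max(axis=1, keepdims=True)   # numerical stability
    p = np.exp(logits)
    return p / p.sum(axis=1, keepdims=True)

def self_train(X_lab, y_lab, X_unlab, threshold=0.9, max_iter=10):
    """Repeatedly pseudo-label the unlabeled points the model is confident about."""
    X_l, y_l, X_u = X_lab, y_lab, X_unlab
    for _ in range(max_iter):
        if len(X_u) == 0:
            break
        model = fit_centroids(X_l, y_l)
        proba = predict_proba(model, X_u)
        confident = proba.max(axis=1) >= threshold
        if not confident.any():
            break
        pseudo = model[0][proba.argmax(axis=1)]     # predicted class labels
        X_l = np.vstack([X_l, X_u[confident]])
        y_l = np.concatenate([y_l, pseudo[confident]])
        X_u = X_u[~confident]                         # still-unlabeled points
    return fit_centroids(X_l, y_l), len(y_l)

# Example: two clusters with a single labeled point in each.
rng = np.random.default_rng(0)
X_unlab = np.vstack([rng.normal(-2, 0.5, (100, 2)), rng.normal(2, 0.5, (100, 2))])
y_true = np.array([0] * 100 + [1] * 100)
X_lab = np.array([[-2.0, -2.0], [2.0, 2.0]])
y_lab = np.array([0, 1])
model, n_used = self_train(X_lab, y_lab, X_unlab)
acc = (model[0][predict_proba(model, X_unlab).argmax(axis=1)] == y_true).mean()
```

Self-training relies on the model's own confident predictions being correct, which is exactly the kind of assumption about the unlabeled data that can help or hurt.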

A Case with No Unlabeled Data

Figure obtained from X. Zhu. Semi-Supervised Learning Tutorial. ICML 2007

A Case with Unlabeled Data

Figure obtained from X. Zhu. Semi-Supervised Learning Tutorial. ICML 2007

A Case with Unlabeled Data

Figure obtained from X. Zhu. Semi-Supervised Learning Tutorial. ICML 2007

A Case with Unlabeled Data

Graph-Based Models

Figure obtained from X. Zhu. Semi-Supervised Learning Tutorial. ICML 2007
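A common graph-based method is label propagation: build a similarity graph over labeled and unlabeled points and let labels spread along the edges, keeping the labeled nodes fixed. The sketch below is a minimal illustration that assumes a fully connected graph with Gaussian (RBF) edge weights, a hypothetical bandwidth `sigma`, and a fixed number of iterations instead of a convergence test.

```python
import numpy as np

def label_propagation(X, y, n_classes, sigma=1.0, n_iter=200):
    """Propagate labels over an RBF similarity graph.

    y holds a class index for labeled points and -1 for unlabeled ones.
    """
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    W = np.exp(-d2 / (2 * sigma ** 2))
    np.fill_diagonal(W, 0.0)                 # no self-loops
    P = W / W.sum(axis=1, keepdims=True)     # row-normalized transition matrix

    labeled = y >= 0
    Y0 = np.zeros((len(y), n_classes))
    Y0[labeled, y[labeled]] = 1.0            # one-hot rows for labeled nodes
    F = Y0.copy()
    for _ in range(n_iter):
        F = P @ F                            # each node averages its neighbours
        F[labeled] = Y0[labeled]             # clamp the known labels
    return F.argmax(axis=1)

# Example: two clusters, one labeled point in each.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 0.3, (20, 2)), rng.normal(3, 0.3, (20, 2))])
y = -np.ones(40, dtype=int)
y[0], y[20] = 0, 1
pred = label_propagation(X, y, n_classes=2, sigma=0.5)
```

The method works here because of the cluster assumption: points connected by high-weight edges are expected to share a label.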

How can we learn from unlabeled data at all?

The answer lies in the assumptions we make about the distribution of the unlabeled data.

If the assumptions are right, using unlabeled data can improve performance.

If the assumptions are incorrect, performance may decrease.

Assumptions in Semi-Supervised Learning


• The goal is to transfer knowledge gathered from previous experience.

• Also called Inductive Transfer or Learning to Learn.

• Example: Invariant transformations across tasks.

Transfer Learning

Motivation for transfer learning

Once a predictive model is built, there is reason to believe it will eventually cease to be valid, for example because the data distribution changes.

What distinguishes transfer learning is that the source and target domains can be completely different.

Motivation: Transfer Learning

Figure: In the traditional approach to classification, each database (DB1, DB2, ..., DBn) is given to its own independent learning system. In transfer learning, the learning systems trained on the source domain (DB1, DB2) produce knowledge that is passed to the learning system for the target domain (DB new).

Transfer Learning

Scenarios:

1. Labeling in a new domain is costly. Example: DB1 (labeled) is used for the classification of patients G1; DB2 (unlabeled) is used for the classification of patients G2.

Transfer Learning

Scenarios:

2. Data is outdated: a model was created with one survey, but a new survey is now available. (Figure: Survey 1 → Learning System; Survey 2 → ?)

Functional Transfer: Multitask Learning

Train in parallel with a combined architecture.

Figure: network with input nodes, internal nodes, and output nodes (Left, Straight, Right). Figure obtained from Brazdil et al., Metalearning: Applications to Data Mining, Chapter 7, Springer, 2009.
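A combined architecture of this kind can be sketched as a network whose input and internal (hidden) nodes are shared by all tasks, with one output per task, trained in parallel on the loss summed over tasks. The sketch below is a minimal NumPy illustration, not the network from the cited chapter: the two regression tasks, the tanh hidden layer, the squared-error loss, and the learning rate are hypothetical choices.

```python
import numpy as np

def train_multitask(X, Y, n_hidden=8, lr=0.1, epochs=2000, seed=0):
    """Shared hidden layer, one linear output per task (column of Y).

    Minimizes the mean squared error over all tasks by full-batch gradient descent.
    """
    rng = np.random.default_rng(seed)
    W1 = rng.normal(0, 0.5, (X.shape[1], n_hidden))   # shared input -> internal weights
    b1 = np.zeros(n_hidden)
    W2 = rng.normal(0, 0.5, (n_hidden, Y.shape[1]))   # internal -> one output per task
    b2 = np.zeros(Y.shape[1])
    losses = []
    for _ in range(epochs):
        H = np.tanh(X @ W1 + b1)                  # representation shared by all tasks
        err = H @ W2 + b2 - Y
        losses.append(float((err ** 2).mean()))
        g_out = 2 * err / err.size                # d(loss)/d(outputs)
        g_H = (g_out @ W2.T) * (1 - H ** 2)       # backpropagate through tanh
        W2 = W2 - lr * (H.T @ g_out)
        b2 = b2 - lr * g_out.sum(axis=0)
        W1 = W1 - lr * (X.T @ g_H)
        b1 = b1 - lr * g_H.sum(axis=0)
    return losses

# Example: two related (hypothetical) regression tasks that share sin(x0).
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 2))
Y = np.column_stack([np.sin(X[:, 0]), np.sin(X[:, 0]) + 0.5 * X[:, 1]])
losses = train_multitask(X, Y)
print(f"MSE {losses[0]:.3f} -> {losses[-1]:.3f}")
```

Because both tasks update `W1`, whatever the hidden layer learns about sin(x0) for one task is available to the other.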

Knowledge of Parameters

Assume a prior distribution over the parameters.

Source domain: learn the parameters and adjust the prior distribution.

Target domain: learn the parameters using the prior distribution obtained from the source.

P(y|x) = P(x|y) P(y) / P(x)
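The prior-adjustment idea can be illustrated with a conjugate model, where Bayes' rule applied to the parameters has a closed form. The sketch below assumes a hypothetical binary outcome with a Beta prior on its probability; the source data update the prior, and the resulting posterior serves as the prior for the target domain. All counts are made up for illustration.

```python
from dataclasses import dataclass

@dataclass
class BetaPrior:
    a: float  # pseudo-count of positive outcomes
    b: float  # pseudo-count of negative outcomes

    def update(self, positives: int, negatives: int) -> "BetaPrior":
        """Bayes' rule for a Beta prior and Bernoulli data (conjugate update)."""
        return BetaPrior(self.a + positives, self.b + negatives)

    def mean(self) -> float:
        return self.a / (self.a + self.b)

prior = BetaPrior(1.0, 1.0)                                  # uniform prior
source_posterior = prior.update(positives=80, negatives=20)  # adjust prior on source data
target_with_transfer = source_posterior.update(positives=3, negatives=2)
target_without_transfer = prior.update(positives=3, negatives=2)
print(round(target_with_transfer.mean(), 3), round(target_without_transfer.mean(), 3))  # → 0.785 0.571
```

With only five target observations, the transferred prior dominates the estimate; as target data accumulate, the target counts take over.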

Parameter Similarity

Task A has parameter A; task B has parameter B, which is similar to A (B ~ A).

Assume parameter similarity: both parameters are drawn from a hyper-distribution with low variance.
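For linear regression with Gaussian noise, a low-variance Gaussian prior centred on task A's weights turns task B's MAP estimate into ridge regression that shrinks toward `w_a` instead of toward zero. The sketch below assumes unit noise variance; the name `lam` (hypothetical) plays the role of the inverse prior variance, so a low-variance hyper-distribution corresponds to a large `lam`.

```python
import numpy as np

def fit_similar_task(X_b, y_b, w_a, lam):
    """MAP estimate of task B's weights under the prior N(w_a, I / lam).

    Minimizes ||y_b - X_b w||^2 + lam * ||w - w_a||^2 (closed form).
    """
    d = X_b.shape[1]
    return np.linalg.solve(X_b.T @ X_b + lam * np.eye(d), X_b.T @ y_b + lam * w_a)

# Example with hypothetical parameters: plenty of data for task A, five points for task B.
rng = np.random.default_rng(0)
w_a = np.array([1.0, 2.0, -1.0])          # weights learned on task A
w_b_true = np.array([1.1, 1.9, -0.9])     # similar (unknown) weights of task B
X_b = rng.normal(size=(5, 3))
y_b = X_b @ w_b_true + rng.normal(0, 0.5, 5)
for lam in (0.0, 1.0, 100.0):
    print(lam, np.round(fit_similar_task(X_b, y_b, w_a, lam), 2))
```

`lam = 0` ignores task A entirely (ordinary least squares), while a very large `lam` simply copies `w_a`.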

Knowledge of Parameters

Source domain: find coefficients w_S using SVMs.

Target domain: find coefficients w_T using SVMs, initializing the search with w_S.
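Initializing the target search with w_S can be sketched with a linear SVM trained by subgradient descent on the regularized hinge loss. The sketch below is a minimal illustration under assumptions, not a particular SVM library: labels are -1/+1, there is no bias term, and the step size and epoch counts are hypothetical. Because the SVM objective is convex, the warm start mainly matters when the number of target iterations is limited; that is where the source solution biases the result.

```python
import numpy as np

def linear_svm(X, y, w0=None, C=1.0, lr=0.1, epochs=100):
    """Subgradient descent on 0.5*||w||^2 + C * mean(max(0, 1 - y * (X @ w))).

    Labels must be -1 or +1; no bias term, for brevity.
    """
    w = np.zeros(X.shape[1]) if w0 is None else np.array(w0, dtype=float)
    for _ in range(epochs):
        active = y * (X @ w) < 1                                  # margin violators
        hinge_grad = -(y[active][:, None] * X[active]).sum(axis=0) / len(y)
        w = w - lr * (w + C * hinge_grad)
    return w

# Source domain: many labeled examples.
rng = np.random.default_rng(0)
X_s = rng.normal(size=(200, 2))
y_s = np.sign(X_s[:, 0] + X_s[:, 1])
w_s = linear_svm(X_s, y_s, epochs=200)

# Target domain: few examples of a similar (hypothetical) concept; start from w_s.
X_t = rng.normal(size=(10, 2))
y_t = np.sign(X_t[:, 0] + 1.2 * X_t[:, 1])
w_t = linear_svm(X_t, y_t, w0=w_s, epochs=5)
```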

Feature Transfer

Feature transfer: learn a representation shared across tasks (source and target domains).

Minimize Loss-Function(y, f(x)), where the minimization is done over multiple tasks (e.g., multiple regions on Mars).

Goal: identify features common to all tasks.
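A shared representation can be sketched by learning one low-dimensional projection from the pooled inputs of all tasks and then fitting a separate linear model per task on top of it, reporting the squared loss summed over tasks. The sketch below uses PCA on the pooled data as a simplified stand-in for jointly minimizing the loss over the representation and the task models; the tasks are hypothetical (they could stand for different Mars regions).

```python
import numpy as np

def shared_projection(task_inputs, k):
    """Top-k principal directions of the inputs pooled across all tasks."""
    pooled = np.vstack(task_inputs)
    centered = pooled - pooled.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return vt[:k].T                                   # shape (d, k)

def fit_tasks(tasks, k):
    """One least-squares model per task on the shared features.

    Returns the projection, per-task weights, and the squared loss summed over tasks.
    """
    V = shared_projection([X for X, _ in tasks], k)
    weights, total_loss = [], 0.0
    for X, y in tasks:
        Z = np.column_stack([X @ V, np.ones(len(X))])  # shared features + bias
        w, *_ = np.linalg.lstsq(Z, y, rcond=None)
        weights.append(w)
        total_loss += float(((Z @ w - y) ** 2).sum())
    return V, weights, total_loss

# Example: three tasks whose 5-D inputs share the same 2-D structure.
rng = np.random.default_rng(0)
A = rng.normal(size=(2, 5))
tasks = []
for c in ([1.0, -2.0], [0.5, 1.0], [-1.0, 0.3]):       # task-specific weights
    Z = rng.normal(size=(50, 2))
    X = Z @ A + 0.01 * rng.normal(size=(50, 5))
    tasks.append((X, Z @ np.array(c)))
V, weights, total_loss = fit_tasks(tasks, k=2)
```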

Instance Transfer Learning

Instance transfer: filter samples from the source domain and add them to the target domain, yielding a larger target dataset for the learning system.

One algorithm for instance transfer is TrAdaBoost.
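TrAdaBoost reweights the pooled source and target instances at each boosting round. Source instances that the current weak learner misclassifies are down-weighted, since they appear to disagree with the target distribution, while misclassified target instances are up-weighted as in AdaBoost; the error is measured on the target instances only. The sketch below follows that scheme under assumptions: decision stumps as the weak learner, labels in {0, 1}, and clipping of the target error to keep the weights finite. Details such as stopping rules may differ from the original algorithm.

```python
import numpy as np

def fit_stump(X, y, p):
    """Weighted decision stump: returns (feature, threshold, polarity)."""
    best_err, best = np.inf, None
    for j in range(X.shape[1]):
        for thr in np.unique(X[:, j]):
            above = (X[:, j] > thr).astype(int)
            for polarity, h in ((1, above), (-1, 1 - above)):
                err = p[h != y].sum()
                if err < best_err:
                    best_err, best = err, (j, thr, polarity)
    return best

def predict_stump(stump, X):
    j, thr, polarity = stump
    above = (X[:, j] > thr).astype(int)
    return above if polarity == 1 else 1 - above

def tradaboost(X_src, y_src, X_tgt, y_tgt, n_rounds=20):
    """TrAdaBoost-style boosting; labels must be 0 or 1."""
    n = len(X_src)
    X = np.vstack([X_src, X_tgt])
    y = np.concatenate([y_src, y_tgt])
    w = np.ones(len(y))
    beta = 1.0 / (1.0 + np.sqrt(2.0 * np.log(n) / n_rounds))  # fixed source factor
    stumps, betas = [], []
    for _ in range(n_rounds):
        stump = fit_stump(X, y, w / w.sum())
        miss = (predict_stump(stump, X) != y).astype(float)
        eps = (w[n:] * miss[n:]).sum() / w[n:].sum()   # error on target instances only
        eps = min(max(eps, 1e-10), 0.499)              # keep beta_t in (0, 1)
        beta_t = eps / (1.0 - eps)
        w[:n] *= beta ** miss[:n]          # shrink misclassified source weights
        w[n:] *= beta_t ** -miss[n:]       # grow misclassified target weights
        w /= w.sum()                       # rescaling leaves the distribution unchanged
        stumps.append(stump)
        betas.append(beta_t)
    start = (n_rounds + 1) // 2 - 1        # vote with rounds ceil(N/2)..N (1-based)
    return stumps[start:], betas[start:]

def tradaboost_predict(model, X):
    stumps, betas = model
    # prod beta_t^(-h_t) >= prod beta_t^(-1/2), written in log space
    score = sum(-np.log(b) * (predict_stump(s, X) - 0.5) for s, b in zip(stumps, betas))
    return (score >= 0).astype(int)

# Example: the source boundary is shifted relative to the (hypothetical) target boundary.
rng = np.random.default_rng(0)
X_src = rng.uniform(-1, 1, (200, 2))
y_src = (X_src[:, 0] > 0.3).astype(int)
X_tgt = rng.uniform(-1, 1, (40, 2))
y_tgt = (X_tgt[:, 0] > 0.0).astype(int)
model = tradaboost(X_src, y_src, X_tgt, y_tgt)
pred = tradaboost_predict(model, X_tgt)
acc = (pred == y_tgt).mean()
```

Source points between the two boundaries keep being misclassified once the stumps follow the target data, so their weights decay; this is the sample filtering described above.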


Active learning is part of the field of supervised learning.

We have labeled and unlabeled data. The novel idea is that the learner can choose which examples to label during learning.

It is also called “Query Learning”.

Figure: examples are selected from the unlabeled data, labeled, and added to the labeled data.

Active Learning

Types of Active Learning:

1. Query Synthesis.

The learner can request the label of an example from anywhere in the instance space, including examples it synthesizes itself. This is only appropriate for small, finite domains, and some synthesized examples may have no meaning.

Active Learning

Types of Active Learning:

2. Stream-Based Selective Sampling

Instances are drawn one at a time from the input space according to some distribution, and the learner decides whether to request each instance's label or discard it. For example, it may query only examples that fall in regions of uncertainty.
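Stream-based selective sampling can be sketched as a loop that sees one instance at a time and requests its label only when the current model is uncertain. The sketch below is a minimal illustration: `predict_proba_1` (probability of class 1) and `oracle` are hypothetical stand-ins, the model is kept fixed for simplicity (in practice it would be retrained as labels arrive), and the region of uncertainty is defined as predicted probabilities within `margin` of 0.5.

```python
import math

def stream_selective_sampling(stream, predict_proba_1, oracle, margin=0.1):
    """Query labels only for instances whose P(y=1|x) lies within margin of 0.5."""
    queried = []
    for x in stream:
        if abs(predict_proba_1(x) - 0.5) < margin:   # region of uncertainty
            queried.append((x, oracle(x)))
        # otherwise the instance is discarded without asking for its label
    return queried

# Hypothetical fixed model and labeling oracle, for illustration only.
def predict_proba_1(x):
    return 1.0 / (1.0 + math.exp(-4.0 * x))

def oracle(x):
    return int(x > 0)

queried = stream_selective_sampling([-1.0, -0.3, -0.05, 0.02, 0.4, 1.5], predict_proba_1, oracle)
print(queried)   # → [(-0.05, 0), (0.02, 1)]
```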

Active Learning

Types of Active Learning:

3. Pool-Based Sampling

Assume a small set of labeled examples and a large set of unlabeled examples. Here we evaluate and rank the whole set of unlabeled examples, then choose the top-ranked example (or examples) to label.
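Pool-based sampling with an uncertainty criterion can be sketched as scoring every unlabeled example from the model's predicted class probabilities and taking the top-ranked ones. The sketch below assumes the probabilities are already available as a NumPy array (one row per unlabeled example) and uses predictive entropy as the ranking score; least-confidence and margin scores are common alternatives.

```python
import numpy as np

def entropy_scores(proba):
    """Predictive entropy of each row of class probabilities (higher = more uncertain)."""
    p = np.clip(proba, 1e-12, 1.0)
    return -(p * np.log(p)).sum(axis=1)

def select_queries(proba, k=1):
    """Rank the whole pool by uncertainty and return the indices of the top k."""
    return np.argsort(-entropy_scores(proba))[:k]

# Example pool of four unlabeled examples with predicted class probabilities.
proba = np.array([[0.95, 0.05],
                  [0.55, 0.45],
                  [0.80, 0.20],
                  [0.50, 0.50]])
print(select_queries(proba, k=2))   # → [3 1]
```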

Active Learning

Sampling Based on Uncertainty

Figure taken from “Active Learning” by Burr Settles, Morgan & Claypool, 2012.

Figure: panels comparing classifiers with 70% and 90% accuracy.


Summary

• Few labeled examples, labeling is expensive, many unlabeled examples: Semi-Supervised Learning

• Similar classification tasks, but there is indication that the distributions have changed: Transfer Learning

• Few training examples, labeling is expensive: Active Learning