NIPS2010 reading: Semi-supervised learning with adversarially missing label information

Page 1: Semi-supervised learning with adversarially missing label information

by U. Syed and B. Taskar @ Univ. Penn

Akisato Kimura, Ph.D. [E-mail] [email protected] [Twitter] @_akisato

NIPS2010 reading @ Cybozu, December 26, 2010

Page 2: Abstract

Semi-supervised learning in an adversarial setting. The paper considers a general framework that includes:

Covariate shift: missing labels are not necessarily selected at random.

Multiple instance learning: correspondences between labels and instances are not provided explicitly.

Graph-based regularization: only information about which instances are likely to have the same label is given.

It does not include the malicious label noise setting, in which labelers are allowed to mislabel part of the training set: here, labelers can remove labels but cannot change them.

Page 3: Framework

x_1, …, x_m: the m labeled training examples; y_1, …, y_m: their labels; both drawn from an unknown PDF.

R(q): the label regularization function, defined over soft labelings q of the training examples.

q_i(y): the probability that example x_i has label y.

The algorithm can access only the examples and the label regularization function R. The correct labeling might be near the minimum of R, but not necessarily (thus the setting is called "adversarial").
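A minimal formalization of the soft labeling described above; the symbols (x_i, y_i, Y, q) are assumed here for illustration rather than quoted from the slide:

\[
q = (q_1, \dots, q_m), \qquad q_i \in \Delta(\mathcal{Y}), \qquad q_i(y) \ge 0, \quad \sum_{y \in \mathcal{Y}} q_i(y) = 1 ,
\]
\[
q_i^{*}(y) = \begin{cases} 1 & \text{if } y = y_i \\ 0 & \text{otherwise} \end{cases}
\qquad \text{(the correct labeling as a point-mass soft labeling).}
\]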

Page 4: Label regularization functions (1)

Fully supervised learning: R(q) is zero when q equals the correct soft labeling, and infinite otherwise. [Note] The labels themselves also define the correct soft labeling.

Semi-supervised learning: R(q) is zero when q agrees with the correct soft labeling on every labeled example (every index not in the set U of indexes of unlabeled examples), and infinite otherwise.

Ambiguous labeling (multiple instance learning): R(q) is zero when, for every example in the index set I, the set of indexes of non-zero components of q_i is contained in the set of possible labels of that example, and infinite otherwise; we don't mind how q labels examples not in I.
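As a hedged formalization of the three regularizers above, written as hard (0 / infinity) constraints; the symbols (q*, U, I, Y_i, supp) are assumptions for illustration, not quoted from the slide:

\[
\text{Fully supervised:}\quad R(q) = \begin{cases} 0 & \text{if } q = q^{*} \\ \infty & \text{otherwise,} \end{cases}
\]
\[
\text{Semi-supervised:}\quad R(q) = \begin{cases} 0 & \text{if } q_i = q_i^{*} \ \text{for all } i \notin U \\ \infty & \text{otherwise,} \end{cases}
\qquad U:\ \text{indexes of unlabeled examples,}
\]
\[
\text{Ambiguous labeling:}\quad R(q) = \begin{cases} 0 & \text{if } \operatorname{supp}(q_i) \subseteq Y_i \ \text{for all } i \in I \\ \infty & \text{otherwise,} \end{cases}
\qquad Y_i:\ \text{possible labels of example } i .
\]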

Page 5: Label regularization functions (2)

Laplacian regularization: R(q) is small when examples that are believed to have the same label receive similar soft labels; it is built from a positive semi-definite (graph Laplacian) matrix.

Any combination of these regularizers also fits the framework, e.g. semi-supervised learning with Laplacian regularization.

We do not specify how the function R is selected. For example, in semi-supervised learning we usually cannot know how the unlabeled samples were selected (the "covariate shift" setting).
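A standard form of the Laplacian regularizer consistent with the fragments above; the similarity weights A_{ij} and the exact normalization are assumptions:

\[
R(q) = \tfrac{1}{2} \sum_{i,j} A_{ij}\, \lVert q_i - q_j \rVert^{2} = \operatorname{tr}\!\left( Q^{\top} L\, Q \right), \qquad L = D - A, \quad D_{ii} = \sum_{j} A_{ij},
\]
where A_{ij} is large when examples i and j are believed to have the same label, Q stacks the soft labels q_i as rows, and the graph Laplacian L is positive semi-definite, so R(q) is small when such examples receive similar soft labels.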

Page 6: Loss functions for model learning

Any type of loss function is applicable, but we focus in particular on the negative log-likelihood of a log-linear PDF

and the binary loss function of a linear classifier,

where θ is a model/classifier parameter and φ is a feature function ("basis function" in PRML).
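The two losses named above, written in one standard parameterization (assumed here, not quoted from the paper), with θ the model parameter and φ the feature function:

\[
p_{\theta}(y \mid x) = \frac{\exp\!\big(\theta^{\top} \phi(x, y)\big)}{\sum_{y'} \exp\!\big(\theta^{\top} \phi(x, y')\big)}, \qquad
L_{\log}(\theta; x, y) = -\log p_{\theta}(y \mid x),
\]
\[
L_{0/1}(\theta; x, y) = \mathbf{1}\!\left[\, y \neq \arg\max_{y'} \theta^{\top} \phi(x, y') \,\right].
\]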

Page 7: Algorithm

Objective: find a parameter achieving the min-max value of the game between the learner and the adversarial labeler (a hedged sketch of the objective follows below).

Its validity will be shown later. For brevity, the objective is abbreviated in what follows.

Basic algorithm: Game for Adversarially Missing Evidence (GAME)
1. Constants are given in advance.
2. Find the adversary's soft labeling (the maximizer of the inner problem).
3. Find the parameter (the minimizer) against that soft labeling.
4. Output the parameter estimate.

The order is opposite!! Although the objective is written as a minimization over the parameter of a maximization over labelings, the algorithm solves for the labeling first and the parameter second.
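A plausible form of the abbreviated min-max objective, assembled from the pieces on the previous slides; the trade-off weight λ and the exact averaging are assumptions, not quoted from the slide:

\[
\hat{\theta} \in \arg\min_{\theta} \; \max_{q} \; \left[ \frac{1}{m} \sum_{i=1}^{m} \sum_{y \in \mathcal{Y}} q_i(y)\, L(\theta; x_i, y) \;-\; \lambda\, R(q) \right],
\]
so the adversary picks the soft labeling q that makes the loss large but pays a penalty through R(q); GAME solves the inner problem for q first (Step 2) and then fits θ against that q (Step 3).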

Page 8: Algorithm in detail

Consider the case of the negative log-likelihood.

Step 3: The label regularization function can be ignored; this step is equivalent to maximizing the likelihood of a log-linear model (under the soft labeling found in Step 2).

Step 2: Take the dual of the inner minimization. The resulting objective is concave, contains a Shannon-entropy term, and is easy to optimize with many existing solvers.
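A minimal, runnable sketch of Step 3 under the soft-label reading above: fit a log-linear (multinomial logistic) model by gradient descent on the expected negative log-likelihood under a fixed soft labeling q. Every concrete choice here (function name, step size, iteration count, toy data) is an illustrative assumption, not the paper's implementation.

import numpy as np

def fit_log_linear(X, q, lr=0.1, iters=500):
    """X: (m, d) feature matrix; q: (m, k) soft labels, each row sums to 1."""
    m, d = X.shape
    k = q.shape[1]
    theta = np.zeros((d, k))
    for _ in range(iters):
        scores = X @ theta                       # (m, k) class scores
        scores -= scores.max(axis=1, keepdims=True)
        p = np.exp(scores)
        p /= p.sum(axis=1, keepdims=True)        # model distribution p_theta(y | x)
        grad = X.T @ (p - q) / m                 # gradient of the expected NLL
        theta -= lr * grad
    return theta

# Toy usage: two Gaussian clusters with soft labels leaning toward the truth.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-1, 1, (20, 2)), rng.normal(1, 1, (20, 2))])
q = np.vstack([np.tile([0.9, 0.1], (20, 1)), np.tile([0.1, 0.9], (20, 1))])
theta = fit_log_linear(X, q)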

Page 9: Convergence properties (1)

Theorem 1 (Converse theorem): If the stated condition holds, then for all parameters and label regularization functions, the bound holds with high probability. The bound is similar to the objective function of the algorithm.

Page 10: Convergence properties (2)

Theorem 2 (Direct theorem: lower bound): Consider the binary loss function and a finite set. For all learning algorithms and example distributions, there exists a labeling function for which the stated lower bound holds with high probability, under some assumptions.

The assumptions (1, 2 and 3) can be found in the paper. Some detailed conditions are omitted for simplicity. The parameter is the one obtained by a learning algorithm.

Page 11: Convergence properties (3)

Theorem 3 (Direct theorem: upper bound): Consider the binary loss function. If the stated condition holds, there exists a family of label regularization functions and a learning algorithm that satisfy the stated upper bound with high probability, under some assumptions.

Assumption 3 can be removed. Some detailed conditions are omitted for simplicity. The parameter is the one obtained by the learning algorithm.

Page 12: Back to the algorithm part …

Objective: find a parameter achieving the minimum upper bound in the converse theorem.

Do not minimize over it directly. Instead, add a regularization part so that it can be left unconstrained (see the sketch below).
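The "add a regularization part" move is the usual constrained-to-penalized relaxation; as a generic sketch (f, g, c and λ are illustrative symbols, not the paper's):

\[
\min_{x} \; f(x) \ \ \text{s.t.}\ \ g(x) \le c
\qquad \leadsto \qquad
\min_{x} \; f(x) + \lambda\, g(x),
\]
which leaves the variable unconstrained and controls the constraint only through the penalty weight λ.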

Page 13: Experiments

Exp. 1: Binary classification. Datasets: COIL (object recognition), BCI (EEG scans).

Exp. 2: Multi-class categorization. Dataset: Labeled Faces in the Wild.

Page 14: Results

(Result plots for the BCI, COIL, and Faces datasets.)

Page 15: What can we take away from this paper?

"Generalization" papers often tend not to be very interesting:

since we already know the results for the special cases, it is hard to feel the "oh, so that's how it works!" kind of surprise.

However, we should not overlook that a generalization can provide a new point of view:

discovery of previously unknown special cases;

new insights and methods obtained by combining special cases.

Furthermore, it is also important that a generalization gives us a bird's-eye view of the techniques.

So let's try to think one step further from here!

Page 16: Fin.

Thank you for your kind attention.