Analysis of Ensemble Learning using Simple Perceptrons based on Online Learning Theory

Seiji MIYOSHI 1, Kazuyuki HARA 2, Masato OKADA 3,4,5

1 Kobe City College of Tech., 2 Tokyo Metropolitan College of Tech., 3 University of Tokyo, 4 RIKEN BSI, 5 Intelligent Cooperation and Control, PRESTO, JST


Page 1:

Analysis of Ensemble Learning using Simple Perceptrons based on Online Learning Theory

Seiji MIYOSHI 1, Kazuyuki HARA 2, Masato OKADA 3,4,5

1 Kobe City College of Tech., 2 Tokyo Metropolitan College of Tech., 3 University of Tokyo, 4 RIKEN BSI, 5 Intelligent Cooperation and Control, PRESTO, JST

Page 2:

ABSTRACT

Ensemble learning of K simple perceptrons, which determine their outputs by sign functions, is discussed within the framework of online learning and statistical mechanics. One purpose of statistical learning theory is to obtain the generalization error theoretically. We show that the ensemble generalization error can be calculated by using two order parameters: the similarity between the teacher and a student, and the similarity among students. The differential equations that describe the dynamical behaviors of these order parameters are derived for general learning rules. The concrete forms of these differential equations are derived analytically for three well-known rules: Hebbian learning, perceptron learning and AdaTron learning. The ensemble generalization errors of the three rules are calculated by using the results obtained by solving their differential equations. As a result, the three rules show different characteristics in their affinity for ensemble learning, that is, in "maintaining variety among students". The results show that AdaTron learning is superior to the other two rules with respect to this affinity.

Page 3:

BACKGROUND

• Ensemble learning has recently attracted the attention of many researchers. Ensemble learning means combining many rules or learning machines (students in the following) that individually perform poorly. Theoretical studies analyzing the generalization performance by using statistical mechanics have been performed vigorously.

• Hara and Okada theoretically analyzed the case in which students are linear perceptrons.

• Hebbian learning, perceptron learning and AdaTron learning are well known as learning rules for a nonlinear perceptron, which decides its output by a sign function. Determining the differences among ensemble learning with Hebbian learning, perceptron learning and AdaTron learning is a very attractive problem, but one that has never been analyzed.

OBJECTIVE

• We discuss ensemble learning of K simple perceptrons within the framework of online learning, for finite K.

Page 4:

MODEL

• A common input x is presented to the teacher and all students in the same order.

• An input x, once used for an update, is abandoned (online learning).

• The updates of the students are independent of each other.

• Two methods are treated for deciding the ensemble output: one is the majority vote (MV) of the students, and the other is the weight mean (WM).

[Figure: a teacher and K students (1, 2, …, K) receiving a common input x; the length of a student is indicated.]
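The model above can be sketched as a small Monte Carlo simulation. This is an illustrative sketch, not the authors' code: the dimension N, the learning rate, the Gaussian inputs and the random initialization are assumptions, and the Hebbian update shown is just one of the three rules discussed.

```python
import numpy as np

rng = np.random.default_rng(0)
N, K, steps = 500, 3, 5000           # input dimension, number of students, updates

B = rng.standard_normal(N)           # teacher weight vector (fixed)
J = rng.standard_normal((K, N))      # K students with different random initializations

for _ in range(steps):
    x = rng.standard_normal(N)       # common input to the teacher and all students
    t = np.sign(B @ x)               # teacher output
    J += t * x                       # Hebbian update (the same form for every student)
    # each input is used for one update and then abandoned (online learning)

# order parameters: similarity R_k (teacher vs. student) and q_kk' (student vs. student)
R = (J @ B) / (np.linalg.norm(J, axis=1) * np.linalg.norm(B))
q12 = (J[0] @ J[1]) / (np.linalg.norm(J[0]) * np.linalg.norm(J[1]))

# ensemble generalization error of the majority vote (MV), estimated on fresh inputs
X = rng.standard_normal((20000, N))
teacher_out = np.sign(X @ B)
votes = np.sign(X @ J.T)             # each student's output per test input
mv = np.sign(votes.sum(axis=1))      # majority vote of the K = 3 students
eg_mv = np.mean(mv != teacher_out)
```

Note that the Hebbian increment t·x does not depend on the student's own state, so students trained on common inputs quickly become similar (q → 1); this is precisely the "variety among students" issue discussed on the later slides.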

Page 5:

Generalization Error εg:

Probability that an ensemble output disagrees with that of the teacher for a new input x
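For a single simple perceptron with a sign output and isotropic Gaussian inputs, this probability has the well-known closed form εg = (1/π) arccos R, where R is the teacher–student similarity. A minimal numerical check (an illustrative sketch; the particular 2-D vectors are assumptions chosen so that R = 0.5):

```python
import numpy as np

rng = np.random.default_rng(1)

# teacher B and student J chosen so that their similarity R = cos(60°) = 0.5
B = np.array([1.0, 0.0])
J = np.array([np.cos(np.pi / 3), np.sin(np.pi / 3)])
R = (B @ J) / (np.linalg.norm(B) * np.linalg.norm(J))

eg_theory = np.arccos(R) / np.pi     # generalization error: arccos(0.5)/pi = 1/3 here

# Monte Carlo estimate: fraction of new inputs x on which the two outputs disagree
X = rng.standard_normal((200000, 2))
eg_mc = np.mean(np.sign(X @ B) != np.sign(X @ J))
```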

THEORY

Similarity between teacher and student: R

Similarity among students: q

Page 6:

Differential equations describing l and R (known result)

Differential equation describing q (new result)

Page 7:

RESULTS

Hebbian

(known result)

(new result)

Page 8:

Perceptron

(known result)

(new result)

Page 9:

AdaTron

(known result)

(new result)

Page 10:

Generalization Error

[Figures, for Hebbian, Perceptron and AdaTron learning: normalized generalization error εg at t=50 versus K⁻¹ (K=∞ at one end, K=1 at the other), comparing MV and WM, theory and simulation; and εg versus time, comparing theory (K=1 and K=3, MV) with simulation (K=3, MV).]

Page 11:

DISCUSSION

Similarity between teacher and student: Rk

Similarity among students: qkk'

[Figure: geometry of teacher B and students Jk, Jk', with overlaps Rk, Rk' and qkk'.]

Page 12:

Maintaining the variety of students is important in ensemble learning.

→ The relationship between R and q is therefore essential.

[Figure: two configurations of students Jk and Jk' relative to teacher B, with small and large overlap qkk'.]

• q is small → the effect of the ensemble is strong.

• q is large → the effect of the ensemble is small.
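The effect of q at fixed R can be illustrated directly by sampling the joint-Gaussian local fields of the teacher and the students. This is an illustrative sketch, not the slides' analytic calculation: the construction of the fields and the particular values R = 0.7 and q ∈ {0.5, 0.9} are assumptions (the construction requires q ≥ R²).

```python
import numpy as np

def mv_error(R, q, K=3, n=200000, seed=2):
    """Monte Carlo error of a K-student majority vote when every student has
    teacher overlap R and every student pair has overlap q (requires q >= R**2)."""
    rng = np.random.default_rng(seed)
    u = rng.standard_normal(n)            # teacher's local field
    c = rng.standard_normal(n)            # component shared by all students
    z = rng.standard_normal((K, n))       # each student's private component
    # student fields satisfy Cov(v_k, u) = R, Cov(v_k, v_k') = q, Var(v_k) = 1
    v = R * u + np.sqrt(q - R**2) * c + np.sqrt(1.0 - q) * z
    mv = np.sign(np.sign(v).sum(axis=0))  # majority vote of the K students
    return np.mean(mv != np.sign(u))

err_similar = mv_error(R=0.7, q=0.9)  # students strongly aligned: weak ensemble effect
err_varied  = mv_error(R=0.7, q=0.5)  # more variety among students: strong ensemble effect
```

With the same individual performance (the same R), the majority vote gains more when q is small, which is the "maintaining variety among students" effect stated above.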

Page 13:

Dynamical behaviors of R and q

[Figures: dynamical behaviors of the overlaps R and q versus time for Hebbian, Perceptron and AdaTron learning.]

Relationship between R and q

[Figure: similarity q versus similarity R for Hebbian, Perceptron and AdaTron learning.]

Page 14: