1
Analysis of Ensemble Learning using Simple Perceptrons based on Online Learning Theory
Seiji MIYOSHI 1 Kazuyuki HARA 2 Masato OKADA 3,4,5
1 Kobe City College of Tech., 2 Tokyo Metropolitan College of Tech.,
3 University of Tokyo, 4 RIKEN BSI,
5 Intelligent Cooperation and Control, PRESTO, JST
2
ABSTRACT
Ensemble learning of K simple perceptrons, which determine their outputs by sign functions, is discussed within the framework of online learning and statistical mechanics. One purpose of statistical learning theory is to obtain the generalization error theoretically. We show that the ensemble generalization error can be calculated by using two order parameters: the similarity between the teacher and a student, and the similarity among students. The differential equations that describe the dynamical behaviors of these order parameters are derived for general learning rules. The concrete forms of these differential equations are derived analytically for three well-known rules: Hebbian learning, perceptron learning, and AdaTron learning. The ensemble generalization errors of these three rules are calculated by solving their differential equations. The three rules show different characteristics in their affinity for ensemble learning, that is, in "maintaining variety among students". The results show that AdaTron learning is superior to the other two rules with respect to this affinity.
3
BACKGROUND
• Ensemble learning has recently attracted the attention of many researchers. Ensemble learning means combining many rules or learning machines (called students in the following) that individually perform poorly. Theoretical studies analyzing the generalization performance by using statistical mechanics have been performed vigorously.
• Hara and Okada theoretically analyzed the case in which students are linear perceptrons.
• Hebbian learning, perceptron learning and AdaTron learning are well known as learning rules for a nonlinear perceptron, which decides its output by a sign function. Determining the differences among ensemble learning with Hebbian learning, perceptron learning and AdaTron learning is a very attractive problem, but one that has never been analyzed.
OBJECTIVE
• We discuss ensemble learning of K simple perceptrons within the framework of online learning and finite K.
4
MODEL
• A common input x is presented to the teacher and all students in the same order.
• An input x, once used for an update, is abandoned (online learning).
• The updates of the students are independent of each other.
• Two methods are treated to decide an ensemble output. One is the majority vote (MV) of students, and the other is the weight mean (WM).
[Figure: a teacher B and K students J1, …, JK receive the same input x; l denotes the length of a student.]
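The two combination methods can be sketched as follows (a minimal NumPy illustration of MV and WM; the variable names and sizes are our own, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(0)

N, K = 100, 3                    # input dimension, number of students
J = rng.standard_normal((K, N))  # the K students' weight vectors
x = rng.standard_normal(N)       # a common input shown to all students

fields = J @ x                   # each student's local field
outputs = np.sign(fields)        # each simple perceptron's output (+1 or -1)

# Majority vote (MV): sign of the sum of the K student outputs
mv = np.sign(outputs.sum())

# Weight mean (WM): sign of the mean of the K local fields
wm = np.sign(fields.mean())

print(mv, wm)
```

With odd K, the majority vote is always decided; MV and WM can nevertheless disagree on inputs where a minority of students has large local fields.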
5
THEORY
Generalization error εg:
probability that the ensemble output disagrees with that of the teacher for a new input x
Similarity R between teacher and student
Similarity q among students
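For a single simple perceptron, the generalization error depends on the teacher–student similarity alone, through the standard result εg = (1/π) arccos R. A Monte Carlo check of this relation, together with the two overlaps (a sketch under our own parameter choices):

```python
import numpy as np

rng = np.random.default_rng(1)
N = 200

B  = rng.standard_normal(N)            # teacher weight vector
J1 = B + 0.5 * rng.standard_normal(N)  # two students, each correlated
J2 = B + 0.5 * rng.standard_normal(N)  # with the teacher

unit = lambda w: w / np.linalg.norm(w)

R = unit(B) @ unit(J1)    # similarity between teacher and student
q = unit(J1) @ unit(J2)   # similarity among students

# Monte Carlo estimate of a single student's generalization error:
# the probability that sgn(J1 . x) disagrees with sgn(B . x)
X = rng.standard_normal((50000, N))
eps_mc = np.mean(np.sign(X @ J1) != np.sign(X @ B))

eps_theory = np.arccos(R) / np.pi   # standard single-perceptron result
print(R, q, eps_mc, eps_theory)
```

The Monte Carlo estimate and (1/π) arccos R agree to within sampling noise; the ensemble error additionally involves q, which is what the new results of this work compute.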
6
Differential equations describing l and R (known result)
Differential equation describing q (new result)
7
RESULTS
Hebbian
(known result)
(new result)
8
Perceptron
(known result)
(new result)
9
AdaTron
(known result)
(new result)
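The three rules have standard online forms: Hebbian learning always moves toward the teacher's output sgn(v), perceptron learning does so only when the student's output is wrong, and AdaTron learning corrects a mistake proportionally to the student's local field u. A small online-learning sketch (our NumPy rendering with unit-norm inputs; the slides' exact normalizations may differ):

```python
import numpy as np

# Update amount f for student local field u and teacher local field v
def hebbian(u, v):
    return np.sign(v)                        # always follow the teacher

def perceptron(u, v):
    return np.sign(v) if u * v < 0 else 0.0  # update only on a mistake

def adatron(u, v):
    return -u if u * v < 0 else 0.0          # mistake-driven, size |u|

def train(rule, N=500, steps=20000, seed=0):
    rng = np.random.default_rng(seed)
    B = rng.standard_normal(N)       # teacher
    J = rng.standard_normal(N)       # student, initially uncorrelated
    for _ in range(steps):
        x = rng.standard_normal(N)
        x /= np.linalg.norm(x)       # |x| = 1, components ~ N(0, 1/N)
        u, v = J @ x, B @ x
        J = J + rule(u, v) * x       # input is used once, then abandoned
    # similarity R between teacher and student after training
    return (B @ J) / (np.linalg.norm(B) * np.linalg.norm(J))

for rule in (hebbian, perceptron, adatron):
    print(rule.__name__, train(rule))
```

All three rules drive R toward 1; their transient behavior and their effect on the student–student overlap q differ, which is what the analysis above quantifies.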
10
Generalization Error
[Figures: normalized generalization error εg (at t = 50) vs. K⁻¹ for Hebbian, perceptron, and AdaTron learning, comparing majority vote (MV) and weight mean (WM), theory and simulation; and the time evolution of εg for each rule, comparing theory (K = 1), theory (K = 3, MV), and simulation (K = 3, MV), with the K = 1 and K = ∞ limits indicated.]
11
Similarity Rk between teacher B and student Jk
Similarity qkk' among students Jk and Jk'
[Diagram: overlaps Rk, Rk' and qkk' among B, Jk and Jk'.]
DISCUSSION
12
Maintaining the variety among students is important in ensemble learning.
→ The relationship between R and q is essential.
[Diagram: overlap qkk' between students Jk and Jk' around teacher B.]
q is small → the effect of the ensemble is strong.
q is large → the effect of the ensemble is small.
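This can be checked directly at the level of the order parameters. In the standard Gaussian picture, the teacher field v and the K student fields u_k are jointly Gaussian with unit variances, cov(v, u_k) = R and cov(u_k, u_k') = q, and the majority-vote error is the probability that the vote of sgn(u_k) disagrees with sgn(v). A Monte Carlo sketch (the parameter values below are our own illustration):

```python
import numpy as np

def mv_error(R, q, K=3, n=200000, seed=0):
    """Monte Carlo estimate of the majority-vote generalization error for
    K students with teacher-student overlap R and student-student overlap q."""
    rng = np.random.default_rng(seed)
    # Covariance of (v, u_1, ..., u_K): unit variances,
    # cov(v, u_k) = R, cov(u_k, u_k') = q
    C = np.full((K + 1, K + 1), q)
    C[0, :] = C[:, 0] = R
    np.fill_diagonal(C, 1.0)
    z = rng.multivariate_normal(np.zeros(K + 1), C, size=n)
    v, u = z[:, 0], z[:, 1:]
    vote = np.sign(np.sign(u).sum(axis=1))   # majority vote of K students
    return np.mean(vote != np.sign(v))

# Same R, different q: smaller q (more variety) gives a lower ensemble error
print(mv_error(R=0.8, q=0.9), mv_error(R=0.8, q=0.65))
```

At fixed R, lowering q toward R² (students independent given the teacher) strengthens the ensemble effect, while q → 1 makes all students identical and the ensemble no better than a single student.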
13
Dynamical behaviors of R and q
[Figures: time evolution of the overlaps R and q (from 0 to 1) for Hebbian, perceptron, and AdaTron learning.]
Relationship between R and q
[Figure: similarity q vs. similarity R for Hebbian, perceptron, and AdaTron learning.]