1
Analysis of Ensemble Learning using Simple Perceptrons based on Online Learning Theory
Seiji MIYOSHI 1 Kazuyuki HARA 2 Masato OKADA 3,4,5
1 Kobe City College of Tech., 2 Tokyo Metropolitan College of Tech.,
3 University of Tokyo, 4 RIKEN BSI,
5 Intelligent Cooperation and Control, PRESTO, JST
2
ABSTRACT
Ensemble learning of K simple perceptrons, which determine their outputs by sign functions, is discussed within the framework of online learning and statistical mechanics. One purpose of statistical learning theory is to obtain the generalization error theoretically. We show that the ensemble generalization error can be calculated by using two order parameters: the similarity between the teacher and a student, and the similarity among students. The differential equations that describe the dynamical behaviors of these order parameters are derived for general learning rules. The concrete forms of these differential equations are derived analytically for three well-known rules: Hebbian learning, perceptron learning, and AdaTron learning. The ensemble generalization errors of these three rules are calculated by solving their differential equations. The three rules show different characteristics in their affinity for ensemble learning, that is, in "maintaining variety among students". The results show that AdaTron learning is superior to the other two rules with respect to this affinity.
3
BACKGROUND
• Ensemble learning has recently attracted the attention of many researchers. Ensemble learning means combining many rules or learning machines (called students in the following) that individually perform poorly. Theoretical studies analyzing the generalization performance by using statistical mechanics have been performed vigorously.
• Hara and Okada theoretically analyzed the case in which students are linear perceptrons.
• Hebbian learning, perceptron learning and AdaTron learning are well known as learning rules for a nonlinear perceptron, which decides its output by a sign function. Determining the differences among ensemble learning with Hebbian learning, perceptron learning and AdaTron learning is a very attractive problem, but one that has never been analyzed.
OBJECTIVE
• We discuss ensemble learning of K simple perceptrons within the framework of online learning and finite K.
4
MODEL
• A common input x is presented to the teacher and all students in the same order.
• An input x, once used for an update, is abandoned (online learning).
• The updates of the students are independent of each other.
• Two methods are treated to decide an ensemble output. One is the majority vote (MV) of students, and the other is the weight mean (WM).
[Figure: a teacher B and K students J1, …, JK receive the same input x; l denotes the length of a student.]
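The two combination methods can be sketched as follows (a minimal NumPy illustration of MV and WM; the variable names and sizes are our own, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(0)

N, K = 100, 3                    # input dimension, number of students
J = rng.standard_normal((K, N))  # the K students' weight vectors
x = rng.standard_normal(N)       # a common input shown to all students

fields = J @ x                   # each student's local field
outputs = np.sign(fields)        # each simple perceptron's output (+1 or -1)

# Majority vote (MV): sign of the sum of the K student outputs
mv = np.sign(outputs.sum())

# Weight mean (WM): sign of the mean of the K local fields
wm = np.sign(fields.mean())

print(mv, wm)
```

With odd K, the majority vote is always decided; MV and WM can nevertheless disagree on inputs where a minority of students has large local fields.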
5
THEORY
Generalization error εg:
probability that the ensemble output disagrees with that of the teacher for a new input x
Similarity R between teacher and student
Similarity q among students
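For a single simple perceptron, the generalization error depends on the teacher–student similarity alone, through the standard result εg = (1/π) arccos R. A Monte Carlo check of this relation, together with the two overlaps (a sketch under our own parameter choices):

```python
import numpy as np

rng = np.random.default_rng(1)
N = 200

B  = rng.standard_normal(N)            # teacher weight vector
J1 = B + 0.5 * rng.standard_normal(N)  # two students, each correlated
J2 = B + 0.5 * rng.standard_normal(N)  # with the teacher

unit = lambda w: w / np.linalg.norm(w)

R = unit(B) @ unit(J1)    # similarity between teacher and student
q = unit(J1) @ unit(J2)   # similarity among students

# Monte Carlo estimate of a single student's generalization error:
# the probability that sgn(J1 . x) disagrees with sgn(B . x)
X = rng.standard_normal((50000, N))
eps_mc = np.mean(np.sign(X @ J1) != np.sign(X @ B))

eps_theory = np.arccos(R) / np.pi   # standard single-perceptron result
print(R, q, eps_mc, eps_theory)
```

The Monte Carlo estimate and (1/π) arccos R agree to within sampling noise; the ensemble error additionally involves q, which is what the new results of this work compute.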
6
Differential equations describing l and R (known result)
Differential equation describing q (new result)
7
RESULTS
Hebbian
(known result)
(new result)
8
Perceptron
(known result)
(new result)
9
AdaTron
(known result)
(new result)
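The three rules have standard online forms: Hebbian learning always moves toward the teacher's output sgn(v), perceptron learning does so only when the student's output is wrong, and AdaTron learning corrects a mistake proportionally to the student's local field u. A small online-learning sketch (our NumPy rendering with unit-norm inputs; the slides' exact normalizations may differ):

```python
import numpy as np

# Update amount f for student local field u and teacher local field v
def hebbian(u, v):
    return np.sign(v)                        # always follow the teacher

def perceptron(u, v):
    return np.sign(v) if u * v < 0 else 0.0  # update only on a mistake

def adatron(u, v):
    return -u if u * v < 0 else 0.0          # mistake-driven, size |u|

def train(rule, N=500, steps=20000, seed=0):
    rng = np.random.default_rng(seed)
    B = rng.standard_normal(N)       # teacher
    J = rng.standard_normal(N)       # student, initially uncorrelated
    for _ in range(steps):
        x = rng.standard_normal(N)
        x /= np.linalg.norm(x)       # |x| = 1, components ~ N(0, 1/N)
        u, v = J @ x, B @ x
        J = J + rule(u, v) * x       # input is used once, then abandoned
    # similarity R between teacher and student after training
    return (B @ J) / (np.linalg.norm(B) * np.linalg.norm(J))

for rule in (hebbian, perceptron, adatron):
    print(rule.__name__, train(rule))
```

All three rules drive R toward 1; their transient behavior and their effect on the student–student overlap q differ, which is what the analysis above quantifies.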
10
Generalization Error
[Figures: normalized generalization error εg (at t = 50) vs. K⁻¹ for Hebbian, perceptron, and AdaTron learning, comparing majority vote (MV) and weight mean (WM), theory and simulation; and the time evolution of εg for each rule, comparing theory (K = 1), theory (K = 3, MV), and simulation (K = 3, MV), with the K = 1 and K = ∞ limits indicated.]
11
Similarity Rk between teacher B and student Jk
Similarity qkk' among students Jk and Jk'
[Diagram: overlaps Rk, Rk' and qkk' among B, Jk and Jk'.]
DISCUSSION
12
Maintaining the variety among students is important in ensemble learning.
→ The relationship between R and q is essential.
[Diagram: overlap qkk' between students Jk and Jk' around teacher B.]
q is small → the effect of the ensemble is strong.
q is large → the effect of the ensemble is small.
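This can be checked directly at the level of the order parameters. In the standard Gaussian picture, the teacher field v and the K student fields u_k are jointly Gaussian with unit variances, cov(v, u_k) = R and cov(u_k, u_k') = q, and the majority-vote error is the probability that the vote of sgn(u_k) disagrees with sgn(v). A Monte Carlo sketch (the parameter values below are our own illustration):

```python
import numpy as np

def mv_error(R, q, K=3, n=200000, seed=0):
    """Monte Carlo estimate of the majority-vote generalization error for
    K students with teacher-student overlap R and student-student overlap q."""
    rng = np.random.default_rng(seed)
    # Covariance of (v, u_1, ..., u_K): unit variances,
    # cov(v, u_k) = R, cov(u_k, u_k') = q
    C = np.full((K + 1, K + 1), q)
    C[0, :] = C[:, 0] = R
    np.fill_diagonal(C, 1.0)
    z = rng.multivariate_normal(np.zeros(K + 1), C, size=n)
    v, u = z[:, 0], z[:, 1:]
    vote = np.sign(np.sign(u).sum(axis=1))   # majority vote of K students
    return np.mean(vote != np.sign(v))

# Same R, different q: smaller q (more variety) gives a lower ensemble error
print(mv_error(R=0.8, q=0.9), mv_error(R=0.8, q=0.65))
```

At fixed R, lowering q toward R² (students independent given the teacher) strengthens the ensemble effect, while q → 1 makes all students identical and the ensemble no better than a single student.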
13
Dynamical behaviors of R and q
[Figures: time evolution of the overlaps R and q (from 0 to 1) for Hebbian, perceptron, and AdaTron learning.]
Relationship between R and q
[Figure: similarity q vs. similarity R for Hebbian, perceptron, and AdaTron learning.]