connectionist temporal classification with maximum entropy ...06-09... · hu liu sheng jin...

Connectionist Temporal Classification with Maximum EntropyRegularization

Hu Liu Sheng Jin Changshui Zhang

Department of AutomationTsinghua University

NeurIPS, 2018

Hu Liu Sheng Jin Changshui Zhang CTC with Maximum Entropy Regularization NeurIPS, 2018 1 / 9

Introduction to Connectionist Temporal Classification (CTC)

P(‘dog’| )

dd_oo___gdoo____gg_____dogg

dddoooggg

= P

The most suitable path

@Lctc

@ytk

= � 1

p(l|X)ytk

X

{⇡|⇡2B�1(l),⇡t=k}p(⇡|X)

CTC lacks exploration and is prone to fall into worse local minima.Output overconfident paths (overfitting).Output paths with peaky distribution.

CTC Drawbacks:

positive feedbackerror signal


Maximum Conditional Entropy Regularization for CTC (EnCTC)

EnCTC:

Lenctc = Lctc � �H(p(⇡|l,X))

god

CTC:

expected:(High Entropy)

H(p(⇡|l,X)) = �X

⇡2B�1(l)

p(⇡|X, l) log p(⇡|X, l).

Entropy-based regularization

Better generalization and exploration.Solve peaky distribution problem.Depict ambiguous segmentation boundaries.


Equal Spacing CTC (EsCTC)

The spacing of two consecutive elements is nearly the same in many sequential tasks.

g

o

d

d o g

d o g

gd o

We adopt equal spacing as a pruning method to rule out unreasonable CTC paths.


Equal Spacing CTC (EsCTC)

z1 z2 z3

o o o o o o

d d d

_ _ _ _ _ o

_ _ _ _ _ d

d d d d d d

_ _ d _ d d

o o o

_ _ o _ o o

g g g

_ _ g _ g g

og g _ g

dg g _ g

g

o

dTheorem 3.1. Among all segmentation sequences, the equal spacing one has the maximum entropy.

Equal spacing is the best prior without any subjective assumptions.

zs ⌧T

|l|

zargmax max

pH(p(⇡|z, l, X)) = zes

Lesctc = � logX

z2C⌧,T

X

⇡2B�1z (l)

p(⇡|X)


Algorithm and Complexity Analysis

We propose dynamic programming algorithms for EnCTC, EsCTC� and EnEsCTC.

�̄(t, s) =

(�(t � 1, s) + �(t � 1, s � 1) if l0s = b or l0s�2 = l0s�(t � 1, s) + �(t � 1, s � 1) + �(t � 1, s � 2) otherwise

Q(l) = �(T, |l0|) + �(T, |l0| � 1)

EnCTC

EsCTC

EnEsCTC

p⌧ (l|X1:T ) = ↵⌧ (T, |l|) +

⌧ T|l|X

t0=1

↵⌧ (T � t0, |l|)�(T � t0 + 1, T, 0)↵⌧ (t, s) =

8<:

P⌧ T|l|

t0=1 ↵⌧ (t � t0, s � 1)�(t � t0 + 1, t, s) if ls�1 6= lsP⌧ T

|l|t0=2 ↵⌧ (t � t0, s � 1)yt�t0+1

b �(t � t0 + 2, t, s) otherwise

Q⌧ (l) =�⌧ (T, |l|) +

⌧ T|l|X

t0=1

�⌧ (T � t0, |l|)�(T � t0 + 1, T, 0)

+ ↵⌧ (T � t0, |l|)�(T � t0 + 1, T, 0) log �(T � t0 + 1, T, 0)

�⌧ (t, s) =

8>>>>>>>>><>>>>>>>>>:

P⌧ T|l|

t0=1 �⌧ (t � t0, s � 1)�(t � t0 + 1, t, s) + ↵⌧ (t � t0, s � 1)⌘(t � t0 + 1, t, s)

if ls�1 6= lsP⌧ T

|l|t0=2 �⌧ (t � t0, s � 1)yt�t0+1

b �(t � t0 + 2, t, s)+

↵⌧ (t � t0, s � 1)yt�t0+1b ⌘(t � t0 + 2, t, s)+

↵⌧ (t � t0, s � 1)yt�t0+1b log yt�t0+1

b �(t � t0 + 2, t, s)

otherwise


Qualitative Analysis

(a) (b) (c) (d)

Error Signal in Training

p a r i t y

CTCEnCTCEsCTC

EnEsCTC

Alignment Evaluation


Results on Scene Text Recognition Benchmarks

Evaluation of model generalization.Method Synth5K

CTC 38.1CTC + LS 42.9CTC + CP 44.4

EnCTC 45.5EsCTC 46.3EnEsCTC 47.2

Comparisons with the state-of-the-art methods.Method IC03 IC13 IIIT5K SVT

CRNN 89.4 86.7 78.2 80.8STAR-Net 89.9 89.1 83.3 83.6R2AM 88.7 90.0 78.4 80.7RARE 90.1 88.6 81.9 81.9

EnCTC 90.8 90.0 82.6 81.5EsCTC 92.6 87.4 81.7 81.5EnEsCTC 92.0 90.6 82.0 80.6


For more results and analyses, please come

Poster: Room 210 & 230 AB #106

https://github.com/liuhu-bigeye/enctc.crnn


connectionist temporal classification with maximum entropy ...06-09... · hu liu sheng jin...

Documents