Combined classification and channel/basis selection with L1-L2 regularization with application to P300 speller system

Ryota Tomioka & Stefan Haufe
Tokyo Tech / TU Berlin / Fraunhofer FIRST

P300 speller system

Evoked response (ER)

Farwell & Donchin 1988

P300 speller system

A B C D E F
G H I J K L
M N O P Q R
S T U V W X
Y Z 1 2 3 4
5 6 7 8 9 _

ER detected!

ER detected!

The character must be “P”

Common approach

EEG signal
-> Feature extraction (e.g., ICA or channel selection)
-> Feature vector
-> P300 detection (e.g., binary SVM classifier)
-> Detector outputs (6 columns & 6 rows)
-> Decoding (e.g., compare the detector outputs)
-> Decoded character (36 classes)

?

?

Lots of intermediate goals!!

Our approach

e.g., ICA or channel selection

e.g., Binary SVM classifier

Compare the detector outputs

Decoding

EEG signal


P300 detection

Feature extraction

Define a “detector” f_W(X)

Our approach

Regularized empirical risk minimization:

$$\min_{W} \; L(W) + \lambda\,\Omega(W)$$

L(W): data-fit term
λΩ(W): regularization term

Decoding

EEG signal


P300 detection

Feature extraction

Detect P300

Extract structure

Learning the decoding model

• Suppose that we have a detector f_W(X) that detects the P300 response in signal X.

[Figure: speller matrix with detector outputs f1, ..., f6 along the six columns and f7, ..., f12 along the six rows]

This is nothing but learning two 6-class classifiers (one over the columns, one over the rows).
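A minimal sketch of the 2 x 6-class decoding step. It assumes a softmax over the six column outputs and a softmax over the six row outputs (the slides only say "multinomial"), and that f1..f6 index columns and f7..f12 index rows. The names `MATRIX`, `softmax`, and `decode` are illustrative, not from the slides.

```python
import numpy as np

# Speller matrix from the slides (6 rows x 6 columns).
MATRIX = np.array([list("ABCDEF"), list("GHIJKL"), list("MNOPQR"),
                   list("STUVWX"), list("YZ1234"), list("56789_")])

def softmax(z):
    """Numerically stable softmax over a 1-D array."""
    z = z - np.max(z)
    e = np.exp(z)
    return e / e.sum()

def decode(f):
    """Decode one character from 12 detector outputs.

    Assumed ordering: f[0:6] are column outputs, f[6:12] are row outputs.
    """
    f = np.asarray(f, dtype=float)
    p_col = softmax(f[:6])   # 6-class distribution over columns
    p_row = softmax(f[6:])   # 6-class distribution over rows
    row, col = int(np.argmax(p_row)), int(np.argmax(p_col))
    return str(MATRIX[row, col]), p_col, p_row

# Hypothetical detector outputs: strong response for column 4 and row 3.
f = np.zeros(12)
f[3] = 2.0   # f4  (4th column)
f[8] = 2.0   # f9  (3rd row)
char, p_col, p_row = decode(f)
print(char)  # P
```

Picking the most probable column and row independently gives one of 36 characters, which is why no separate 36-class classifier is needed.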

How we do this

12 2 8 1 3 4 11 9 5 6 10 7 …

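$$L(W) = \sum_{i=1}^{n} \Bigl( -\log P_W(\mathrm{col} \mid X_i) \; - \; \log P_W(\mathrm{row} \mid X_i) \Bigr)$$

Each of the two terms is the negative log of a multinomial likelihood function: one over the 6 columns, one over the 6 rows.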
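The loss above could be computed as follows. This is a sketch that assumes a softmax parameterization of P_W and that the 12 detector outputs for each character trial are already collected in a row of `F` (columns first, then rows). `multinomial_loss` and its argument names are hypothetical.

```python
import numpy as np

def log_softmax(z):
    """Numerically stable log-softmax over a 1-D array."""
    z = z - np.max(z)
    return z - np.log(np.sum(np.exp(z)))

def multinomial_loss(F, cols, rows):
    """Sum over trials of -log P(col | X_i) - log P(row | X_i).

    F    : (n, 12) detector outputs, F[:, 0:6] columns, F[:, 6:12] rows (assumed layout)
    cols : (n,) true column index of each trial, in 0..5
    rows : (n,) true row index of each trial, in 0..5
    """
    loss = 0.0
    for f, c, r in zip(np.asarray(F, dtype=float), cols, rows):
        loss -= log_softmax(f[:6])[c]   # column term
        loss -= log_softmax(f[6:])[r]   # row term
    return float(loss)

# With uninformative (all-zero) outputs every term equals log(6).
F = np.zeros((2, 12))
print(multinomial_loss(F, [3, 0], [2, 5]))  # 4 * log(6)
```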

Detector

$$f_W(X) = \langle W, X \rangle$$

X: #channels x #samples
W: #channels x #samples
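For illustration, the inner product between two matrices of the same shape is the sum of their elementwise products. The sizes below are hypothetical.

```python
import numpy as np

def detector(W, X):
    """Linear detector f_W(X) = <W, X> = sum_ij W[i, j] * X[i, j]."""
    return float(np.sum(W * X))

rng = np.random.default_rng(0)
n_channels, n_samples = 4, 5                      # hypothetical sizes
W = rng.standard_normal((n_channels, n_samples))  # weight matrix
X = rng.standard_normal((n_channels, n_samples))  # one EEG epoch

# Same value written as a trace: <W, X> = tr(W^T X).
assert np.isclose(detector(W, X), np.trace(W.T @ X))
```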

L1-L2 regularization

[Figure: three panels showing W (#channels x #samples, axis ticks 2 to 16), one per regularizer listed below]

(1) Channel selection (linear sum of row norms)

(2) Time sample selection (linear sum of column norms)

(3) Component selection (linear sum of component norms)
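A sketch of regularizers (1) and (2), assuming W is stored as #channels x #samples and that "norm" means the Euclidean (L2) norm. Regularizer (3) is left out because the slides do not define the components. Function names are illustrative.

```python
import numpy as np

def channel_selection_reg(W):
    """(1) Sum of L2 norms of the rows of W (one row per channel)."""
    return float(np.sum(np.linalg.norm(W, axis=1)))

def time_selection_reg(W):
    """(2) Sum of L2 norms of the columns of W (one column per time sample)."""
    return float(np.sum(np.linalg.norm(W, axis=0)))

# Hypothetical 2-channel x 3-sample weight matrix with one silent channel.
W = np.array([[3.0, 4.0, 0.0],
              [0.0, 0.0, 0.0]])
print(channel_selection_reg(W))  # 5.0  (row norms: 5 and 0)
print(time_selection_reg(W))     # 7.0  (column norms: 3, 4 and 0)
```

The L2 norm within a group and the plain (L1) sum across groups is what gives the penalty its name: whole rows or whole columns are driven to zero together.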

The method

$$\min_{W} \; L(W) + \lambda\,\Omega(W)$$

L(W): 2 x 6-class multinomial loss
λΩ(W): L1-L2 regularization

Nonlinear convex optimization with second-order cone constraints
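The slides do not show the solver. To illustrate why the L1-L2 penalty removes whole channels, the sketch below uses group soft-thresholding, the proximal operator of the row-norm penalty. This is a standard building block of proximal-gradient methods, swapped in here for illustration; it is not the authors' second-order cone or subgradient routine. The name `prox_row_group` is hypothetical.

```python
import numpy as np

def prox_row_group(W, t):
    """Proximal operator of t * sum_i ||W[i, :]||_2.

    Each row's norm is reduced by t; rows with norm <= t become exactly zero.
    """
    norms = np.linalg.norm(W, axis=1, keepdims=True)
    # Guard against division by zero for all-zero rows.
    scale = np.maximum(0.0, 1.0 - t / np.maximum(norms, 1e-12))
    return scale * W

# Hypothetical weights: a strong channel (norm 5) and a weak one (norm 0.5).
W = np.array([[3.0, 4.0],
              [0.3, 0.4]])
W_new = prox_row_group(W, 1.0)
print(W_new)
# The strong row is scaled by 0.8 to [2.4, 3.2]; the weak row is set to [0, 0].
```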

Results - BCI competition III dataset II [Albany]
(1) Channel selection regularizer, λ = 5.46

             15 repetitions   5 repetitions
Subject A:   99% (97%)        72% (72%)
Subject B:   93% (96%)        80% (75%)

(Values in parentheses: Rakotomamonjy & Guigue)

Results - BCI competition III dataset II [Albany]
(2) Time sample selection regularizer, λ = 5.46

             15 repetitions   5 repetitions
Subject A:   98% (97%)        70% (72%)
Subject B:   94% (96%)        81% (75%)



Results - BCI competition III dataset II [Albany]
(3) Component selection regularizer, λ = 100

             15 repetitions   5 repetitions
Subject A:   98% (97%)        70% (72%)
Subject B:   94% (96%)        82% (75%)


Filters

(1) Channel selection regularizer

(2) Time sample selection regularizer

(3) Component selection regularizer

Summary

• Unified feature extraction and classifier learning
  – L1-L2 regularization

• Use decoding model to learn the classifier
  – 2 x 6-class multinomial model

• Solve the problem as a convex regularized empirical risk minimization problem
  – Nonlinear second-order cone problem (efficient subgradient-based optimization routine will be made available soon!)
