outliers and inconsistency

Inconsistency and Outliers: Active Learning by Outlier Detection. Inconsistency Robustness Symposium 2011. Neil Rubens, Assistant Professor, University of Electro-Communications, Tokyo, Japan

Upload: neil-rubens

Post on 05-Dec-2014

DESCRIPTION

presentation at the Inconsistency Robustness Symposium 2011 at Stanford University

TRANSCRIPT

Page 1: Outliers and Inconsistency

Inconsistency and Outliers: Active Learning by Outlier Detection. Inconsistency Robustness Symposium 2011

Neil Rubens, Assistant Professor, University of Electro-Communications, Tokyo, Japan

Page 2: Outliers and Inconsistency

Outline

Inconsistency Robustness is a multi-disciplinary issue. We discuss some aspects of Inconsistency Robustness from the perspective of Machine Learning:

• What is Inconsistency

• Can Inconsistency be Useful

• Measuring Inconsistency

Page 3: Outliers and Inconsistency

Inconsistency/outlier: data that does not agree with the model.

Inconsistency-Outlier
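One way to make this definition concrete is to fit a model and flag the points whose residuals are unusually large. The sketch below is an illustration only and is not taken from the slides: the straight-line model, the helper names `fit_line` and `model_outliers`, the median-absolute-deviation scale, and the threshold `k=3` are all assumptions.

```python
from statistics import median


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a + b * x (hypothetical helper)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = my - b * mx
    return a, b


def model_outliers(xs, ys, k=3.0):
    """Return indices of points that disagree with the fitted line.

    A point is flagged when its residual lies more than k robust
    standard deviations (1.4826 * MAD) from the median residual.
    """
    a, b = fit_line(xs, ys)
    residuals = [y - (a + b * x) for x, y in zip(xs, ys)]
    med = median(residuals)
    mad = median(abs(r - med) for r in residuals)
    scale = 1.4826 * mad
    return [i for i, r in enumerate(residuals) if abs(r - med) > k * scale]


# Labeled data that roughly follows y = 2x + 1, except for index 5.
xs = [0, 1, 2, 3, 4, 5, 6, 7]
ys = [1.0, 3.1, 4.9, 7.0, 9.1, 25.0, 13.0, 14.9]
print(model_outliers(xs, ys))
```

The robust scale is used here because a single large residual would otherwise inflate an ordinary standard deviation and hide itself.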

Page 4: Outliers and Inconsistency

Outlier Types

• Spatial Outlier – unlabeled data

• Model Outlier – labeled data

Our Focus
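A spatial outlier can be scored without any labels, because it depends only on where a point sits relative to the other inputs. The sketch below is one possible reading of that idea, assuming a mean distance to the k nearest neighbours as the score; the function name `knn_outlier_scores` and the choice `k=2` are hypothetical and do not come from the slides.

```python
import math


def knn_outlier_scores(points, k=2):
    """Score each unlabeled point by its mean distance to its k nearest neighbours.

    Larger scores mean the point lies farther from the rest of the data.
    """
    scores = []
    for i, p in enumerate(points):
        # Distances from p to every other point, nearest first.
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        scores.append(sum(dists[:k]) / k)
    return scores


# Four points on a unit square plus one point far away from them.
points = [(0, 0), (0, 1), (1, 0), (1, 1), (5, 5)]
scores = knn_outlier_scores(points)
print(scores.index(max(scores)))  # index of the most isolated point
```

A model outlier, by contrast, needs a label, since it is defined by disagreement between the observed output and the model's prediction.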

Page 5: Outliers and Inconsistency

Causes of Outliers

• Faulty data – Entry error, malfunction, etc.

• Incorrect Model

http://www.dkimages.com/discover/previews/852/20223083.JPG

• Chance/Deviation

Our Focus

Page 6: Outliers and Inconsistency

Typical Treatment of Outliers

• Assume that the learned model is correct and discard points that don’t agree with the model

 

Page 7: Outliers and Inconsistency

Atypical Treatment of Outliers

• Assume that data is right, and that the model is wrong

Our Focus
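The contrast between the two treatments can be sketched with a toy example. Under the typical treatment, a point that disagrees with the current model is dropped; under the atypical treatment, the same point is taken as evidence that the model should be revised. The constant and straight-line models, the data, and the helper names below are assumptions chosen for illustration, not material from the slides.

```python
def fit_constant(ys):
    """Current (too simple) model: predict the mean of the observed outputs."""
    return sum(ys) / len(ys)


def fit_line(xs, ys):
    """Revised model: least-squares line y = a + b * x."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return my - b * mx, b


xs = [0, 1, 2, 3, 4, 10]
ys = [0.1, 2.0, 3.9, 6.1, 8.0, 20.0]

# Typical treatment: trust the constant model, so the last point looks inconsistent.
c = fit_constant(ys)
residual_constant = abs(ys[-1] - c)

# Atypical treatment: trust the data and revise the model instead.
a, b = fit_line(xs, ys)
max_residual_line = max(abs(y - (a + b * x)) for x, y in zip(xs, ys))

print(residual_constant, max_residual_line)
```

In this toy setting the point that would have been discarded is the one that most clearly reveals the trend, and it stops being inconsistent once the model is changed.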

Page 8: Outliers and Inconsistency

Obtaining Data could be “COSTLY”

Medicine: diagnosis (pain, time, $); drug discovery ($$$, time)

User Interaction: effort, time

Expertise Elicitation: $, time

... assumption that the current model is accurate and requires just some tweaking. However, if the current model is inaccurate, it should be changed significantly, instead of ignoring the incompatibility and continuing to make minor tweaks.

[Figure: axes x1, x2, y]

Practicality: Due to the abundance of data, one may mistakenly dismiss this problem as impractical. While unlabeled data is abundant, labeled data is rather scarce. Even if the overall amount of labeled data is large enough, there may still be a need for additional labeled data to enable personalization (a common focus).

Moreover, obtaining labeled data could be expensive. Labeled data is needed for personalization ...

This issue is exacerbated in AL settings in which ... This phenomenon occurs frequently during the early stages of the learning process [7], [6], or in a non-stationary environment in which changes may occur in the underlying model [2].

Contributions: gradient descent ... (except the number of samples we can make is very small)

a) State the problem:

b) Say why it’s an interesting problem: Not all of the outliers are bad

c) Say what your solution achieves:

d) Say what follows from your solution: If we discard outliers, we might be discarding the most informative data points

The goal of machine learning is to learn an accurate predictive model from the data. Data that is inconsistent with the learned model and/or existing data is referred to as an outlier.

The learned model is often assumed to be approximately correct; therefore using outliers for learning is considered undesirable, and hence outliers tend to be ignored. Playing it safe and learning only from consistent data just reinforces our beliefs in what we think is correct (which may not necessarily be accurate), instead of trying to learn what is not yet known.

<<< TODO: need a nice illustrative example >>>

• just needs tweaking

• AL estimates how informative points are

• outliers are considered not informative

• we think that they are very informative ...

Active learning aims at estimating how useful a point is for learning. Typically there is a lot of unlabeled data, and some data needs to be labeled.

If some point is inconsistent, it may potentially be much more informative. The information that consistent points bring is rather limited: by virtue of being consistent, this information is already captured within the existing data/model, and a consistent point mostly reinforces prior beliefs (without changing the outcomes).

Outliers are discarded (since they are considered detrimental to learning the underlying pattern), unless the objective is to learn to detect outliers, referred to as anomaly detection [3], e.g. ...

AL: often contradictory items are considered to be outliers and are ignored. We argue that they are not outliers, but may indeed contain a lot of information.

http://jeffjonas.typepad.com/jeff_jonas/2010/11/big-data-new-physics.html

2. Bad data good. More specifically, natural variability in data including spelling errors, transposition errors, and even professionally fabricated lies – all helpful. A bit more about this here: It Turns Out Both Bad Data and a Teaspoon of Dirt May Be Good For You and There Is No Such Thing As A Single Version of Truth.

II. PROBLEM DEFINITION

type of outliers: error, non-error, ...

f(x, θ)

III. RELATED WORKS

Active learning and outlier detection have been jointly studied before.

A typical approach is to use outlier detection to remove samples from the active learning process: e.g. [?] used outlier detection during active learning to identify and remove the samples judged to be outliers.

We use an outlier criterion as an active learning criterion. Previous work of [1] ... has done the opposite, using an active learning criterion as an outlier criterion, i.e. outlier detection by active learning; we propose the opposite: active learning by outlier detection ...

QBC

VC dimension

YChange [rubens] is similar to Cook’s distance outlier criterion ...

kolmogorov/compression: given that the underlying pattern has already been learned, an outlier carries more additional information
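The notes above compare the author's criterion to Cook's distance, a standard regression diagnostic that measures how much the fitted model would change if a single labeled point were removed. The sketch below computes it for simple linear regression only. It is not the author's criterion: the notes do not say how the idea is applied to unlabeled candidates, and the function name `cooks_distances` and the data are assumptions.

```python
def cooks_distances(xs, ys):
    """Cook's distance of every point under a least-squares line y = a + b * x.

    D_i = (e_i^2 / (p * s^2)) * h_i / (1 - h_i)^2, where e_i is the residual,
    h_i the leverage, p = 2 the number of parameters, and s^2 = SSE / (n - p).
    Assumes n > 2 and a fit that is not exact (s^2 > 0).
    """
    n = len(xs)
    p = 2
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    a = my - b * mx
    residuals = [y - (a + b * x) for x, y in zip(xs, ys)]
    s2 = sum(e * e for e in residuals) / (n - p)
    distances = []
    for x, e in zip(xs, residuals):
        h = 1.0 / n + (x - mx) ** 2 / sxx  # leverage of this point
        distances.append((e * e / (p * s2)) * h / (1.0 - h) ** 2)
    return distances


xs = [0, 1, 2, 3, 4, 5, 6, 7]
ys = [1.0, 3.1, 4.9, 7.0, 9.1, 25.0, 13.0, 14.9]
d = cooks_distances(xs, ys)
print(d.index(max(d)))  # index of the most influential point
```

Read as an active learning criterion rather than a filter, a high score would mark the point whose label changes the model most, which is the kind of point the notes argue should be kept rather than discarded.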

Page 9: Outliers and Inconsistency

Unlabeled Data

Sampling
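The step from unlabeled data to sampling can be sketched as a pool-based loop: score every unlabeled candidate, ask for the label of the highest-scoring one, add it to the labeled set, and repeat until the labeling budget is spent. Everything named below is hypothetical. In particular, `distance_to_labeled` is a simple stand-in score, not the criterion proposed in the talk, and `oracle` stands for whatever costly labeling source is available.

```python
def distance_to_labeled(x, labeled):
    """Stand-in score: distance from candidate x to the nearest labeled input."""
    return min(abs(x - lx) for lx, _ in labeled)


def active_learning_loop(labeled, pool, oracle, score, budget):
    """Repeatedly query the label of the highest-scoring unlabeled point.

    labeled: list of (x, y) pairs; pool: list of unlabeled x values;
    oracle: function x -> y (the costly labeling step);
    score: function (x, labeled) -> float; budget: number of queries allowed.
    """
    labeled = list(labeled)
    pool = list(pool)
    for _ in range(min(budget, len(pool))):
        best = max(pool, key=lambda x: score(x, labeled))
        pool.remove(best)
        labeled.append((best, oracle(best)))
    return labeled, pool


labeled, pool = active_learning_loop(
    labeled=[(0.0, 0.0)],
    pool=[1.0, 2.0, 9.0, 10.0],
    oracle=lambda x: 2 * x,  # toy labeling function
    score=distance_to_labeled,
    budget=2,
)
print(labeled, pool)
```

Because the score is passed in as a function, an outlier-based criterion could be substituted without changing the loop itself.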

2

assumption that the current model is accurate, and requires justsome tweaking. However, if the current model is inaccurate,it should be changed significantly; instead of ignoring theincompatability and keep making minor tweaks.

—x1

x2

y

.by—Practicality:Due to abundance of data; one may mistakenly dismiss this

problem as impractical. While the ulabeled data is abundant,labeled data is rather scarce. Even if overal, the amount oflabeled data is large enough; there may still be a need foradditional labeled data as to enable personalization (a commonfocus).

Moreover obtaining labeled data could be expensive. La-beled data is needed for personaliization ...

This issue is exacerbated in AL settings in which ...

This phenomenon occurs frequently during the early stages of the learning process [7], [6], or in a non-stationary environment in which changes may occur in the underlying model [2].

Contributions

gradient descent ... (except that the number of samples we can make is very small)

a) State the problem:

b) Say why it’s an interesting problem: Not all outliers are bad.

c) Say what your solution achieves:

d) Say what follows from your solution: If we discard outliers, we might be discarding the most informative data points.

The goal of machine learning is to learn an accurate predictive model from the data. Data that is inconsistent with the learned model and/or existing data is referred to as an outlier.

The learned model is often assumed to be approximately correct; therefore using outliers for learning is considered undesirable, and hence outliers tend to be ignored. Playing it safe and learning only from consistent data just reinforces our beliefs in what we think is correct (which may not necessarily be accurate), instead of trying to learn what is not yet known.

<<< TODO: need a nice illustrative example >>>

just needs tweaking

AL estimates how informative points are

outliers are considered not informative

we think that they are very informative ...

Active learning aims at estimating how useful a point is for learning. Typically there is a lot of unlabeled data, and some of it needs to be labeled.

If a point is inconsistent, it may potentially be much more informative. The information that consistent points bring is rather limited: by virtue of being consistent, this information is already captured within the existing data/model, and a consistent point mostly reinforces prior beliefs (without changing the outcomes).
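The claim above can be illustrated with a small sketch. It is not taken from the slides: the data, the least-squares line model, and the helper names `fit_line` and `model_change` are all assumptions chosen for illustration. It measures how much the fitted model's predictions move when a consistent point and an inconsistent point are added.

```python
def fit_line(points):
    """Ordinary least-squares fit y = a + b*x for a list of (x, y) pairs."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def model_change(points, new_point, probe_xs):
    """Total absolute change in predictions at probe_xs after adding new_point."""
    a0, b0 = fit_line(points)
    a1, b1 = fit_line(points + [new_point])
    return sum(abs((a1 + b1 * x) - (a0 + b0 * x)) for x in probe_xs)


# Hypothetical labeled data lying exactly on y = 2x.
labeled = [(0.0, 0.0), (1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]
probes = [0.0, 1.0, 2.0, 3.0, 4.0]

consistent = (4.0, 8.0)    # agrees with the current model
inconsistent = (4.0, 0.0)  # disagrees with the current model

change_consistent = model_change(labeled, consistent, probes)
change_inconsistent = model_change(labeled, inconsistent, probes)
print(change_consistent, change_inconsistent)
```

In this toy setting the consistent point leaves the fitted line unchanged, while the inconsistent point shifts the predictions noticeably. Whether that shift is an improvement depends on whether the point is an error or a sign that the model is wrong, which is the distinction the slides draw.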

Outliers are discarded (since they are considered detrimental to learning the underlying pattern), unless the objective is to learn to detect outliers, referred to as anomaly detection [3], e.g. ...

AL: contradictory items are often considered to be outliers and are ignored; we argue that they are not outliers, but may indeed contain a lot of information.

http://jeffjonas.typepad.com/jeff_jonas/2010/11/big-data-new-physics.html

2. Bad data good. More specifically, natural variability in data including spelling errors, transposition errors, and even professionally fabricated lies – all helpful. A bit more about this here: It Turns Out Both Bad Data and a Teaspoon of Dirt May Be Good For You and There Is No Such Thing As A Single Version of Truth.

II. PROBLEM DEFINITION

Types of outliers: error, non-error, ...

f(x, θ)

III. RELATED WORKS

Active learning and outlier detection have been jointly studied before.

The typical approach is to use outlier detection to remove samples from the active learning process: e.g. [?] used outlier detection during active learning to identify and remove the samples judged to be outliers.

We use an outlier criterion as an active learning criterion. Previous work of [1] ... has done the opposite, using an active learning criterion as an outlier criterion, i.e. outlier detection by active learning; we propose the opposite: active learning by outlier detection ...

QBC

VC dimension

YChange [rubens] is similar to Cook’s distance outlier criterion ...

Kolmogorov/compression: given that the underlying pattern has already been learned, an outlier carries more additional information
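Since the notes compare YChange to Cook's distance, a minimal sketch of the standard Cook's distance for a simple linear regression may help. It shows the textbook criterion only, not the YChange criterion itself; the data and the function name `cooks_distances` are assumptions for illustration. Cook's distance needs the label of each point, whereas in active learning the label of a candidate is not yet known.

```python
def cooks_distances(points):
    """Cook's distance of each (x, y) pair under a least-squares line fit.

    Uses D_i = e_i^2 / (p * s^2) * h_i / (1 - h_i)^2, with p = 2 parameters
    (intercept and slope), residual e_i, leverage h_i, and s^2 = SSE / (n - p).
    """
    n = len(points)
    p = 2
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x

    residuals = [y - (intercept + slope * x) for x, y in points]
    s2 = sum(e * e for e in residuals) / (n - p)
    leverages = [1.0 / n + (x - mean_x) ** 2 / sxx for x, _ in points]

    return [
        (e * e) / (p * s2) * h / (1.0 - h) ** 2
        for e, h in zip(residuals, leverages)
    ]


# Hypothetical data: five points close to y = 2x and one that breaks the trend.
data = [(0, 0.1), (1, 1.9), (2, 4.2), (3, 5.8), (4, 8.1), (5, 3.0)]
distances = cooks_distances(data)
most_influential = max(range(len(data)), key=distances.__getitem__)
print(most_influential, distances)
```

The point with the largest distance is the one whose removal would change the fitted values the most, which is the sense in which an outlier criterion can double as an informativeness criterion.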


Multiple Hypothesis


Hypothesis/Model Selection

Page 10: Outliers and Inconsistency


Consistent Sample

•  Does not allow to reduce # of hypotheses

•  Little is learned
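The point of this slide can be sketched with a toy hypothesis space. Everything here is an assumption for illustration: the hypotheses are threshold rules "label is True when x >= t", and the samples are made up. A sample on which all remaining hypotheses agree removes none of them, while a sample that contradicts some of them shrinks the set.

```python
def consistent_hypotheses(thresholds, sample):
    """Keep thresholds t whose rule 'label = (x >= t)' agrees with the sample."""
    x, label = sample
    return [t for t in thresholds if (x >= t) == label]


# Hypothetical hypothesis space: integer thresholds 0..10.
hypotheses = list(range(0, 11))

# Data seen so far narrows the space to thresholds 2..8.
for seen in [(1, False), (8, True)]:
    hypotheses = consistent_hypotheses(hypotheses, seen)

# Every remaining hypothesis already predicts True at x = 9.
after_consistent = consistent_hypotheses(hypotheses, (9, True))

# Hypotheses with t > 5 predict False at x = 5, so this sample contradicts them.
after_inconsistent = consistent_hypotheses(hypotheses, (5, True))

print(len(hypotheses), len(after_consistent), len(after_inconsistent))
```

Here the consistent sample leaves all seven hypotheses in place, and the sample that disagrees with part of the current set cuts it to four.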

2

assumption that the current model is accurate, and requires justsome tweaking. However, if the current model is inaccurate,it should be changed significantly; instead of ignoring theincompatability and keep making minor tweaks.

—x1

x2

y

.by—Practicality:Due to abundance of data; one may mistakenly dismiss this

problem as impractical. While the ulabeled data is abundant,labeled data is rather scarce. Even if overal, the amount oflabeled data is large enough; there may still be a need foradditional labeled data as to enable personalization (a commonfocus).

Moreover obtaining labeled data could be expensive. La-beled data is needed for personaliization ...

–This issue is exhorbated, in al settins in which ...This phenomena occurs frequently during the early stages

of the learning process [7], [6], or in a non-stationary envi-ronment in which changes may occur in the underlying model[2].

–Contributionsgradient descent ... (except the number of samples we can

make is very small)—

a) State the problem:b) Say why it’s an interesting problem: Not all of the

outliers are badc) Say what your solution achieves:d) Say what follows from your solution: If we discard

outliers, we might be discarding most informative data points====The goal of machine learning is to learn an accurate

predictive model from the data. Data that is inconsistent withthe learned model and/or existing data is refered to as anoutlier.

—Learned model is often assumed to be approximately cor-

rect, therefore using outliers for learning is considered to beundesireable, and hence outliers tend to be ignored. By playingit safe and learning only from consistent data just reinforcesour believes in what we think is correct (which may notneccesarily be accurate); instead of trying to learn what isnot yet known,

<<< TODO: need a nice illustrative example >>>

(just needs tweaking) AL estimates how informative points are. Outliers are considered not informative; we think that they are very informative ...

Active learning aims at estimating how useful a point is for learning. Typically there is a lot of unlabeled data, and some of it needs to be labeled.

If some point is inconsistent, it may potentially be much more informative. The information that consistent points bring is rather limited: by virtue of being consistent, this information is already captured within the existing data/model, and a consistent point mostly reinforces prior beliefs (without changing the outcomes).

Outliers are discarded (since they are considered detrimental to learning the underlying pattern), unless the objective is to learn to detect outliers, referred to as anomaly detection [3], e.g. ...

AL: often contradictory items are considered to be outliers and are ignored. We argue that they are not outliers, but may indeed contain a lot of information.

http://jeffjonas.typepad.com/jeff_jonas/2010/11/big-data-new-physics.html

"2. Bad data good. More specifically, natural variability in data including spelling errors, transposition errors, and even professionally fabricated lies – all helpful. A bit more about this here: It Turns Out Both Bad Data and a Teaspoon of Dirt May Be Good For You and There Is No Such Thing As A Single Version of Truth."

II. PROBLEM DEFINITION

Types of outliers: error, non-error, ...

f(x, θ)

III. RELATED WORKS

Active learning and outlier detection have been jointly studied before.

A typical approach is to use outlier detection to remove samples from the active learning process: e.g. [?] used outlier detection during active learning to identify and remove the samples judged to be outliers.

We use an outlier criterion as an active learning criterion. Previous work of [1] ... has done the opposite, using an active learning criterion as an outlier criterion, i.e. outlier detection by active learning; we propose the opposite: active learning by outlier detection ...

QBC

VC dimension

YChange [rubens] is similar to Cook's distance outlier criterion ...

Kolmogorov/compression: given that the underlying pattern has already been learned, an outlier carries more additional information.
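The notes above relate the YChange criterion to Cook's distance. As a rough illustration of the outlier side of that comparison only, here is a minimal sketch of the textbook Cook's distance for ordinary least squares. It is not the YChange criterion itself, and the toy data, the function name `cooks_distance`, and the choice of a linear model are assumptions made for this example. Note that Cook's distance needs labels, so it scores already-labeled points.

```python
import numpy as np

def cooks_distance(X, y):
    """Cook's distance of every labeled point under ordinary least squares."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, p = X.shape
    # Hat matrix H = X (X^T X)^{-1} X^T; its diagonal holds the leverages.
    H = X @ np.linalg.pinv(X.T @ X) @ X.T
    leverage = np.diag(H)
    residuals = y - H @ y
    # Unbiased estimate of the noise variance.
    s2 = residuals @ residuals / (n - p)
    return (residuals ** 2 / (p * s2)) * leverage / (1.0 - leverage) ** 2

# Hypothetical toy data: the line y = 2x + 1 with one label that breaks the pattern.
x = np.arange(10, dtype=float)
X = np.column_stack([np.ones_like(x), x])
y = 2.0 * x + 1.0
y[7] += 15.0

scores = cooks_distance(X, y)
ranking = np.argsort(scores)[::-1]  # most influential point first
print(ranking[0])
```

A point with a high score is one whose removal would change the fitted model the most, which is the sense in which an outlier criterion can double as a measure of how much a point has to teach the model.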

Inconsistent Sample

Will not agree with some of the hypotheses (regardless of the output values)

assumption that the current model is accurate and requires just some tweaking. However, if the current model is inaccurate, it should be changed significantly, instead of ignoring the incompatibility and continuing to make minor tweaks.

[Figure: axes x1, x2, y]

Practicality: Due to the abundance of data, one may mistakenly dismiss this problem as impractical. While unlabeled data is abundant, labeled data is rather scarce. Even if, overall, the amount of labeled data is large enough, there may still be a need for additional labeled data to enable personalization (a common focus).

Moreover, obtaining labeled data could be expensive. Labeled data is needed for personalization ...

This issue is exacerbated in AL settings in which ...

This phenomenon occurs frequently during the early stages of the learning process [7], [6], or in a non-stationary environment in which changes may occur in the underlying model [2].

Contributions

gradient descent ... (except the number of samples we can make is very small)

a) State the problem:
b) Say why it's an interesting problem: Not all of the outliers are bad
c) Say what your solution achieves:
d) Say what follows from your solution: If we discard outliers, we might be discarding the most informative data points

The goal of machine learning is to learn an accurate predictive model from the data. Data that is inconsistent with the learned model and/or existing data is referred to as an outlier.

The learned model is often assumed to be approximately correct; therefore using outliers for learning is considered undesirable, and hence outliers tend to be ignored. Playing it safe and learning only from consistent data just reinforces our beliefs in what we think is correct (which may not necessarily be accurate), instead of trying to learn what is not yet known.

Inconsistent Sample

Number of hypotheses is reduced
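The two "Inconsistent Sample" slides can be illustrated with a small version-space sketch. The hypothesis class here (one-dimensional threshold classifiers), the query points, and all function names are assumptions chosen for illustration; they do not come from the talk. The idea is that a point on which the hypotheses disagree will contradict some of them whichever label it receives, so labeling it always shrinks the hypothesis set.

```python
# Hypothetical hypothesis class: threshold classifiers h_t(x) = 1 if x >= t else 0.
thresholds = [i / 10 for i in range(11)]  # t = 0.0, 0.1, ..., 1.0

def predict(t, x):
    return 1 if x >= t else 0

def consistent(hyps, x, y):
    """Keep only the hypotheses that agree with the labeled sample (x, y)."""
    return [t for t in hyps if predict(t, x) == y]

def disagreement(hyps, x):
    """Size of the smaller voting bloc: hypotheses removed in the worst case."""
    ones = sum(predict(t, x) for t in hyps)
    return min(ones, len(hyps) - ones)

x_agree = 1.5   # every hypothesis predicts 1 here
x_split = 0.45  # the hypotheses are split here

# Whichever label x_split receives, some hypotheses are eliminated.
after_0 = consistent(thresholds, x_split, 0)
after_1 = consistent(thresholds, x_split, 1)
print(len(thresholds), len(after_0), len(after_1))
```

In this sketch `disagreement(thresholds, x_agree)` is zero: if the label matches the unanimous prediction, nothing is learned from it, which mirrors the claim that consistent points mostly reinforce what is already known.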

Page 11: Outliers and Inconsistency

Rubens et al., AJS 2011

Page 12: Outliers and Inconsistency
Page 13: Outliers and Inconsistency

(a) under-fit (b) over-fit (c) appropriate fit

Figure 8: Dependence between model complexity and accuracy.

Figure 9: Training input points that are good for learning one model are not necessarily good for the other.

$$\min_{X^{(\mathrm{Train})}} G\left(X^{(\mathrm{Train})}\right). \tag{25}$$

It would be beneficial to combine AL and MS, since they share a common goal of minimizing the predictive error:

$$\min_{X^{(\mathrm{Train})},\, M} G\left(X^{(\mathrm{Train})}, M\right). \tag{26}$$

Ideally we would like to choose the model of appropriate complexity by a MS method and to choose the most useful training data by an AL method. However, simply combining AL with MS in a batch manner, i.e. selecting all of the training points at once, may not be possible due to the following paradox:

• To select training input points by a standard AL method, a model must be fixed. In other words, MS has already been performed (see Figure 9).

• To select the model by a standard MS method, the training input points must be fixed and the corresponding training output values must be gathered. In other words, AL has already been performed (see Figure 10).

As a result, batch AL selects training points for a randomly chosen model, but after the training points are obtained the model is selected once again, giving rise to the possibility that the training points will not be as

Unable to determine which model is more appropriate (Model Selection) until training points have been obtained (Active Learning).

Figure 10: Dependence of Model Selection on Active Learning.

Model Selection

If there is no inconsistency between the training and testing data, then the most complex model would tend to be selected.
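A small numerical sketch can make the model-selection remark concrete. It assumes polynomial regression on synthetic data; the data-generating function, noise level, sample sizes, and variable names are all hypothetical choices for this example. When a model is scored on the very data it was fitted to, nothing is inconsistent with it, and because the polynomial models are nested the error cannot increase with the degree, so the most complex model wins. Scoring on separate data introduces the inconsistency that lets a simpler model be preferred.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample(n):
    """Noisy draws from a hypothetical quadratic ground truth."""
    x = rng.uniform(-1.0, 1.0, n)
    y = 1.0 - 2.0 * x + 3.0 * x ** 2 + rng.normal(0.0, 0.3, n)
    return x, y

x_train, y_train = sample(30)
x_test, y_test = sample(30)

degrees = list(range(7))
train_mse, test_mse = [], []
for d in degrees:
    coeffs = np.polyfit(x_train, y_train, d)  # least-squares fit of degree d
    train_mse.append(float(np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)))
    test_mse.append(float(np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)))

# Selecting on the training data itself favours the highest degree.
chosen_on_train = degrees[int(np.argmin(train_mse))]
# Selecting on held-out data is free to prefer a lower degree.
chosen_on_test = degrees[int(np.argmin(test_mse))]
print(chosen_on_train, chosen_on_test)
```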

Page 14: Outliers and Inconsistency

Change Detection / Model Correction

Is inconsistency caused by noise (or minor factors) or by changes in the underlying model?

http://www.ittvis.com/portals/0/images/ChangeDetection_3window.jpg

http://www.lucieer.net/research/heard.html

http://www.skyboximaging.com/solutions/application/change-detection

http://www.satimagingcorp.com/galleryimages/high-resolution-landsat-satellite-imagery-oman.jpg

– Applications: medical diagnostics, intrusion detection, network analysis, finance
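One simple way to approach the noise-versus-change question is to test whether recent residuals are larger than noise alone would explain. The sketch below is only an illustration under strong assumptions: independent zero-mean noise with a known standard deviation, a fixed window, and a hypothetical threshold of 3. The function name and the sample windows are made up for this example and are not a method from the talk.

```python
import math

def looks_like_model_change(residuals, noise_std, z_threshold=3.0):
    """Flag a window whose mean residual is too large to be explained by noise."""
    n = len(residuals)
    mean = sum(residuals) / n
    # Under pure zero-mean noise, the window mean has standard error noise_std / sqrt(n).
    z = abs(mean) / (noise_std / math.sqrt(n))
    return z > z_threshold

# Hypothetical residuals (observed value minus model prediction).
noisy_window = [0.4, -0.3, 0.1, -0.5, 0.2, 0.0, -0.1, 0.3]   # scattered around zero
shifted_window = [1.1, 0.8, 1.3, 0.9, 1.2, 1.0, 0.7, 1.4]    # consistently off

print(looks_like_model_change(noisy_window, noise_std=0.3))
print(looks_like_model_change(shifted_window, noise_std=0.3))
```

Residuals that scatter around zero are treated as noise and can be tolerated, while a persistent offset suggests that the underlying model has changed and should be corrected rather than having the offending points discarded.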

Page 15: Outliers and Inconsistency

Conclusion

• Inconsistency could be useful for:
– Hypothesis Learning
– Model Selection
– Model Correction

Neil Rubens
Assistant Professor
Active Intelligence Group
Laboratory for Knowledge Computing
University of Electro-Communications
Tokyo, Japan

http://ActiveIntelligence.org