outliers and inconsistency
DESCRIPTION
Presentation at the Inconsistency Robustness Symposium 2011 at Stanford University
TRANSCRIPT
Inconsistency and Outliers: Active Learning by Outlier Detection. Inconsistency Robustness Symposium 2011
Neil Rubens, Assistant Professor, University of Electro-Communications, Tokyo, Japan
Outline
Inconsistency Robustness is a multi-disciplinary issue. We discuss some aspects of Inconsistency Robustness from the perspective of Machine Learning:
• What is Inconsistency
• Can Inconsistency be Useful
• Measuring Inconsistency
Inconsistency/outlier: data that does not agree with the model.
Inconsistency-Outlier
Outlier Types
• Spatial Outlier – unlabeled data
• Model Outlier – labeled data
Our Focus
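The two outlier types above can be told apart with a small sketch. It assumes a linear model fitted by least squares and uses two hypothetical scoring functions (`spatial_outlier_scores`, `model_outlier_scores`) that are illustrative only and not taken from the talk. A spatial outlier is judged from the inputs alone, while a model outlier needs a label so that it can disagree with the model.

```python
import numpy as np

def spatial_outlier_scores(X):
    """Distance of each input from the input mean, in per-feature standard deviations (no labels needed)."""
    X = np.asarray(X, dtype=float)
    z = (X - X.mean(axis=0)) / X.std(axis=0)
    return np.sqrt((z ** 2).sum(axis=1))

def model_outlier_scores(X, y):
    """Absolute residual of each labeled point under a least-squares linear fit."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    A = np.column_stack([np.ones(len(X)), X])      # design matrix with intercept
    theta, *_ = np.linalg.lstsq(A, y, rcond=None)  # fitted parameters
    return np.abs(y - A @ theta)

# Toy data (assumed): y = x everywhere except one corrupted label at x = 3;
# the input x = 10 lies far from the other inputs.
X = np.array([[0.0], [1.0], [2.0], [3.0], [4.0], [10.0]])
y = np.array([0.0, 1.0, 2.0, 9.0, 4.0, 10.0])

spatial = spatial_outlier_scores(X)
model = model_outlier_scores(X, y)
print("most spatially unusual point:", int(np.argmax(spatial)))
print("point that disagrees most with the model:", int(np.argmax(model)))
```

In this toy data the two criteria pick different points, which is why the distinction matters when choosing which kind of outlier to focus on.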
Causes of Outliers
• Faulty data – Entry error, malfunction, etc.
• Incorrect Model
http://www.dkimages.com/discover/previews/852/20223083.JPG
• Chance/Deviation
Our Focus
Typical Treatment of Outliers
• Assume that the learned model is correct and discard points that don’t agree with the model
Atypical Treatment of Outliers
• Assume that data is right, and that the model is wrong
Our Focus
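A minimal sketch of the two treatments, under assumptions that are not in the talk: the data is generated by a hypothetical quadratic process, the current model is a straight line, and the residual threshold of 1.5 is arbitrary.

```python
import numpy as np

# Hypothetical labeled data produced by a quadratic process.
x = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])
y = x ** 2

# Current (incorrect) model: a straight line fitted by least squares.
linear_resid = np.abs(y - np.polyval(np.polyfit(x, y, 1), x))

# Typical treatment: trust the model, discard points that disagree with it.
keep = linear_resid < 1.5          # arbitrary illustrative threshold
print("points kept under the typical treatment:", int(keep.sum()), "of", len(x))

# Atypical treatment: trust the data, revise the model (here: allow a quadratic term).
quad_resid = np.abs(y - np.polyval(np.polyfit(x, y, 2), x))
print("largest residual after revising the model:", float(quad_resid.max()))
```

Under these assumptions the typical treatment throws away most of the data in order to keep a wrong model, while the revised model agrees with every point.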
Obtaining Data could be “COSTLY”
Medicine:
• diagnosis: pain, time, $
• drug discovery: $$$, time
User Interaction: effort, time
Expertise Elicitation: $, time
... assumption that the current model is accurate and requires just some tweaking. However, if the current model is inaccurate, it should be changed significantly, instead of ignoring the incompatibility and continuing to make minor tweaks.
(Figure: axes x1, x2, y)
Practicality: Due to the abundance of data, one may mistakenly dismiss this problem as impractical. While unlabeled data is abundant, labeled data is rather scarce. Even if the overall amount of labeled data is large enough, there may still be a need for additional labeled data to enable personalization (a common focus).
Moreover, obtaining labeled data could be expensive. Labeled data is needed for personalization ...
This issue is exacerbated in AL settings in which ... This phenomenon occurs frequently during the early stages of the learning process [7], [6], or in a non-stationary environment in which changes may occur in the underlying model [2].
Contributions: gradient descent ... (except the number of samples we can make is very small)
a) State the problem:
b) Say why it’s an interesting problem: Not all outliers are bad.
c) Say what your solution achieves:
d) Say what follows from your solution: If we discard outliers, we might be discarding the most informative data points.
The goal of machine learning is to learn an accurate predictive model from the data. Data that is inconsistent with the learned model and/or existing data is referred to as an outlier.
The learned model is often assumed to be approximately correct; therefore using outliers for learning is considered undesirable, and hence outliers tend to be ignored. Playing it safe and learning only from consistent data just reinforces our beliefs in what we think is correct (which may not necessarily be accurate), instead of trying to learn what is not yet known.
<<< TODO: need a nice illustrative example >>>
just needs tweaking
AL estimates how informative points are; outliers are considered not informative; we think that they are very informative ...
Active learning aims at estimating how useful a point is for learning. Typically there is a lot of unlabeled data, and some of it needs to be labeled.
If some point is inconsistent, it may potentially be much more informative. The information that consistent points bring is rather limited: by virtue of being consistent, this information is already captured within the existing data/model, and a consistent point mostly reinforces prior beliefs (without changing the outcomes).
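The claim that a consistent point changes little while an inconsistent point changes a lot can be checked with a small sketch. It assumes a least-squares line as the model and a hypothetical helper `prediction_change` that measures how much predictions move after one point is added; this measure is an illustration, not the criterion used in the talk.

```python
import numpy as np

def prediction_change(x, y, x_new, y_new, grid):
    """Mean absolute change in predictions on `grid` after adding (x_new, y_new) and refitting a line."""
    before = np.polyval(np.polyfit(x, y, 1), grid)
    after = np.polyval(np.polyfit(np.append(x, x_new), np.append(y, y_new), 1), grid)
    return float(np.mean(np.abs(after - before)))

# Existing labeled data (assumed) lies exactly on y = 2x + 1.
x = np.array([0.0, 1.0, 2.0, 3.0])
y = 2.0 * x + 1.0
grid = np.linspace(0.0, 4.0, 9)

change_consistent = prediction_change(x, y, 4.0, 9.0, grid)    # new label agrees with the model
change_inconsistent = prediction_change(x, y, 4.0, 3.0, grid)  # new label contradicts the model
print("change from consistent point:  ", change_consistent)
print("change from inconsistent point:", change_inconsistent)
```

The consistent point leaves the fitted line essentially unchanged, so it only confirms what the model already encodes; the inconsistent point forces the model to move.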
Outliers are discarded (since they are considered detrimental to learning the underlying pattern), unless the objective is to learn to detect outliers, referred to as anomaly detection [3], e.g. ...
AL: contradictory items are often considered to be outliers and are ignored. We argue that they are not outliers, but may indeed contain a lot of information.
http://jeffjonas.typepad.com/jeff_jonas/2010/11/big-data-new-physics.html
2. Bad data good. More specifically, natural variability in data including spelling errors, transposition errors, and even professionally fabricated lies – all helpful. A bit more about this here: It Turns Out Both Bad Data and a Teaspoon of Dirt May Be Good For You and There Is No Such Thing As A Single Version of Truth.
II. PROBLEM DEFINITION
Types of outliers: error, non-error, ...
f(x, θ)
III. RELATED WORKS
Active learning and outlier detection have been jointly studied before.
The typical approach is to use outlier detection to remove samples from the active learning process: e.g. [?] used outlier detection during active learning to identify and remove the samples judged to be outliers.
We use an outlier criterion as an active learning criterion. Previous work of [1] ... has done the opposite, using an active learning criterion as an outlier criterion, i.e. outlier detection by active learning; we propose the opposite: active learning by outlier detection ...
QBC
VC dimension
YChange [rubens] is similar to Cook’s distance outlier criterion ...
Kolmogorov/compression: given that the underlying pattern has already been learned, an outlier carries more additional information.
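For reference, Cook's distance can be sketched as follows. This is the standard textbook definition for ordinary least squares, D_i = e_i^2 / (p s^2) * h_ii / (1 - h_ii)^2, not the YChange criterion itself; the function name `cooks_distance` and the toy data are assumptions made for illustration.

```python
import numpy as np

def cooks_distance(X, y):
    """Cook's distance of every labeled point under an ordinary least-squares linear fit."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    A = np.column_stack([np.ones(len(X)), X])  # design matrix with intercept
    n, p = A.shape
    H = A @ np.linalg.pinv(A)                  # hat matrix A (A^T A)^-1 A^T
    h = np.diag(H)                             # leverages h_ii
    residuals = y - H @ y                      # e_i
    s2 = residuals @ residuals / (n - p)       # residual variance estimate s^2
    return (residuals ** 2 / (p * s2)) * h / (1.0 - h) ** 2

# Toy data (assumed): y = x with one inconsistent label in the middle.
X = np.arange(7.0).reshape(-1, 1)
y = X[:, 0].copy()
y[3] += 6.0

d = cooks_distance(X, y)
print("point with the largest influence on the fit:", int(np.argmax(d)))
```

Read as an outlier criterion, a large value marks a point to discard; read as an active learning criterion in the spirit of the text above, the same large value marks the point whose label would change the model the most.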
Unlabeled Data
Sampling
2
assumption that the current model is accurate, and requires justsome tweaking. However, if the current model is inaccurate,it should be changed significantly; instead of ignoring theincompatability and keep making minor tweaks.
—x1
x2
y
.by—Practicality:Due to abundance of data; one may mistakenly dismiss this
problem as impractical. While the ulabeled data is abundant,labeled data is rather scarce. Even if overal, the amount oflabeled data is large enough; there may still be a need foradditional labeled data as to enable personalization (a commonfocus).
Moreover obtaining labeled data could be expensive. La-beled data is needed for personaliization ...
–This issue is exhorbated, in al settins in which ...This phenomena occurs frequently during the early stages
of the learning process [7], [6], or in a non-stationary envi-ronment in which changes may occur in the underlying model[2].
–Contributionsgradient descent ... (except the number of samples we can
make is very small)—
a) State the problem:b) Say why it’s an interesting problem: Not all of the
outliers are badc) Say what your solution achieves:d) Say what follows from your solution: If we discard
outliers, we might be discarding most informative data points====The goal of machine learning is to learn an accurate
predictive model from the data. Data that is inconsistent withthe learned model and/or existing data is refered to as anoutlier.
—Learned model is often assumed to be approximately cor-
rect, therefore using outliers for learning is considered to beundesireable, and hence outliers tend to be ignored. By playingit safe and learning only from consistent data just reinforcesour believes in what we think is correct (which may notneccesarily be accurate); instead of trying to learn what isnot yet known,
<<< TODO: need a nice illustrative example >>>—-just needs tweaking–AL estimates how informative points areoutliers are considered not informativewe think that they are very informative ...——
active learning aims at estimating how useful a point is forlearning. typically there is a lot of unlabeled data, some dataneeds to be labeled
—if some point is inconsistent, it may potentially be much
more informative; information that consistent points bringis rather limited since by virtue of being consistent, thisinformation is already captured within existing data/model,and consisten point mostly reinforces prior beliefs (w/o chaingthe outcomes).
–——outliers are discarded (since they are considered deterimen-
tal to learning the underlying pattern);unless objective is to learn to detect outliers, refered to as
anomaly detection [3], e.g. ...——AL: often contradictory items are considered to be outliers
and are ignored we argue that they are not outliers, but mayindeed contain a lot of information
http://jeffjonas.typepad.com/jeff_jonas/2010/11/big-data-new-physics.html
2. Bad data good. More specifically, natural variability indata including spelling errors, transposition errors, and evenprofessionally fabricated lies – all helpful. A bit more aboutthis here:It Turns Out Both Bad Data and a Teaspoon of DirtMay Be Good For YouandThere Is No Such Thing As A SingleVersion of Truth.
—
II. PROBLEM DEFINITION
type of outliers: error, non-error, ...f(x, ✓)
III. RELATED WORKS
Active learning and outlier detection has been jointly studiedbefore.
Typical approach is to use outlier detect to remove samplesfrom active learning process: e.g. [?] used outlier detectionduring the active learning to identify and remove the samplesjuged to be outliers.
we use outlier criterion as an active learning criterion;previous work of [1] ... has done an opposite using activelearning criterion as an outlier criterion i.e. outlier detectionby active learning; we propose the opposity active learning byoutlier detection ...
QBCVC dimmensionYChange [rubens] is similar to cook’s distance outlier
criterion ...kolmogorov/compression: given that underlying pattern has
already been learned; outlier carries more additional informa-tion
Multiple Hypotheses
Hypothesis/Model Selection
Consistent Sample
• Does not allow reducing the number of hypotheses
• Little is learned
Inconsistent Sample
• Will not agree with some of the hypotheses (regardless of the output values)
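The contrast between a consistent and an inconsistent sample can be sketched with a toy hypothesis set. Everything here is an assumption made for illustration: the hypotheses are simple threshold classifiers on one input, and the helper names `predict` and `guaranteed_eliminations` are hypothetical. The sketch counts how many hypotheses are ruled out by labeling a point under the least helpful label, which is one way to read "regardless of the output values".

```python
def predict(threshold, x):
    """Threshold hypothesis: label 1 when x reaches the threshold, else 0."""
    return 1 if x >= threshold else 0


def guaranteed_eliminations(thresholds, x):
    """Hypotheses ruled out by labeling x, under the least helpful label."""
    eliminated = []
    for label in (0, 1):
        # A hypothesis is ruled out when its prediction contradicts the label.
        eliminated.append(sum(1 for t in thresholds if predict(t, x) != label))
    return min(eliminated)


# Hypothetical hypothesis set: five candidate thresholds.
hypotheses = [1, 2, 3, 4, 5]
agreed_point = 0      # every hypothesis predicts 0 here
contested_point = 3   # thresholds 1..3 predict 1, thresholds 4..5 predict 0
print(guaranteed_eliminations(hypotheses, agreed_point))
print(guaranteed_eliminations(hypotheses, contested_point))
```

For the point on which all hypotheses agree, a label that matches them removes nothing, so the guaranteed reduction is zero. For the contested point, either label contradicts part of the set, so some hypotheses are always removed.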
2
assumption that the current model is accurate, and requires justsome tweaking. However, if the current model is inaccurate,it should be changed significantly; instead of ignoring theincompatability and keep making minor tweaks.
—x1
x2
y
.by—Practicality:Due to abundance of data; one may mistakenly dismiss this
problem as impractical. While the ulabeled data is abundant,labeled data is rather scarce. Even if overal, the amount oflabeled data is large enough; there may still be a need foradditional labeled data as to enable personalization (a commonfocus).
Moreover obtaining labeled data could be expensive. La-beled data is needed for personaliization ...
–This issue is exhorbated, in al settins in which ...This phenomena occurs frequently during the early stages
of the learning process [7], [6], or in a non-stationary envi-ronment in which changes may occur in the underlying model[2].
–Contributionsgradient descent ... (except the number of samples we can
make is very small)—
a) State the problem:b) Say why it’s an interesting problem: Not all of the
outliers are badc) Say what your solution achieves:d) Say what follows from your solution: If we discard
outliers, we might be discarding most informative data points====The goal of machine learning is to learn an accurate
predictive model from the data. Data that is inconsistent withthe learned model and/or existing data is refered to as anoutlier.
—Learned model is often assumed to be approximately cor-
rect, therefore using outliers for learning is considered to beundesireable, and hence outliers tend to be ignored. By playingit safe and learning only from consistent data just reinforcesour believes in what we think is correct (which may notneccesarily be accurate); instead of trying to learn what isnot yet known,
<<< TODO: need a nice illustrative example >>>—-just needs tweaking–AL estimates how informative points areoutliers are considered not informativewe think that they are very informative ...——
active learning aims at estimating how useful a point is forlearning. typically there is a lot of unlabeled data, some dataneeds to be labeled
—if some point is inconsistent, it may potentially be much
more informative; information that consistent points bringis rather limited since by virtue of being consistent, thisinformation is already captured within existing data/model,and consisten point mostly reinforces prior beliefs (w/o chaingthe outcomes).
Outliers are discarded (since they are considered detrimental to learning the underlying pattern), unless the objective is to learn to detect outliers, referred to as anomaly detection [3], e.g. ...
AL: often contradictory items are considered to be outliers and are ignored; we argue that they are not outliers, but may indeed contain a lot of information.
http://jeffjonas.typepad.com/jeff_jonas/2010/11/big-data-new-physics.html
2. Bad data good. More specifically, natural variability in data including spelling errors, transposition errors, and even professionally fabricated lies – all helpful. A bit more about this here: It Turns Out Both Bad Data and a Teaspoon of Dirt May Be Good For You and There Is No Such Thing As A Single Version of Truth.
—
II. PROBLEM DEFINITION
Types of outliers: error, non-error, ...
f(x, θ)
III. RELATED WORKS
Active learning and outlier detection have been jointly studied before.
The typical approach is to use outlier detection to remove samples from the active learning process: e.g. [?] used outlier detection during active learning to identify and remove the samples judged to be outliers.
We use an outlier criterion as an active learning criterion; previous work of [1] ... has done the opposite, using an active learning criterion as an outlier criterion, i.e. outlier detection by active learning; we propose the opposite: active learning by outlier detection ...
QBC
VC dimension
YChange [rubens] is similar to Cook's distance outlier criterion ...
Kolmogorov/compression: given that the underlying pattern has already been learned, an outlier carries more additional information
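The Cook's distance criterion mentioned above can be sketched as follows. This is only an illustration of the standard Cook's distance formula for ordinary least squares, not the YChange criterion itself; the function name `cooks_distance`, the synthetic data, and the shifted label are all assumptions made for the example.

```python
import numpy as np

def cooks_distance(X, y):
    """Cook's distance of every labeled point under ordinary least squares."""
    n, p = X.shape
    # Hat matrix H = X (X^T X)^{-1} X^T; its diagonal holds the leverages.
    H = X @ np.linalg.pinv(X.T @ X) @ X.T
    h = np.diag(H)
    residuals = y - H @ y
    s2 = residuals @ residuals / (n - p)  # noise variance estimate
    return (residuals ** 2 / (p * s2)) * (h / (1.0 - h) ** 2)

# Hypothetical data: a linear pattern with one inconsistent label.
rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 20)
X = np.column_stack([np.ones_like(x), x])  # intercept + slope
y = 2.0 * x + 0.05 * rng.standard_normal(x.size)
y[-1] += 3.0  # the label that disagrees with the pattern

d = cooks_distance(X, y)
most_influential = int(np.argmax(d))
print(most_influential, d[most_influential])
```

The point whose removal would change the fitted model the most gets the largest score. Under the typical treatment it would be discarded; under the view argued here it is a candidate for the most informative point.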
Inconsistent Sample
Number of hypotheses is reduced
Rubens et al, AJS 2011
(a) under-fit (b) over-fit (c) appropriate fit
Figure 8: Dependence between model complexity and accuracy.
Figure 9: Training input points that are good for learning one model are not necessarily good for the other.
$$\min_{X^{(\mathrm{Train})}} G\left(X^{(\mathrm{Train})}\right) \tag{25}$$
It would be beneficial to combine AL and MS since they share a common goal of minimizing the predictive error:
$$\min_{X^{(\mathrm{Train})},\, M} G\left(X^{(\mathrm{Train})}, M\right) \tag{26}$$
Ideally we would like to choose the model of appropriate complexity by a MS method and to choose the most useful training data by an AL method. However, simply combining AL with MS in a batch manner, i.e. selecting all of the training points at once, may not be possible due to the following paradox:
• To select training input points by a standard AL method, a model must be fixed. In other words, MS has already been performed (see Figure 9).
• To select the model by a standard MS method, the training input points must be fixed and corresponding training output values must be gathered. In other words, AL has already been performed (see Figure 10).
As a result Batch AL selects training points for a randomly chosen model, but after the training points are obtained the model is selected once again, giving rise to the possibility that the training points will not be as
Unable to determine which model is more appropriate (Model Selection), until training points have been obtained (Active Learning).
Figure 10: Dependence of Model Selection on Active Learning.
Model Selection
If there is no inconsistency between the training and testing data then the most complex model would tend to be selected.
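A minimal sketch of this effect, assuming polynomial models of increasing degree and taking the extreme case in which the evaluation data is the training data itself (so there is no inconsistency at all). The target function, noise level, and candidate degrees are hypothetical choices for illustration.

```python
import numpy as np
from numpy.polynomial import polynomial as P

rng = np.random.default_rng(0)
x_train = np.linspace(-1.0, 1.0, 12)
y_train = np.sin(2.0 * x_train) + 0.2 * rng.standard_normal(x_train.size)

def training_error(degree):
    """Mean squared error of a least-squares polynomial fit on its own training data."""
    coeffs = P.polyfit(x_train, y_train, degree)
    pred = P.polyval(x_train, coeffs)
    return float(np.mean((pred - y_train) ** 2))

degrees = [1, 3, 5, 7]
train_errors = [training_error(d) for d in degrees]
# With no inconsistent evaluation data, the error can only shrink as complexity grows.
selected = degrees[int(np.argmin(train_errors))]
print(train_errors, selected)
```

Because the models are nested, the error never increases with the degree, so the selection rule has no reason to stop short of the most complex candidate. Data that disagrees with the fitted model is what gives model selection something to work with.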
Change Detection / Model Correction
Is inconsistency caused by noise (or minor factors) or by changes in the underlying model?
http://www.ittvis.com/portals/0/images/ChangeDetection_3window.jpg
http://www.lucieer.net/research/heard.html
http://www.skyboximaging.com/solutions/application/change-detection
http://www.satimagingcorp.com/galleryimages/high-resolution-landsat-satellite-imagery-oman.jpg
– Applications: medical diagnostics, intrusion detection, network analysis, finance
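One simple way to approach the noise-versus-model-change question is to check whether recent residuals are still centered on zero. The sketch below is an assumption-laden illustration, not the method from the talk: the function name `looks_like_model_change`, the window size, the threshold, and the assumption of a known noise standard deviation are all hypothetical.

```python
import numpy as np

def looks_like_model_change(residuals, noise_std, window=20, z_threshold=3.0):
    """Flag a change when the recent mean residual is too large to be noise alone."""
    recent = np.asarray(residuals[-window:], dtype=float)
    # Under a correct model with zero-mean noise, the window mean has
    # standard error noise_std / sqrt(window size).
    z = abs(recent.mean()) / (noise_std / np.sqrt(recent.size))
    return bool(z > z_threshold)

# Hypothetical residual streams: zero-centered "noise" versus a late shift.
noise_only = 0.1 * np.tile([1.0, -1.0], 50)
shifted = noise_only.copy()
shifted[-20:] += 0.5  # the underlying model has moved

print(looks_like_model_change(noise_only, noise_std=0.1))
print(looks_like_model_change(shifted, noise_std=0.1))
```

Inconsistency that averages out is treated as noise, while inconsistency that persists in one direction suggests the model itself needs correcting rather than minor tweaking.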
Conclusion
• Inconsistency could be useful for:
  – Hypothesis Learning
  – Model Selection
  – Model Correction
Neil Rubens, Assistant Professor, Active Intelligence Group, Laboratory for Knowledge Computing, University of Electro-Communications, Tokyo, Japan
http://ActiveIntelligence.org