1/15
Agnostically learning halfspaces
FOCS 2005
2/15
Set X, F a class of functions f: X → {0,1}.
Agnostic learning [Kearns, Schapire & Sellie]: an arbitrary distribution over (x,y) ∈ X × {0,1}; f* = argmin_{f∈F} P[f(x) ≠ y], opt = P[f*(x) ≠ y].
Efficient agnostic learner: from poly(1/ε) samples, w.h.p. outputs h: X → {0,1} with P[h(x) ≠ y] ≤ opt + ε.
3/15
Set X_n ⊆ ℝ^n, F_n a class of functions f: X_n → {0,1}.
Agnostic learning [Kearns, Schapire & Sellie]: an arbitrary distribution over (x,y) ∈ X_n × {0,1}; f* = argmin_{f∈F_n} P[f(x) ≠ y], opt = P[f*(x) ≠ y].
Efficient agnostic learner: from poly(n, 1/ε) samples, w.h.p. outputs h: X_n → {0,1} with P[h(x) ≠ y] ≤ opt + ε.
4/15
(Setup as on the previous slide.) Note the contrast with the PAC model: there, P[f*(x) ≠ y] = 0; agnostic learning drops that assumption.
5/15
F_n = { f(x) = I(w·x ≥ θ) | w ∈ ℝ^n, θ ∈ ℝ }.
Agnostic learning of halfspaces: output h: ℝ^n → {0,1} with P[h(x) ≠ y] ≤ opt + ε, where f* = argmin_{f∈F_n} P[f(x) ≠ y] and opt = P[f*(x) ≠ y].
[Figure: the target halfspace f* and a hypothesis h.]
6/15
F_n = { f(x) = I(w·x ≥ θ) | w ∈ ℝ^n, θ ∈ ℝ }; goal: h: ℝ^n → {0,1} with P[h(x) ≠ y] ≤ opt + ε.
Special case: junctions, e.g., f(x) = x₁ ∨ x₃ = I(x₁ + x₃ ≥ 1) (a small check appears below).
Efficiently agnostically learning junctions ⇒ PAC-learning DNF.
NP-hard to agnostically learn properly (i.e., when h must itself be a halfspace).
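A tiny sanity check of the junction-as-halfspace encoding above (illustrative only; the function names are mine):

```python
# Verify f(x) = x1 v x3 = I(x1 + x3 >= 1) on all Boolean inputs.
from itertools import product

def disjunction(x):              # x is a {0,1}-tuple; 1-indexed as on the slide
    return x[0] | x[2]

def halfspace(x):                # I(w.x >= theta) with w = e1 + e3, theta = 1
    return int(x[0] + x[2] >= 1)

assert all(disjunction(x) == halfspace(x) for x in product((0, 1), repeat=3))
```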
7/15
F_n = { f(x) = I(w·x ≥ θ) | w ∈ ℝ^n, θ ∈ ℝ }; goal: P[h(x) ≠ y] ≤ opt + ε.
PAC learning of halfspaces (opt = 0): solved by linear programming.
[Figure: the target halfspace f*.]
8/15
F_n = { f(x) = I(w·x ≥ θ) | w ∈ ℝ^n, θ ∈ ℝ }; goal: P[h(x) ≠ y] ≤ opt + ε.
PAC learning of halfspaces with independent/random classification noise: also solved, in polynomial time.
[Figure: halfspace f* and hypothesis h.]
9/15
F_n = { f(x) = I(w·x ≥ θ) | w ∈ ℝ^n, θ ∈ ℝ }; goal: P[h(x) ≠ y] ≤ opt + ε, where opt = min_{f∈F_n} P[f(x) ≠ y].
Equivalently: f* is the "truth," and the labels suffer adversarial noise.
[Figure: halfspace f* and hypothesis h.]
10/15
Theorem 1: Our algorithm outputs h: ℝ^n → {0,1} with P[h(x) ≠ y] ≤ opt + ε (w.h.p.), in time n^{O(1/ε⁴)}, i.e., poly(n) for any constant ε, whenever x ∈ ℝ^n is drawn from:
- a log-concave distribution, e.g., uniform over a convex set, exponential e^{−|x|}, normal; or
- the uniform distribution over {−1,1}^n or over S^{n−1} = { x ∈ ℝ^n : |x| = 1 }.
11/15
1. L₁ polynomial regression algorithm. Given d > 0 and (x₁,y₁), …, (x_m,y_m) ∈ ℝ^n × {0,1}: find a degree-d multivariate polynomial p(x) minimizing (1/m) ∑ᵢ |p(xᵢ) − yᵢ| ≈ min_{deg(p)≤d} E[|p(x) − y|]. Pick θ ∈ [0,1] uniformly at random and output h(x) = I(p(x) ≥ θ). Time n^{O(d)}. (A code sketch follows below.)
2. Low-degree Fourier algorithm of [Linial, Mansour & Nisan]. Choose p(x) = ∑_{|S|≤d} ĉ_S χ_S(x), where ĉ_S is an empirical estimate of E[y·χ_S(x)]; this ≈ minimizes E[(p(x) − y)²] over deg(p) ≤ d (requires x uniform over {−1,1}^n). Output h(x) = I(p(x) ≥ ½). Time n^{O(d)}.
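A minimal runnable sketch of algorithm 1 (my assumptions: a dense monomial expansion and an off-the-shelf LP solver; `expand` and `l1_regress` are hypothetical names, and none of the paper's sample-complexity bookkeeping is attempted):

```python
import numpy as np
from itertools import combinations_with_replacement
from scipy.optimize import linprog

def expand(X, d):
    """Map each row x in R^n to all monomials of degree <= d."""
    n = X.shape[1]
    cols = [np.ones(len(X))]                       # degree-0 term
    for k in range(1, d + 1):
        for idx in combinations_with_replacement(range(n), k):
            cols.append(np.prod(X[:, list(idx)], axis=1))
    return np.column_stack(cols)

def l1_regress(X, y, d, rng=np.random.default_rng()):
    Phi = expand(X, d)                             # m x D feature matrix
    m, D = Phi.shape
    # LP for: minimize sum_i s_i  s.t.  -s_i <= Phi_i . c - y_i <= s_i
    obj = np.concatenate([np.zeros(D), np.ones(m)])
    A_ub = np.block([[Phi, -np.eye(m)], [-Phi, -np.eye(m)]])
    b_ub = np.concatenate([y, -y])
    res = linprog(obj, A_ub=A_ub, b_ub=b_ub,
                  bounds=[(None, None)] * D + [(0, None)] * m)
    coef = res.x[:D]
    theta = rng.uniform(0, 1)                      # random threshold in [0,1]
    return lambda Xnew: (expand(Xnew, d) @ coef >= theta).astype(int)
```

The expansion has O(n^d) columns, which is where the n^{O(d)} running time on the slide comes from.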
12/15
(The same two algorithms, now with their guarantees.)
Lemma (L₁ regression): the algorithm's error ≤ opt + min_{deg(q)≤d} E[|f*(x) − q(x)|].
Lemma (low-degree Fourier): the algorithm's error ≤ 8·(opt + min_{deg(q)≤d} E[(f*(x) − q(x))²]).
Lemma of [Kearns, Schapire & Sellie]: the algorithm's error ≤ ½ − (½ − opt)² + ε.
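Where the first lemma comes from, in one step (my reconstruction of the standard argument; assumes p is clipped to [0,1]): for y ∈ {0,1} and θ uniform on [0,1],

```latex
\Pr_{\theta}\!\left[\mathbb{I}(p(x)\ge\theta)\neq y\right] \;=\; |p(x)-y| ,
```

so taking expectations, comparing p to any degree-d q (p approximately minimizes the empirical L₁ loss), and using |f*(x) − y| = I(f*(x) ≠ y):

```latex
\mathbb{E}\,|p(x)-y| \;\lesssim\; \mathbb{E}\,|q(x)-y|
 \;\le\; \mathbb{E}\,|q(x)-f^*(x)| \;+\; \underbrace{\Pr[f^*(x)\neq y]}_{=\ \mathrm{opt}} .
```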
13/15
Approximation degree is dimension-free for halfspaces.
Useful properties of log-concave distributions: any projection is log-concave, …
[Figure: a degree-d = 10 polynomial q(x) ≈ I(x ≥ 0) in one dimension, and the same univariate construction applied as q(w·x) ≈ I(w·x ≥ 0); axes over [−1, 1].] (A numerical re-creation follows below.)
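A rough numerical re-creation of the figure (my assumptions: uniform weight on [−1,1] and a grid least-squares fit, standing in for the paper's distribution-specific construction):

```python
import numpy as np

d = 10
xs = np.linspace(-1, 1, 2001)
target = (xs >= 0).astype(float)            # the step function I(x >= 0)
V = np.vander(xs, d + 1)                    # monomials of degree <= d
coef, *_ = np.linalg.lstsq(V, target, rcond=None)
q = V @ coef                                # the degree-10 approximation
print("E[(q - I)^2] ~=", np.mean((q - target) ** 2))  # small; shrinks as d grows
```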
14/15
Approximating I(x ≥ θ) (one dimension): bound min_{deg(q)≤d} E[(q(x) − I(x ≥ θ))²].
For continuous distributions, use orthogonal polynomials under the inner product ⟨f,g⟩ = E[f(x)g(x)]:
- Normal: Hermite polynomials
- Log-concave (handling e^{−|x|}/2 suffices): new polynomials
- Uniform on the sphere: Gegenbauer polynomials
- Uniform on the hypercube: Fourier basis
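Why orthogonal polynomials are the right tool here (a standard Hilbert-space fact, stated for context): in an orthonormal polynomial basis (e_k) for ⟨f,g⟩ = E[f(x)g(x)],

```latex
\min_{\deg(q)\le d}\ \mathbb{E}\!\left[(q(x)-f(x))^2\right]
 \;=\; \mathbb{E}\!\left[f(x)^2\right] \;-\; \sum_{k\le d}\langle f,e_k\rangle^2 ,
\qquad\text{attained by}\quad q(x)=\sum_{k\le d}\langle f,e_k\rangle\, e_k(x).
```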
"Hey, I've used Hermite (pronounced air-meet) polynomials many times."
15/15
Theorem 2 (junctions, e.g., x₁ ∧ x₁₁ ∧ x₁₇): for an arbitrary distribution over {0,1}^n × {0,1}, the polynomial regression algorithm with d = O(n^{1/2} log(1/ε)) (time n^{O(n^{1/2} log(1/ε))}) outputs h with P[h(x) ≠ y] ≤ opt + ε.
Follows from the previous lemmas plus a degree-O(n^{1/2} log(1/ε)) polynomial approximation of junctions.
16/15
How far can we get in poly(n, 1/ε) time? Assume x is drawn uniformly from S^{n−1} = { x ∈ ℝ^n : |x| = 1 }.
Perceptron algorithm: error ≤ O(√n)·opt + ε.
We show: a simple averaging algorithm achieves error ≤ O(log(1/opt))·opt + ε (a sketch follows below).
Now assume (x,y) ~ (1−η)·(x, f*(x)) + η·(arbitrary (x,y)), with x uniform on S^{n−1}: we get error ≤ O(n^{1/4} log(n/η))·η + ε, using Rankin's second bound.
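A sketch of what such an averaging algorithm looks like (synthetic data; random label flips stand in for the agnostic noise, and all names and parameters below are mine):

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 50, 20000
w_true = np.zeros(n); w_true[0] = 1.0          # unknown origin-centered halfspace

X = rng.normal(size=(m, n))
X /= np.linalg.norm(X, axis=1, keepdims=True)  # x uniform on S^{n-1}
clean = np.where(X @ w_true >= 0, 1, -1)       # labels in {-1, 1}
y = clean.copy()
y[rng.random(m) < 0.05] *= -1                  # 5% corrupted labels

w_avg = (y[:, None] * X).mean(axis=0)          # the "average": w = E[y x]
pred = np.where(X @ w_avg >= 0, 1, -1)
print("error vs. truth:", np.mean(pred != clean))
```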
17/15
Half-space conclusions & future work
L₁ polynomial regression: a natural extension of Fourier learning. Works for non-uniform/arbitrary distributions, tolerates agnostic noise, and handles both continuous and discrete problems.
Future work:
- All distributions (not just log-concave / uniform on {−1,1}^n).
- opt + ε with a poly(n, 1/ε) algorithm (we have poly(n) for fixed ε, and trivially poly(1/ε) for fixed n).
- Other interesting classes of functions.