
Hardness of Learning Halfspaces with Noise

Prasad Raghavendra

Advisor: Venkatesan Guruswami


The Spam Problem

10 Million | Lottery | Cheap | Pharmacy | Junk | Is Spam?
YES        | YES     | NO    | YES      | NO   | SPAM
NO         | YES     | YES   | NO       | YES  | NOT SPAM
YES        | YES     | YES   | YES      | YES  | SPAM
NO         | NO      | NO    | YES      | YES  | SPAM
YES        | NO      | YES   | NO       | YES  | NOT SPAM
NO         | YES     | NO    | NO       | NO   | SPAM

Encode an email as a 0/1 feature vector. For the vector (1, 1, 0, 1, 0) and weights (2, 3, 3, 1, 7):

2 × 1 + 3 × 1 + 3 × 0 + 1 × 1 + 7 × 0 = 6

6 > 3 (the threshold), so the output is SPAM.
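For concreteness, here is that decision rule as a minimal Python sketch (the weights and threshold are just the illustrative values from the example above):

```python
# Illustrative perceptron decision rule for the spam example.
weights = [2, 3, 3, 1, 7]   # one weight per feature
threshold = 3

def classify(features):
    """Return SPAM iff the weighted sum of the features reaches the threshold."""
    score = sum(w * x for w, x in zip(weights, features))
    return "SPAM" if score >= threshold else "NOT SPAM"

print(classify([1, 1, 0, 1, 0]))  # score = 2 + 3 + 0 + 1 + 0 = 6 > 3 -> SPAM
```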

PERCEPTRON

[Figure: a perceptron that computes the weighted sum of the five features, with weights 2, 3, 3, 1, 7, and compares it to the threshold 3.]

Halfspace Learning Problem

Input: training samples

Vectors: W1, W2, …, Wm ∈ {-1,1}^n
Labels: l1, l2, …, lm ∈ {-1,1}

[Figure: positively (+) and negatively (-) labeled points in the (X, Y) plane, separated by a line.]

Output: a separating halfspace (A, θ):

A ∙ Wi ≥ θ if li = 1
A ∙ Wi < θ if li = -1

θ: the threshold. The two sides of the halfspace correspond to SPAM and NOT SPAM.


Perspective

• Perceptron classifiers are the simplest neural networks – widely used for classification.

• The perceptron learning algorithm finds a separating halfspace whenever the data is perfectly separable (see the sketch below).
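As a concrete illustration, here is a minimal sketch of the classic perceptron update rule; it assumes perfectly separable +1/-1 labeled data, and the NumPy-based interface is illustrative rather than taken from the talk:

```python
import numpy as np

def perceptron(samples, labels, max_epochs=100):
    """Find (A, theta) with A . w >= theta exactly on the +1-labeled samples,
    assuming the data is perfectly separable."""
    A, theta = np.zeros(samples.shape[1]), 0.0
    for _ in range(max_epochs):
        mistakes = 0
        for w, l in zip(samples, labels):
            pred = 1 if A @ w >= theta else -1
            if pred != l:       # update only on a mistake
                A += l * w
                theta -= l      # shift the threshold away from the error
                mistakes += 1
        if mistakes == 0:       # every sample classified correctly
            break
    return A, theta
```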

[Figure: labeled SPAM / NOT SPAM examples in the plane, now with a few + and - points mixed onto the wrong sides of the line.]

Inseparability

• Who said halfspaces can classify SPAM vs. NOT SPAM? The data may be inherently inseparable: this is the agnostic learning setting.

• Even if the data is separable, what about noise? Noise is inherent in many forms of data.

PAC Learning in the Presence of Noise

Agreement: the fraction of the examples classified correctly.

[Figure: a halfspace drawn through noisy data in the (X, Y) plane; most points lie on the correct side, but a few + and - examples are misclassified.]

It classifies 16 of the 20 examples correctly: agreement = 0.8, i.e., 80%.

Halfspace Maximum Agreement (HSMA) Problem

"Find the hyperplane that maximizes the agreement with the training examples."
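As a sketch, the agreement of a candidate halfspace is simple to compute (the function and argument names are illustrative); HSMA asks for the (A, θ) maximizing this quantity:

```python
import numpy as np

def agreement(A, theta, samples, labels):
    """Fraction of labeled samples (rows of `samples`) that the halfspace
    (A, theta) classifies correctly."""
    predictions = np.where(samples @ A >= theta, 1, -1)
    return float(np.mean(predictions == labels))
```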


Related Work: Positive Results

Random classification noise (each label flipped independently with probability less than 1/2):
• [Blum-Frieze-Kannan-Vempala 96]: a PAC learning algorithm that outputs a decision list of halfspaces
• [Cohen 97]: a proper learning algorithm (i.e., one that outputs a halfspace) for learning halfspaces

Assumptions on the distribution of examples:
• [Kalai-Klivans-Mansour-Servedio 05]: an algorithm that finds a close-to-optimal halfspace when the examples come from the uniform or any log-concave distribution

In contrast, this work allows both arbitrary noise and arbitrary examples; each of the algorithms above assumes that at least one of the two is randomly generated.

Related Work : Negative Results

• [Amaldi-Kann 98, Ben-David-Eiron-Long 92]: HSMA is NP-hard to approximate within some constant factor (261/262 and 415/418, respectively)

• [Bshouty-Burroughs 02]: HSMA is NP-hard to approximate better than 84/85

• [Arora-Babai-Stern-Sweedyk 97, Amaldi-Kann 98]: NP-hard to minimize disagreements within a factor of 2^(log^(1-ε) n)

Open Problem

Given that 99.9% of the examples are correct:
• No algorithm is known that finds a halfspace with agreement 51%
• No hardness result rules out achieving agreement 99%

• Closing this gap was stated as an open problem by [Blum-Frieze-Kannan-Vempala 96]
• Highlighted in recent work by [Feldman 06] on the (1-ε, 1/2+δ) tight hardness of learning monomials


Our Result

For any ε, δ > 0, given a set of training examples, it is NP-hard to distinguish between the following two cases:

• There is a halfspace with agreement 1 - ε
• No halfspace has agreement greater than ½ + δ

Even with 99.9% of examples non-noisy, the best we can do is output a random/trivial halfspace!

Remarks

• [Feldman-Gopalan-Khot-Ponnuswami 06] independently showed a similar result.
  – Our hardness result holds even for Boolean examples in {-1,1}^n (their result holds for R^n).
  – Their hardness result is stronger in the sub-constant regime.

• We also show: given a system of linear equations over the integers that is (1-ε)-satisfiable, it is NP-hard to find an assignment that satisfies more than a δ fraction of the equations.


Linear Inequalities

Let the halfspace be

a1x1 + a2x2 + … + anxn ≥ θ

Suppose W1 = (-1, 1, -1, 1) with label l1 = 1. This gives the constraint:

a1(-1) + a2(1) + a3(-1) + a4(1) ≥ θ

Learning a halfspace ≡ solving a system of linear inequalities.

Unknowns: A = (a1, a2, a3, a4) and θ. For example:

a1 + a2 + a3 + a4 ≥ θ
a1 + a2 + a3 - a4 < θ
a1 + a2 - a3 + a4 < θ
a1 + a2 - a3 + a4 ≥ θ
a1 - a2 + a3 - a4 ≥ θ
a1 - a2 + a3 + a4 < θ
a1 + a2 - a3 - a4 < θ
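A small illustrative sketch of this translation from labeled samples to constraints on the unknowns (the string encoding is just for display):

```python
def sample_to_inequality(w, label):
    """A labeled sample (w, l) constrains (a1..an, theta):
    sum_i a_i * w_i >= theta if l = +1, and < theta if l = -1."""
    lhs = " + ".join(f"({c})*a{i}" for i, c in enumerate(w, start=1))
    return f"{lhs} {'>=' if label == 1 else '<'} theta"

print(sample_to_inequality((-1, 1, -1, 1), 1))
# (-1)*a1 + (1)*a2 + (-1)*a3 + (1)*a4 >= theta
```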

Label Cover Problem

U, V: sets of vertices
E: set of edges
{1, 2, …, R}: set of labels
πe: constraint on edge e

An assignment A satisfies an edge e = (u, v) ∈ E if πe(A(u)) = A(v).

Goal: find an assignment A that satisfies the maximum number of edges.

[Figure: a bipartite graph between U and V; each vertex carries a label from {1, 2, …, R}, and an edge e = (u, v) is satisfied when πe maps u's label to v's label, e.g., πe(3) = 2.]
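A tiny sketch of checking an assignment against the projection constraints (the data layout is hypothetical):

```python
def satisfied_edges(edges, pi, assignment):
    """Count edges (u, v) whose projection pi[(u, v)] maps the label
    assigned to u to the label assigned to v."""
    return sum(pi[(u, v)].get(assignment[u]) == assignment[v]
               for (u, v) in edges)

# A single edge with pi_e(3) = 2, as in the figure:
print(satisfied_edges([("u", "v")], {("u", "v"): {3: 2}}, {"u": 3, "v": 2}))  # 1
```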

Hardness of Label Cover

[Raz 98] There exists γ > 0 such that, given a label cover instance Γ = (U, V, E, R, π), it is NP-hard to distinguish between:
• Γ is completely satisfiable
• No assignment satisfies more than a 1/R^γ fraction of the edges

Aim

Reduce label cover (SATISFIABLE vs. 1/R^γ-SATISFIABLE) to a system of homogeneous inequalities with +1, -1 coefficients, over variables such as a1, a2, a3, a4, θ:

a1 + a2 + a3 + a4 ≥ θ
a1 + a2 + a3 - a4 < θ
a1 + a2 - a3 + a4 < θ
a1 + a2 - a3 + a4 ≥ θ
a1 - a2 + a3 - a4 ≥ θ
a1 - a2 + a3 + a4 < θ
a1 + a2 - a3 - a4 < θ


Variables

For each vertex u, introduce R variables: u1, u2, …, uR.

Intended assignment: if u is assigned label k, then uk = 1 and uj = 0 for all j ≠ k.

(The same symbol u is used for the vertex and for its block of variables.)

Equation Tuples

Encode the label cover constraints as linear equations:

• All vertices are assigned exactly one label:
  for all u: u1 + u2 + … + uR = 1
  for all u, v: (u1 + u2 + … + uR) - (v1 + v2 + … + vR) = 0

• For all constraints πe and all 1 ≤ k ≤ R: ∑ ui = vk, where the summation is over all i with πe(i) = k
  (e.g., u1 - v1 = 0, u2 + u3 - v2 = 0)

• Most of the variables are zero: pick t variables ui at random and add the equations ui = 0.

Over all the random choices, each such collection of equations forms an EQUATION TUPLE.
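An illustrative sketch of the equations a single edge constraint contributes (the representation of πe as a dict is hypothetical):

```python
def edge_equations(R, pi_e):
    """For an edge constraint pi_e: {1..R} -> {1..R}, emit one equation per
    target label k: (sum of u_i over {i : pi_e(i) = k}) - v_k = 0."""
    eqs = []
    for k in range(1, R + 1):
        sources = [f"u{i}" for i in range(1, R + 1) if pi_e[i] == k]
        if sources:
            eqs.append(" + ".join(sources) + f" - v{k} = 0")
    return eqs

print(edge_equations(3, {1: 1, 2: 2, 3: 2}))
# ['u1 - v1 = 0', 'u2 + u3 - v2 = 0']
```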

In the SATISFIABLE case, there is an assignment that satisfies most of the equation tuples.

Equation Tuples: Approximate Satisfaction

(SATISFIABLE vs. 1/R^γ-SATISFIABLE)

Suppose u2 + u3 - v2 = 0 is an equation of the tuple. We say it is not even approximately satisfied if

|u2 + u3 - v2| > ε (u1 + u2 + … + uR)

Scaling factor: u1 + u2 + … + uR

Next Step

An example tuple:

u1 - v1 = 0
u2 + u3 - v2 = 0
u1 + u2 + u3 - v1 - v2 - v3 = 0
u1 = 0
u3 + v1 - v2 = 0

Boost "one unsatisfied equation" to "most tuples have C equations that are not even approximately satisfied":

• Introduce several copies of the variables
• Add consistency checks between the different copies of the same variable

Arrange that each variable appears exactly once in a tuple, with coefficient +1 or -1.


Recap

• SATISFIABLE case: most tuples are completely satisfied.
• 1/R^γ-SATISFIABLE case: most tuples have C equations that are not even approximately satisfied.
• Each variable appears exactly once in a tuple, with coefficient +1 or -1.

Remaining task: using linear inequalities, distinguish a tuple that is
• completely satisfied, from one where
• at least C of its equations are not even approximately satisfied.

Observation

The pair of inequalities A - B < 0 and A + B ≥ 0 encodes B > 0 and |A| < B (up to the boundary case A = -B). For example, A = 0.5 and B = 2 satisfy both inequalities.

Pick one of the equation tuples at random:

u1 - v1 = 0
u4 + u5 - v2 = 0
u6 + u2 + u7 - v4 - v5 - v6 = 0
u3 = 0
u8 + v3 - v7 = 0

Multiply the equations by random signs (here +1, +1, -1, +1, -1) and add them up:

u1 - u2 + u3 + u4 + u5 - u6 - u7 - u8 - v1 - v2 - v3 + v4 + v5 + v6 + v7

With the scaling factor u1 + u2 + … + uR, output the pair of inequalities:

u1 - u2 + u3 + u4 + u5 - u6 - u7 - u8 - v1 - v2 - v3 + v4 + v5 + v6 + v7 - (u1 + u2 + … + uR) < 0
u1 - u2 + u3 + u4 + u5 - u6 - u7 - u8 - v1 - v2 - v3 + v4 + v5 + v6 + v7 + (u1 + u2 + … + uR) ≥ 0
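A sketch of this random ±1 combination step, with equations represented as coefficient dictionaries (purely illustrative):

```python
import random

def random_combination(tuple_eqs):
    """Multiply each equation (a {var: coeff} dict, equated to 0) by a
    random sign in {+1, -1} and sum them."""
    combo = {}
    for eq in tuple_eqs:
        sign = random.choice([+1, -1])
        for var, coeff in eq.items():
            combo[var] = combo.get(var, 0) + sign * coeff
    return combo

eqs = [{"u1": 1, "v1": -1}, {"u4": 1, "u5": 1, "v2": -1}, {"u3": 1}]
print(random_combination(eqs))  # e.g. {'u1': 1, 'v1': -1, 'u4': -1, ...}
```

Since each variable occurs in exactly one equation of the tuple, the combined left-hand side again has all coefficients in {+1, -1}.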

Good Case

The assignment satisfies every equation of the tuple:

u1 - v1 = 0
u2 + u3 - v2 = 0
u1 + u2 + u3 - v1 - v2 - v3 = 0
u1 = 0
u3 + v1 - v2 = 0

so every ±1 combination also vanishes:

u1 - u2 + u3 + u4 + u5 - u6 - u7 - u8 - v1 - v2 - v3 + v4 + v5 + v6 + v7 = 0

The assignment also satisfies u1 + u2 + … + uR = 1. Hence, with high probability over the choice of tuple, BOTH inequalities are satisfied:

u1 - u2 + u3 + u4 + u5 - u6 - u7 - u8 - v1 - v2 - v3 + v4 + v5 + v6 + v7 - (u1 + u2 + … + uR) < 0
u1 - u2 + u3 + u4 + u5 - u6 - u7 - u8 - v1 - v2 - v3 + v4 + v5 + v6 + v7 + (u1 + u2 + … + uR) ≥ 0

Bad Case

At least C equations of the tuple are not even approximately satisfied, e.g.:

|u1 - v1| > ε (u1 + u2 + … + uR)
|u2 + u3 - v2| > ε (u1 + u2 + … + uR)
|u3 + v1 - v2| > ε (u1 + u2 + … + uR)

For large enough C, with high probability over the choice of the +1/-1 combination,

|u1 - u2 + u3 + u4 + u5 - u6 - u7 - u8 - v1 - v2 - v3 + v4 + v5 + v6 + v7| > (u1 + u2 + … + uR)

Hence, with high probability over the choice of equation tuple, AT MOST ONE of the two inequalities is satisfied:

u1 - u2 + u3 + u4 + u5 - u6 - u7 - u8 - v1 - v2 - v3 + v4 + v5 + v6 + v7 - (u1 + u2 + … + uR) < 0
u1 - u2 + u3 + u4 + u5 - u6 - u7 - u8 - v1 - v2 - v3 + v4 + v5 + v6 + v7 + (u1 + u2 + … + uR) ≥ 0

Interesting Set of Vectors

The set of all possible {-1,1} sign combinations is exponentially large. Instead, construct a polynomial-size subset S of {-1,1}^n such that:

For any vector v = (v1, v2, …, vn) with sufficiently many large coordinates (> ε), at least a 1-δ fraction of the vectors u ∈ S satisfy |u ∙ v| > 1.

Construction: a 4-wise independent family combined with a random grouping of the coordinates.
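A quick numeric sanity check of the property required of S. This samples fully random signs (the "all 2^n combinations" case) rather than the derandomized construction, and the parameters are arbitrary:

```python
import random

n, trials = 10_000, 500
v = [0.11] * n   # n coordinates, each of magnitude above eps = 0.1
hits = sum(
    abs(sum(random.choice([1, -1]) * vi for vi in v)) > 1
    for _ in range(trials)
)
print(hits / trials)  # a large fraction, tending to 1 as n grows
```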

Construction

[Figure: coordinates v1, …, v7, several of magnitude > ε; the sign pattern (-1, 1, -1, 1, -1, 1, 1) gives the combination -v1 + v2 - v3 + v4 - v5 + v6 + v7, of magnitude > 1.]

A 4-wise independent family achieves |u ∙ v| > 1 with some constant probability; all 2^n combinations achieve it with probability close to 1.

Construction (continued)

[Figure: the coordinates v1, …, vn are randomly partitioned into groups, and the signs within each group come from a 4-wise independent set. Each group succeeds with constant probability (by 4-wise independence); by the independence of the grouping and Chernoff bounds, the overall behavior matches that of all 2^n combinations, i.e., success probability close to 1.]

Conclusion

• Either an assumption on the distribution of the examples or an assumption on the noise is necessary for efficient halfspace learning algorithms.

• [Raghavendra-Venkatesan]: a similar hardness result for learning support vector machines in the presence of adversarial noise.


THANK YOU

Details

• The set of all possible {-1,1} combinations is exponentially large.

• No variable should occur more than once in an equation tuple, to ensure that the final inequalities all have coefficients in {-1,1}.

• The vector set is constructed using a 4-wise independent family and a random grouping of coordinates.

• Use different copies of the variables for different equations, with a careful choice of consistency checks.

Interesting Set of Vectors (restated)

Construct a polynomial-size subset S of {-1,1}^n such that: for any vector v = (v1, v2, …, vn) with sufficiently many large coordinates (> ε), at most a δ fraction of the vectors u ∈ S satisfy |u ∙ v| < 1.

Construction: a 4-wise independent family combined with a random grouping of the coordinates.

Equation Tuple

u1 - v1 = 0
u2 + u3 - v2 = 0
u1 + u2 + u3 - v1 - v2 - v3 = 0
u1 = 0
u3 + v1 - v2 = 0

ε-Satisfaction: an assignment A is said to ε-satisfy an equation tuple if it ε-satisfies every equation in the tuple, i.e., each left-hand side has magnitude below ε times the scaling factor, e.g.

|u2 + u3 - v2| < ε (u1 + u2 + … + uR)
