
Hardness of Learning Halfspaces with Noise

Prasad Raghavendra

Advisor: Venkatesan Guruswami


The Spam Problem

10 Million | Lottery | Cheap | Pharmacy | Junk | Is Spam?
YES        | YES     | NO    | YES      | NO   | SPAM
NO         | YES     | YES   | NO       | YES  | NOT SPAM
YES        | YES     | YES   | YES      | YES  | SPAM
NO         | NO      | NO    | YES      | YES  | SPAM
YES        | NO      | YES   | NO       | YES  | NOT SPAM
NO         | YES     | NO    | NO       | NO   | SPAM

Encode an email as a 0/1 feature vector. For the vector (1, 1, 0, 1, 0) and weights (2, 3, 3, 1, 7):

2 × 1 + 3 × 1 + 3 × 0 + 1 × 1 + 7 × 0 = 6

6 > 3 (the threshold), so the output is SPAM.
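For concreteness, here is that decision rule as a minimal Python sketch (the weights and threshold are just the illustrative values from the example above):

```python
# Illustrative perceptron decision rule for the spam example.
weights = [2, 3, 3, 1, 7]   # one weight per feature
threshold = 3

def classify(features):
    """Return SPAM iff the weighted sum of the features reaches the threshold."""
    score = sum(w * x for w, x in zip(weights, features))
    return "SPAM" if score >= threshold else "NOT SPAM"

print(classify([1, 1, 0, 1, 0]))  # score = 2 + 3 + 0 + 1 + 0 = 6 > 3 -> SPAM
```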

PERCEPTRON

[Figure: a perceptron that computes the weighted sum of the five features, with weights 2, 3, 3, 1, 7, and compares it to the threshold 3.]

Halfspace Learning Problem

Input: training samples

Vectors: W1, W2, …, Wm ∈ {-1,1}^n
Labels: l1, l2, …, lm ∈ {-1,1}

[Figure: positively (+) and negatively (-) labeled points in the (X, Y) plane, separated by a line.]

Output: a separating halfspace (A, θ):

A ∙ Wi ≥ θ if li = 1
A ∙ Wi < θ if li = -1

θ: the threshold. The two sides of the halfspace correspond to SPAM and NOT SPAM.


Perspective

• Perceptron classifiers are the simplest neural networks – widely used for classification.

• The perceptron learning algorithm finds a separating halfspace whenever the data is perfectly separable (see the sketch below).
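As a concrete illustration, here is a minimal sketch of the classic perceptron update rule; it assumes perfectly separable +1/-1 labeled data, and the NumPy-based interface is illustrative rather than taken from the talk:

```python
import numpy as np

def perceptron(samples, labels, max_epochs=100):
    """Find (A, theta) with A . w >= theta exactly on the +1-labeled samples,
    assuming the data is perfectly separable."""
    A, theta = np.zeros(samples.shape[1]), 0.0
    for _ in range(max_epochs):
        mistakes = 0
        for w, l in zip(samples, labels):
            pred = 1 if A @ w >= theta else -1
            if pred != l:       # update only on a mistake
                A += l * w
                theta -= l      # shift the threshold away from the error
                mistakes += 1
        if mistakes == 0:       # every sample classified correctly
            break
    return A, theta
```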

[Figure: labeled SPAM / NOT SPAM examples in the plane, now with a few + and - points mixed onto the wrong sides of the line.]

Inseparability

• Who said halfspaces can classify SPAM vs. NOT SPAM? The data may be inherently inseparable: this is the agnostic learning setting.

• Even if the data is separable, what about noise? Noise is inherent in many forms of data.

PAC Learning in the Presence of Noise

Agreement: the fraction of the examples classified correctly.

[Figure: a halfspace drawn through noisy data in the (X, Y) plane; most points lie on the correct side, but a few + and - examples are misclassified.]

It classifies 16 of the 20 examples correctly: agreement = 0.8, i.e., 80%.

Halfspace Maximum Agreement (HSMA) Problem

"Find the hyperplane that maximizes the agreement with the training examples."
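As a sketch, the agreement of a candidate halfspace is simple to compute (the function and argument names are illustrative); HSMA asks for the (A, θ) maximizing this quantity:

```python
import numpy as np

def agreement(A, theta, samples, labels):
    """Fraction of labeled samples (rows of `samples`) that the halfspace
    (A, theta) classifies correctly."""
    predictions = np.where(samples @ A >= theta, 1, -1)
    return float(np.mean(predictions == labels))
```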


Related Work: Positive Results

Random classification noise (each label flipped independently with probability less than 1/2):
• [Blum-Frieze-Kannan-Vempala 96]: a PAC learning algorithm that outputs a decision list of halfspaces
• [Cohen 97]: a proper learning algorithm (i.e., one that outputs a halfspace) for learning halfspaces

Assumptions on the distribution of examples:
• [Kalai-Klivans-Mansour-Servedio 05]: an algorithm that finds a close-to-optimal halfspace when the examples come from the uniform or any log-concave distribution

In contrast, this work allows both arbitrary noise and arbitrary examples; each of the algorithms above assumes that at least one of the two is randomly generated.

Related Work : Negative Results

• [Amaldi-Kann 98, Ben-David-Eiron-Long 92]: HSMA is NP-hard to approximate within some constant factor (261/262 and 415/418, respectively)

• [Bshouty-Burroughs 02]: HSMA is NP-hard to approximate better than 84/85

• [Arora-Babai-Stern-Sweedyk 97, Amaldi-Kann 98]: NP-hard to minimize disagreements within a factor of 2^(log^(1-ε) n)

Open Problem

Given that 99.9% of the examples are correct:
• No algorithm is known that finds a halfspace with agreement 51%
• No hardness result rules out achieving agreement 99%

• Closing this gap was stated as an open problem by [Blum-Frieze-Kannan-Vempala 96]
• Highlighted in recent work by [Feldman 06] on the (1-ε, 1/2+δ) tight hardness of learning monomials


Our Result

For any ε, δ > 0, given a set of training examples, it is NP-hard to distinguish between the following two cases:

• There is a halfspace with agreement 1 - ε
• No halfspace has agreement greater than ½ + δ

Even with 99.9% of examples non-noisy, the best we can do is output a random/trivial halfspace!

Remarks

• [Feldman-Gopalan-Khot-Ponnuswami 06] independently showed a similar result.
  – Our hardness result holds even for Boolean examples in {-1,1}^n (their result holds for R^n).
  – Their hardness result is stronger in the sub-constant regime.

• We also show: given a system of linear equations over the integers that is (1-ε)-satisfiable, it is NP-hard to find an assignment that satisfies more than a δ fraction of the equations.


Linear Inequalities

Let the halfspace be

a1x1 + a2x2 + … + anxn ≥ θ

Suppose W1 = (-1, 1, -1, 1) with label l1 = 1. This gives the constraint:

a1(-1) + a2(1) + a3(-1) + a4(1) ≥ θ

Learning a halfspace ≡ solving a system of linear inequalities.

Unknowns: A = (a1, a2, a3, a4) and θ. For example:

a1 + a2 + a3 + a4 ≥ θ
a1 + a2 + a3 - a4 < θ
a1 + a2 - a3 + a4 < θ
a1 + a2 - a3 + a4 ≥ θ
a1 - a2 + a3 - a4 ≥ θ
a1 - a2 + a3 + a4 < θ
a1 + a2 - a3 - a4 < θ
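A small illustrative sketch of this translation from labeled samples to constraints on the unknowns (the string encoding is just for display):

```python
def sample_to_inequality(w, label):
    """A labeled sample (w, l) constrains (a1..an, theta):
    sum_i a_i * w_i >= theta if l = +1, and < theta if l = -1."""
    lhs = " + ".join(f"({c})*a{i}" for i, c in enumerate(w, start=1))
    return f"{lhs} {'>=' if label == 1 else '<'} theta"

print(sample_to_inequality((-1, 1, -1, 1), 1))
# (-1)*a1 + (1)*a2 + (-1)*a3 + (1)*a4 >= theta
```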

Label Cover Problem

U, V: sets of vertices
E: set of edges
{1, 2, …, R}: set of labels
πe: constraint on edge e

An assignment A satisfies an edge e = (u, v) ∈ E if πe(A(u)) = A(v).

Goal: find an assignment A that satisfies the maximum number of edges.

[Figure: a bipartite graph between U and V; each vertex carries a label from {1, 2, …, R}, and an edge e = (u, v) is satisfied when πe maps u's label to v's label, e.g., πe(3) = 2.]
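A tiny sketch of checking an assignment against the projection constraints (the data layout is hypothetical):

```python
def satisfied_edges(edges, pi, assignment):
    """Count edges (u, v) whose projection pi[(u, v)] maps the label
    assigned to u to the label assigned to v."""
    return sum(pi[(u, v)].get(assignment[u]) == assignment[v]
               for (u, v) in edges)

# A single edge with pi_e(3) = 2, as in the figure:
print(satisfied_edges([("u", "v")], {("u", "v"): {3: 2}}, {"u": 3, "v": 2}))  # 1
```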

Hardness of Label Cover

[Raz 98] There exists γ > 0 such that, given a label cover instance Γ = (U, V, E, R, π), it is NP-hard to distinguish between:
• Γ is completely satisfiable
• No assignment satisfies more than a 1/R^γ fraction of the edges

Aim

Reduce label cover (SATISFIABLE vs. 1/R^γ-SATISFIABLE) to a system of homogeneous inequalities with +1, -1 coefficients, over variables such as a1, a2, a3, a4, θ:

a1 + a2 + a3 + a4 ≥ θ
a1 + a2 + a3 - a4 < θ
a1 + a2 - a3 + a4 < θ
a1 + a2 - a3 + a4 ≥ θ
a1 - a2 + a3 - a4 ≥ θ
a1 - a2 + a3 + a4 < θ
a1 + a2 - a3 - a4 < θ


Variables

For each vertex u, introduce R variables: u1, u2, …, uR.

Intended assignment: if u is assigned label k, then uk = 1 and uj = 0 for all j ≠ k.

(The same symbol u is used for the vertex and for its block of variables.)

Equation Tuples

Encode the label cover constraints as linear equations:

• All vertices are assigned exactly one label:
  for all u: u1 + u2 + … + uR = 1
  for all u, v: (u1 + u2 + … + uR) - (v1 + v2 + … + vR) = 0

• For all constraints πe and all 1 ≤ k ≤ R: ∑ ui = vk, where the summation is over all i with πe(i) = k
  (e.g., u1 - v1 = 0, u2 + u3 - v2 = 0)

• Most of the variables are zero: pick t variables ui at random and add the equations ui = 0.

Over all the random choices, each such collection of equations forms an EQUATION TUPLE.
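An illustrative sketch of the equations a single edge constraint contributes (the representation of πe as a dict is hypothetical):

```python
def edge_equations(R, pi_e):
    """For an edge constraint pi_e: {1..R} -> {1..R}, emit one equation per
    target label k: (sum of u_i over {i : pi_e(i) = k}) - v_k = 0."""
    eqs = []
    for k in range(1, R + 1):
        sources = [f"u{i}" for i in range(1, R + 1) if pi_e[i] == k]
        if sources:
            eqs.append(" + ".join(sources) + f" - v{k} = 0")
    return eqs

print(edge_equations(3, {1: 1, 2: 2, 3: 2}))
# ['u1 - v1 = 0', 'u2 + u3 - v2 = 0']
```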

In the SATISFIABLE case, there is an assignment that satisfies most of the equation tuples.

Equation Tuples: Approximate Satisfaction

(SATISFIABLE vs. 1/R^γ-SATISFIABLE)

Suppose u2 + u3 - v2 = 0 is an equation of the tuple. We say it is not even approximately satisfied if

|u2 + u3 - v2| > ε (u1 + u2 + … + uR)

Scaling factor: u1 + u2 + … + uR

Next Step

An example tuple:

u1 - v1 = 0
u2 + u3 - v2 = 0
u1 + u2 + u3 - v1 - v2 - v3 = 0
u1 = 0
u3 + v1 - v2 = 0

Boost "one unsatisfied equation" to "most tuples have C equations that are not even approximately satisfied":

• Introduce several copies of the variables
• Add consistency checks between the different copies of the same variable

Arrange that each variable appears exactly once in a tuple, with coefficient +1 or -1.


Recap

• SATISFIABLE case: most tuples are completely satisfied.
• 1/R^γ-SATISFIABLE case: most tuples have C equations that are not even approximately satisfied.
• Each variable appears exactly once in a tuple, with coefficient +1 or -1.

Remaining task: using linear inequalities, distinguish a tuple that is
• completely satisfied, from one where
• at least C of its equations are not even approximately satisfied.

Observation

The pair of inequalities A - B < 0 and A + B ≥ 0 encodes B > 0 and |A| < B (up to the boundary case A = -B). For example, A = 0.5 and B = 2 satisfy both inequalities.

Pick one of the equation tuples at random:

u1 - v1 = 0
u4 + u5 - v2 = 0
u6 + u2 + u7 - v4 - v5 - v6 = 0
u3 = 0
u8 + v3 - v7 = 0

Multiply the equations by random signs (here +1, +1, -1, +1, -1) and add them up:

u1 - u2 + u3 + u4 + u5 - u6 - u7 - u8 - v1 - v2 - v3 + v4 + v5 + v6 + v7

With the scaling factor u1 + u2 + … + uR, output the pair of inequalities:

u1 - u2 + u3 + u4 + u5 - u6 - u7 - u8 - v1 - v2 - v3 + v4 + v5 + v6 + v7 - (u1 + u2 + … + uR) < 0
u1 - u2 + u3 + u4 + u5 - u6 - u7 - u8 - v1 - v2 - v3 + v4 + v5 + v6 + v7 + (u1 + u2 + … + uR) ≥ 0
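A sketch of this random ±1 combination step, with equations represented as coefficient dictionaries (purely illustrative):

```python
import random

def random_combination(tuple_eqs):
    """Multiply each equation (a {var: coeff} dict, equated to 0) by a
    random sign in {+1, -1} and sum them."""
    combo = {}
    for eq in tuple_eqs:
        sign = random.choice([+1, -1])
        for var, coeff in eq.items():
            combo[var] = combo.get(var, 0) + sign * coeff
    return combo

eqs = [{"u1": 1, "v1": -1}, {"u4": 1, "u5": 1, "v2": -1}, {"u3": 1}]
print(random_combination(eqs))  # e.g. {'u1': 1, 'v1': -1, 'u4': -1, ...}
```

Since each variable occurs in exactly one equation of the tuple, the combined left-hand side again has all coefficients in {+1, -1}.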

Good Case

The assignment satisfies every equation of the tuple:

u1 - v1 = 0
u2 + u3 - v2 = 0
u1 + u2 + u3 - v1 - v2 - v3 = 0
u1 = 0
u3 + v1 - v2 = 0

so every ±1 combination also vanishes:

u1 - u2 + u3 + u4 + u5 - u6 - u7 - u8 - v1 - v2 - v3 + v4 + v5 + v6 + v7 = 0

The assignment also satisfies u1 + u2 + … + uR = 1. Hence, with high probability over the choice of tuple, BOTH inequalities are satisfied:

u1 - u2 + u3 + u4 + u5 - u6 - u7 - u8 - v1 - v2 - v3 + v4 + v5 + v6 + v7 - (u1 + u2 + … + uR) < 0
u1 - u2 + u3 + u4 + u5 - u6 - u7 - u8 - v1 - v2 - v3 + v4 + v5 + v6 + v7 + (u1 + u2 + … + uR) ≥ 0

Bad Case

At least C equations of the tuple are not even approximately satisfied, e.g.:

|u1 - v1| > ε (u1 + u2 + … + uR)
|u2 + u3 - v2| > ε (u1 + u2 + … + uR)
|u3 + v1 - v2| > ε (u1 + u2 + … + uR)

For large enough C, with high probability over the choice of the +1/-1 combination,

|u1 - u2 + u3 + u4 + u5 - u6 - u7 - u8 - v1 - v2 - v3 + v4 + v5 + v6 + v7| > (u1 + u2 + … + uR)

Hence, with high probability over the choice of equation tuple, AT MOST ONE of the two inequalities is satisfied:

u1 - u2 + u3 + u4 + u5 - u6 - u7 - u8 - v1 - v2 - v3 + v4 + v5 + v6 + v7 - (u1 + u2 + … + uR) < 0
u1 - u2 + u3 + u4 + u5 - u6 - u7 - u8 - v1 - v2 - v3 + v4 + v5 + v6 + v7 + (u1 + u2 + … + uR) ≥ 0

Interesting Set of Vectors

The set of all possible {-1,1} sign combinations is exponentially large. Instead, construct a polynomial-size subset S of {-1,1}^n such that:

For any vector v = (v1, v2, …, vn) with sufficiently many large coordinates (> ε), at least a 1-δ fraction of the vectors u ∈ S satisfy |u ∙ v| > 1.

Construction: a 4-wise independent family combined with a random grouping of the coordinates.
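A quick numeric sanity check of the property required of S. This samples fully random signs (the "all 2^n combinations" case) rather than the derandomized construction, and the parameters are arbitrary:

```python
import random

n, trials = 10_000, 500
v = [0.11] * n   # n coordinates, each of magnitude above eps = 0.1
hits = sum(
    abs(sum(random.choice([1, -1]) * vi for vi in v)) > 1
    for _ in range(trials)
)
print(hits / trials)  # a large fraction, tending to 1 as n grows
```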

Construction

[Figure: coordinates v1, …, v7, several of magnitude > ε; the sign pattern (-1, 1, -1, 1, -1, 1, 1) gives the combination -v1 + v2 - v3 + v4 - v5 + v6 + v7, of magnitude > 1.]

A 4-wise independent family achieves |u ∙ v| > 1 with some constant probability; all 2^n combinations achieve it with probability close to 1.

Construction (continued)

[Figure: the coordinates v1, …, vn are randomly partitioned into groups, and the signs within each group come from a 4-wise independent set. Each group succeeds with constant probability (by 4-wise independence); by the independence of the grouping and Chernoff bounds, the overall behavior matches that of all 2^n combinations, i.e., success probability close to 1.]

Conclusion

• Either an assumption on the distribution of the examples or an assumption on the noise is necessary for efficient halfspace learning algorithms.

• [Raghavendra-Venkatesan]: a similar hardness result for learning support vector machines in the presence of adversarial noise.


THANK YOU

Details

• The set of all possible {-1,1} combinations is exponentially large.

• No variable should occur more than once in an equation tuple, to ensure that the final inequalities all have coefficients in {-1,1}.

• The vector set is constructed using a 4-wise independent family and a random grouping of coordinates.

• Use different copies of the variables for different equations, with a careful choice of consistency checks.

Interesting Set of Vectors (restated)

Construct a polynomial-size subset S of {-1,1}^n such that: for any vector v = (v1, v2, …, vn) with sufficiently many large coordinates (> ε), at most a δ fraction of the vectors u ∈ S satisfy |u ∙ v| < 1.

Construction: a 4-wise independent family combined with a random grouping of the coordinates.

Equation Tuple

u1 - v1 = 0
u2 + u3 - v2 = 0
u1 + u2 + u3 - v1 - v2 - v3 = 0
u1 = 0
u3 + v1 - v2 = 0

ε-Satisfaction: an assignment A is said to ε-satisfy an equation tuple if it ε-satisfies every equation in the tuple, i.e., each left-hand side has magnitude below ε times the scaling factor, e.g.

|u2 + u3 - v2| < ε (u1 + u2 + … + uR)
