Hardness of Learning Halfspaces with Noise
Prasad Raghavendra
Advisor: Venkatesan Guruswami
Spam Problem

  10 Million  Lottery  Cheap  Pharmacy  Junk  |  Is Spam?
  YES         YES      NO     YES       NO    |  SPAM
  NO          YES      YES    NO        YES   |  NOT SPAM
  YES         YES      YES    YES       YES   |  SPAM
  NO          NO       NO     YES       YES   |  SPAM
  YES         NO       YES    NO        YES   |  NOT SPAM
  NO          YES      NO     NO        NO    |  SPAM
PERCEPTRON

  Input features:  (1, 1, 0, 1, 0)
  Weights:         (2, 3, 3, 1, 7)
  Threshold:       3

  Weighted sum: 2 × 1 + 3 × 1 + 3 × 0 + 1 × 1 + 7 × 0 = 6
  6 > 3, so output: SPAM
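The decision rule on the slide can be sketched in a few lines; the feature values, weights, and threshold below are the ones from the worked example.

```python
def perceptron_classify(features, weights, threshold):
    """Return 'SPAM' if the weighted sum of features meets the threshold."""
    total = sum(w * x for w, x in zip(weights, features))
    return "SPAM" if total >= threshold else "NOT SPAM"

# Weighted sum: 2*1 + 3*1 + 3*0 + 1*1 + 7*0 = 6, and 6 > 3.
print(perceptron_classify([1, 1, 0, 1, 0], [2, 3, 3, 1, 7], 3))  # → SPAM
```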
Halfspace Learning Problem

Input: Training Samples
  Vectors : W1, W2, …, Wm ∈ {-1,1}^n
  Labels  : l1, l2, …, lm ∈ {-1,1}

[Figure: positive (+, SPAM) and negative (-, NOT SPAM) examples in the plane, separated by a line]

Output: Separating Halfspace (A, θ)
  A ∙ Wi ≥ θ  if li = 1
  A ∙ Wi < θ  if li = -1
  θ - threshold
Perspective
• Perceptron classifiers are the simplest neural networks – widely used for classification.
• Perceptron learning algorithms provably learn the halfspace if the data is perfectly separable.
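The classic mistake-driven perceptron update, which converges whenever the data is linearly separable, can be sketched as follows; the toy samples and labels are illustrative, not taken from the slides.

```python
def perceptron_train(samples, labels, epochs=100):
    """Learn (A, theta) with sign(A·W - theta) matching each label."""
    n = len(samples[0])
    a = [0.0] * n
    theta = 0.0
    for _ in range(epochs):
        mistakes = 0
        for w, l in zip(samples, labels):
            pred = 1 if sum(ai * wi for ai, wi in zip(a, w)) >= theta else -1
            if pred != l:  # mistake-driven update: move the hyperplane toward w
                a = [ai + l * wi for ai, wi in zip(a, w)]
                theta -= l
                mistakes += 1
        if mistakes == 0:  # converged: all samples classified correctly
            break
    return a, theta

# Separable toy data over {-1,1}^2: label = sign of the first coordinate.
samples = [(1, 1), (1, -1), (-1, 1), (-1, -1)]
labels = [1, 1, -1, -1]
a, theta = perceptron_train(samples, labels)
assert all((1 if sum(ai * wi for ai, wi in zip(a, w)) >= theta else -1) == l
           for w, l in zip(samples, labels))
```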
[Figure: SPAM (+) and NOT SPAM (-) examples with a few points of each kind on the wrong side: no halfspace separates the data perfectly]
Inseparability

• Who said halfspaces can classify SPAM vs NOT SPAM? The data may be inherently inseparable - Agnostic Learning
• Even if the data is separable, what about noise? Noise is inherent in many forms of data - PAC learning
In the Presence of Noise

Agreement: the fraction of the training examples classified correctly
[Figure: noisy data with a candidate halfspace; a few + and - points fall on the wrong side]

Classifies 16 of the 20 examples correctly:
Agreement = 0.8, or 80%
‘Find the hyperplane that maximizes the agreement with the training examples’

Halfspace Maximum Agreement (HSMA) Problem
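The agreement measure is just a fraction of correct classifications; the sketch below computes it for a candidate halfspace (A, θ). The four-point data set is illustrative, not the 20-point set drawn on the slide.

```python
def agreement(samples, labels, a, theta):
    """Fraction of examples with sign(A·W - theta) matching the label."""
    correct = 0
    for w, l in zip(samples, labels):
        pred = 1 if sum(ai * wi for ai, wi in zip(a, w)) >= theta else -1
        correct += (pred == l)
    return correct / len(samples)

# A halfspace that gets 3 of 4 examples right -> agreement 0.75.
samples = [(1, 1), (1, -1), (-1, 1), (-1, -1)]
labels = [1, 1, -1, 1]          # the last label is "noisy"
print(agreement(samples, labels, a=(1, 0), theta=0))  # → 0.75
```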
Related Work : Positive Results

Random Classification Noise (each label flipped with probability less than 1/2):
• [Blum-Frieze-Kannan-Vempala 96]: a PAC learning algorithm that outputs a decision list of halfspaces
• [Cohen 97]: a proper learning algorithm (outputs a halfspace) for learning halfspaces

Distribution of examples:
• [Kalai-Klivans-Mansour-Servedio 05]: an algorithm that finds a close-to-optimal halfspace when examples come from the uniform or any log-concave distribution
Related Work : Negative Results

• [Amaldi-Kann 98, Ben-David-Eiron-Long 92]: HSMA is NP-hard to approximate within some constant factor (261/262, 415/418)
• [Bshouty-Burroughs 02]: HSMA is NP-hard to approximate better than 84/85
• [Arora-Babai-Stern-Sweedyk 97, Amaldi-Kann 98]: NP-hard to minimize disagreements within a factor of 2^(log^(1-ε) n)
Open Problem

Given that 99.9% of the examples are correct:
• No algorithm is known that finds a halfspace with agreement of 51%
• No hardness result ruled out getting an agreement of 99%

• Closing this gap was stated as an open problem by [Blum-Frieze-Kannan-Vempala 96]
• Highlighted in recent work by [Feldman 06] on the (1-ε, 1/2+δ) tight hardness of learning monomials
Our Result
For any ε, δ > 0, given a set of training examples, it is NP-hard to distinguish between the following two cases:
• There is a halfspace with agreement 1 - ε
• No halfspace has agreement greater than 1/2 + δ

Even with 99.9% of the examples non-noisy, the best we can do is output a random/trivial halfspace!
Remarks
• [Feldman-Gopalan-Khot-Ponnuswami 06] independently showed a similar result.
  - Our hardness result holds even for boolean examples in {-1,1}^n (their result holds for R^n)
  - [Feldman et al.]'s hardness result gives stronger hardness in the sub-constant regime
• We also show: given a system of linear equations over the integers that is (1-ε)-satisfiable, it is NP-hard to find an assignment that satisfies more than a δ fraction of the equations
Linear Inequalities

Let the halfspace be
  a1 x1 + a2 x2 + … + an xn ≥ θ

Suppose W1 = (-1, 1, -1, 1) and l1 = 1.
Constraint : a1(-1) + a2(1) + a3(-1) + a4(1) ≥ θ

Learning a Halfspace ≡ Solving a system of linear inequalities

Unknowns: A = (a1, a2, a3, a4), θ

  a1 + a2 + a3 + a4 ≥ θ
  a1 + a2 + a3 - a4 < θ
  a1 + a2 - a3 + a4 < θ
  a1 + a2 - a3 + a4 ≥ θ
  a1 - a2 + a3 - a4 ≥ θ
  a1 - a2 + a3 + a4 < θ
  a1 + a2 - a3 - a4 < θ
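The correspondence above can be sketched directly: each sample (Wi, li) becomes one linear inequality on the unknowns (A, θ), and checking a candidate halfspace is just evaluation. The two samples below are illustrative, not taken from the slides.

```python
def build_constraints(samples, labels):
    """Each sample yields the constraint A·W >= theta (l=1) or A·W < theta (l=-1)."""
    return [(w, l) for w, l in zip(samples, labels)]

def satisfies_all(constraints, a, theta):
    for w, l in constraints:
        dot = sum(ai * wi for ai, wi in zip(a, w))
        if l == 1 and not (dot >= theta):
            return False
        if l == -1 and not (dot < theta):
            return False
    return True

samples = [(-1, 1, -1, 1), (1, 1, 1, 1)]
labels = [1, -1]
cons = build_constraints(samples, labels)
# This candidate puts the first sample on the >= theta side and the
# second strictly below theta, so both inequalities hold.
print(satisfies_all(cons, a=(-1, 1, -1, 0), theta=0))  # → True
```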
Label Cover Problem

U, V : sets of vertices
E : set of edges
{1, 2, …, R} : set of labels
πe : constraint on edge e

An assignment A satisfies an edge e = (u,v) ∈ E if πe(A(u)) = A(v)

[Figure: bipartite graph between U and V; each vertex carries labels 1, 2, …, R, and an edge constraint maps labels of u to labels of v, e.g. πe(3) = 2]

Goal: find an assignment A that satisfies the maximum number of edges
Hardness of Label Cover

There exists γ > 0 such that, given a label cover instance Γ = (U, V, E, R, π), it is NP-hard to distinguish between:
• Γ is completely satisfiable
• No assignment satisfies more than a 1/R^γ fraction of the edges
[Raz 98]
Aim

Reduce Label Cover, where it is NP-hard to distinguish SATISFIABLE from 1/R^γ-SATISFIABLE, to a system of homogeneous inequalities with +1, -1 coefficients:

  a1 + a2 + a3 + a4 ≥ θ
  a1 + a2 + a3 - a4 < θ
  a1 + a2 - a3 + a4 < θ
  a1 + a2 - a3 + a4 ≥ θ
  a1 - a2 + a3 - a4 ≥ θ
  a1 - a2 + a3 + a4 < θ
  a1 + a2 - a3 - a4 < θ

Variables : a1, a2, a3, a4, θ
Variables

For each vertex u, R variables : u1, u2, …, uR

Intended meaning: if u is assigned label k, then uk = 1 and uj = 0 for all j ≠ k
Equation Tuples

• All vertices are assigned exactly one label:
  For all u :     u1 + u2 + … + uR = 1
  For all u, v :  u1 + u2 + … + uR - (v1 + v2 + … + vR) = 0

• For each edge constraint πe and all 1 ≤ k ≤ R:
  ∑ ui = vk   (summation over all i with πe(i) = k)
  e.g.  u1 - v1 = 0,  u2 + u3 - v2 = 0

• Most of the variables are zero:
  Pick t variables ui at random and add the equations ui = 0

An EQUATION TUPLE is one such collection of equations, generated over all random choices.
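The edge equations can be generated mechanically. The sketch below (an illustration, not the paper's exact construction) represents each equation as a {variable: coefficient} map and emits, for each label k of v, the equation ∑_{i: πe(i)=k} ui - vk = 0.

```python
def edge_equations(pi_e, R_u, R_v):
    """One {variable: coefficient} map per label k of v."""
    equations = []
    for k in range(1, R_v + 1):
        coeffs = {f"u{i}": 1 for i in range(1, R_u + 1) if pi_e[i] == k}
        coeffs[f"v{k}"] = -1  # equation reads: sum of selected u_i minus v_k = 0
        equations.append(coeffs)
    return equations

# Projection from the slide's example: pi_e(1) = 1, pi_e(2) = pi_e(3) = 2.
pi_e = {1: 1, 2: 2, 3: 2}
for eq in edge_equations(pi_e, R_u=3, R_v=2):
    print(eq)
# First equation:  u1 - v1 = 0;  second:  u2 + u3 - v2 = 0.
```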
Equation Tuples

SATISFIABLE case: there is an assignment that satisfies most of the equation tuples.
In the 1/R^γ-SATISFIABLE case, approximate satisfaction is measured against the scaling factor u1 + u2 + … + uR:

Suppose u2 + u3 - v2 = 0 is an equation; it is not even approximately satisfied if
  |u2 + u3 - v2| > ε (u1 + u2 + … + uR)
Next Step

  u1 - v1 = 0
  u2 + u3 - v2 = 0
  u1 + u2 + u3 - v1 - v2 - v3 = 0
  u1 = 0
  u3 + v1 - v2 = 0

From one unsatisfied equation per tuple to: most tuples have C equations that are not even approximately satisfied.

• Introduce several copies of the variables
• Add consistency checks between the different copies of the same variable

Each variable then appears exactly once in a tuple, with coefficient +1 or -1.
Recap

• SATISFIABLE case: most tuples are completely satisfied.
• 1/R^γ-SATISFIABLE case: most tuples have C equations that are not even approximately satisfied.
• Each variable appears exactly once in a tuple, with coefficient +1 or -1.

Using linear inequalities, distinguish between a tuple that is completely satisfied and one in which at least C of the equations are not even approximately satisfied.
Observation

  A - B < 0  and  A + B ≥ 0   ⟺   B > 0  and  |A| < B   (up to the boundary case A = -B)
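This observation is what lets "approximately satisfied" be expressed with two {-1,+1}-coefficient inequalities. A tiny illustrative check of the equivalence:

```python
def encoded(a, b):
    """The two linear inequalities from the slide."""
    return a - b < 0 and a + b >= 0

def direct(a, b):
    """|A| < B with B > 0 (ignoring the boundary case A = -B)."""
    return b > 0 and abs(a) < b

# The two predicates agree on a grid of small integer values,
# excluding the boundary a == -b.
for a in range(-5, 6):
    for b in range(-5, 6):
        if a != -b:
            assert encoded(a, b) == direct(a, b), (a, b)
print("encodings agree")
```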
Pick one of the equation tuples at random; multiply each equation by an independent random +1/-1 and sum:

  u1 - v1 = 0                        × (+1)
  u4 + u5 - v2 = 0                   × (+1)
  u6 + u2 + u7 - v4 - v5 - v6 = 0    × (-1)
  u3 = 0                             × (+1)
  u8 + v3 - v7 = 0                   × (-1)

  Sum = u1 - u2 + u3 + u4 + u5 - u6 - u7 - u8 - v1 - v2 - v3 + v4 + v5 + v6 + v7

With scaling factor u1 + u2 + … + uR, output the pair of inequalities:

  u1 - u2 + u3 + u4 + u5 - u6 - u7 - u8 - v1 - v2 - v3 + v4 + v5 + v6 + v7 - (u1 + u2 + … + uR) < 0
  u1 - u2 + u3 + u4 + u5 - u6 - u7 - u8 - v1 - v2 - v3 + v4 + v5 + v6 + v7 + (u1 + u2 + … + uR) ≥ 0
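The random-sign combination can be sketched as follows (an illustration with made-up equations, not the full reduction): sum the sign-weighted coefficient maps of a tuple's equations. Because each variable appears in exactly one equation of a tuple, every coefficient of the combination stays in {-1, 0, +1}.

```python
import random

def combine(equations, rng):
    """Sum sign-weighted coefficient maps ({var: coeff} dictionaries)."""
    combo = {}
    for eq in equations:
        s = rng.choice([1, -1])  # independent random sign per equation
        for var, c in eq.items():
            combo[var] = combo.get(var, 0) + s * c
    return combo

rng = random.Random(0)
equations = [{"u1": 1, "v1": -1},
             {"u4": 1, "u5": 1, "v2": -1},
             {"u3": 1}]
combo = combine(equations, rng)
# Every coefficient stays in {-1, 0, +1} since the variables are disjoint.
assert all(c in (-1, 0, 1) for c in combo.values())
```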
Good Case

Suppose the assignment satisfies every equation of the tuple:

  u1 - v1 = 0
  u2 + u3 - v2 = 0
  u1 + u2 + u3 - v1 - v2 - v3 = 0
  u1 = 0
  u3 + v1 - v2 = 0

Then any +1/-1 combination of the equations also sums to 0:

  u1 - u2 + u3 + u4 + u5 - u6 - u7 - u8 - v1 - v2 - v3 + v4 + v5 + v6 + v7 = 0

The assignment also satisfies u1 + u2 + … + uR = 1. So, with high probability over the choice of tuple, BOTH INEQUALITIES are SATISFIED:

  u1 - u2 + u3 + u4 + u5 - u6 - u7 - u8 - v1 - v2 - v3 + v4 + v5 + v6 + v7 - (u1 + u2 + … + uR) < 0
  u1 - u2 + u3 + u4 + u5 - u6 - u7 - u8 - v1 - v2 - v3 + v4 + v5 + v6 + v7 + (u1 + u2 + … + uR) ≥ 0
Bad Case

At least C equations of the tuple are not even approximately satisfied, e.g.

  |u1 - v1| > ε (u1 + u2 + … + uR)
  |u2 + u3 - v2| > ε (u1 + u2 + … + uR)
  |u1 + u2 + u3 - v1 - v2 - v3| > ε (u1 + u2 + … + uR)

For large enough C, with high probability over the choice of the +1/-1 combination,

  |u1 - u2 + u3 + u4 + u5 - u6 - u7 - u8 - v1 - v2 - v3 + v4 + v5 + v6 + v7| > (u1 + u2 + … + uR)

So, with high probability over the choice of equation tuple, AT MOST ONE of the two inequalities is satisfied:

  u1 - u2 + u3 + u4 + u5 - u6 - u7 - u8 - v1 - v2 - v3 + v4 + v5 + v6 + v7 - (u1 + u2 + … + uR) < 0
  u1 - u2 + u3 + u4 + u5 - u6 - u7 - u8 - v1 - v2 - v3 + v4 + v5 + v6 + v7 + (u1 + u2 + … + uR) ≥ 0
Interesting Set of Vectors

The set of all possible {-1,1} combinations is exponentially large. Instead, construct a polynomial-size subset S of {-1,1}^n such that: for any vector v = (v1, v2, …, vn) with sufficiently many large coordinates (> ε), at least a 1-δ fraction of the vectors u ∈ S satisfy

  |u ∙ v| > 1

Construction using a 4-wise independent family and a random grouping of coordinates.
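As a quick illustrative experiment (not the derandomized construction itself): when v has many coordinates of magnitude greater than ε, a fully random {-1,+1} sign vector u already gives |u ∙ v| > 1 for most sign patterns. The actual construction replaces fully random signs with a polynomial-size set built from a 4-wise independent family and a random grouping of coordinates.

```python
import random

def fraction_large_dot(v, trials, threshold=1.0, seed=0):
    """Fraction of random {-1,+1} sign vectors u with |u·v| > threshold."""
    rng = random.Random(seed)
    n = len(v)
    hits = 0
    for _ in range(trials):
        u = [rng.choice([1, -1]) for _ in range(n)]
        dot = sum(ui * vi for ui, vi in zip(u, v))
        hits += abs(dot) > threshold
    return hits / trials

v = [0.5] * 100          # 100 coordinates, each of magnitude 0.5 > epsilon
frac = fraction_large_dot(v, trials=2000)
print(round(frac, 2))    # most sign patterns give |u·v| > 1
```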
Construction

[Figure: coordinates V1, …, V7, each of magnitude > ε; the sign pattern (-1, 1, -1, 1, -1, 1, 1) gives |(-V1 + V2 - V3 + V4 - V5 + V6 + V7)| > 1]

Four-wise independent family : |u ∙ v| > 1 with some constant probability
All 2^n combinations : |u ∙ v| > 1 with probability close to 1
Construction

[Figure: the coordinates of magnitude > ε are randomly grouped into blocks S1, S2, …; within each block the sign pattern comes from a 4-wise independent set]

By independence of the grouping and by Chernoff bounds, the constant success probability of the 4-wise independent set is amplified to a probability close to that of all 2^n combinations.
Conclusion
• Either an assumption on the distribution of the examples or on the noise is necessary for efficient halfspace learning algorithms.
• [Raghavendra-Venkatesan] A similar hardness result holds for learning support vector machines in the presence of adversarial noise.
THANK YOU
Details

• The set of all possible {-1,1} combinations is exponentially large.
  - Solution: construction using a 4-wise independent family and a random grouping of coordinates.
• No variable should occur more than once in an equation tuple, to ensure that ultimately the inequalities all have coefficients in {-1,1}.
  - Solution: use different copies of the variables for different equations, and a careful choice of consistency checks.

Interesting Set of Vectors

The set of all possible {-1,1} combinations is exponentially large. Construct a polynomial-size subset S of {-1,1}^n such that: for any vector v = (v1, v2, …, vn) with sufficiently many large coordinates (> ε), at most a δ fraction of the vectors u ∈ S satisfy

  |u ∙ v| < 1

Construction using a 4-wise independent family and a random grouping of coordinates.
Equation Tuple

  u1 - v1 = 0
  u2 + u3 - v2 = 0
  u1 + u2 + u3 - v1 - v2 - v3 = 0
  u1 = 0
  u3 + v1 - v2 = 0

ε-Satisfaction: an assignment A is said to ε-satisfy an equation tuple if it approximately satisfies every equation in the tuple, with error below ε times the scaling factor, e.g.

  |u2 + u3 - v2| < ε (u1 + u2 + u3)