NEAREST NEIGHBORS ALGORITHM
Lecturer: Yishay Mansour
Presentation: Adi Haviv and Guy Lev


TRANSCRIPT

Page 1: Nearest Neighbors Algorithm

NEAREST NEIGHBORS ALGORITHM
Lecturer: Yishay Mansour
Presentation: Adi Haviv and Guy Lev

Page 2: Nearest Neighbors Algorithm

Lecture Overview
• NN general overview
• Various methods of NN
• Models of the Nearest Neighbor Algorithm
• NN – Risk Analysis
• KNN – Risk Analysis
• Drawbacks
• Locality Sensitive Hashing (LSH)
• Implementing KNN using LSH
• Extension for Bounded Values
• Extension for Real Numbers

Page 3: Nearest Neighbors Algorithm

General Overview
• The nearest neighbors algorithm is a simple classification algorithm that classifies a new point according to the class label of its nearest neighbor.
• Let $S = \{(x_1, c_1), \ldots, (x_m, c_m)\}$ be a sample of points and their classifications. Given a new point $x$, we find its nearest neighbor $x_i \in S$ and classify $x$ with $c_i$ (a minimal code sketch follows below).
• An implicit assumption is that close points have the same classification.
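As an illustration of this rule, a minimal sketch (NumPy, the Euclidean metric, and the function name are our own choices; the slides do not fix a metric):

```python
import numpy as np

def nn_classify(X, c, x):
    """Return the label of the sample point nearest to x (Euclidean metric)."""
    dists = np.linalg.norm(X - x, axis=1)  # distance from x to every sample point
    return c[np.argmin(dists)]             # class label of the nearest neighbor
```

Here `X` is an `(m, d)` array of sample points and `c` the matching length-`m` label array.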

Page 4: Nearest Neighbors Algorithm

Various methods of NN
• NN – Given a new point x, we wish to find its nearest sample point and return that point's classification.
• K-NN – Given a new point x, we wish to find its k nearest points and return their average classification.
• Weighted – Given a new point x, we assign weights to all the sample points according to their distance from x and classify x according to the weighted average (both variants are sketched after this list).
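A sketch of the latter two variants, continuing the example above (binary 0/1 labels and inverse-distance weights are our assumptions; the slides do not fix a weighting scheme):

```python
import numpy as np

def knn_classify(X, c, x, k):
    """Average the labels of the k nearest points and round (majority vote)."""
    nearest = np.argsort(np.linalg.norm(X - x, axis=1))[:k]
    return int(c[nearest].mean() >= 0.5)

def weighted_classify(X, c, x, eps=1e-12):
    """Weight every sample point by its inverse distance to x, then average."""
    w = 1.0 / (np.linalg.norm(X - x, axis=1) + eps)  # closer points weigh more
    return int(np.average(c, weights=w) >= 0.5)
```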

Page 5: Nearest Neighbors Algorithm

Models of the Nearest Neighbor Algorithm

1. A distribution $D$ over $X \times \{0,1\}$:
• Sample $(x, c)$ according to $D$.
• The problem with this model is that the label probability conditioned on a single point $x$ is not measurable.

2. Two distributions $D_-$ (negative points), $D_+$ (positive points), and a parameter $p$:
• Sample a class $c$ such that $\Pr[c = +] = p$.
• Sample $x$ from $D_c$ and return $(x, c)$.

3. A distribution $D$ over $X$ together with $p(x) = \Pr[c = 1 \mid x]$:
• Sample $x$ from $D$ and label it 1 with probability $p(x)$ (a sampling sketch follows below).
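The last two models translate directly into sampling procedures. A small sketch (the Gaussian class-conditionals in model 2 and the uniform $D$ in model 3 are our own illustrative stand-ins):

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_model_2(m, p, d=2):
    """Model 2: draw class c with Pr[c=1] = p, then draw x from that class's
    distribution (illustrative choice: unit Gaussians centered at +1 / -1)."""
    c = (rng.random(m) < p).astype(int)
    centers = np.where(c[:, None] == 1, 1.0, -1.0)
    return centers + rng.standard_normal((m, d)), c

def sample_model_3(m, p_fn, d=2):
    """Model 3: draw x from D (here uniform on [0,1]^d), label 1 w.p. p(x)."""
    X = rng.random((m, d))
    c = (rng.random(m) < np.array([p_fn(x) for x in X])).astype(int)
    return X, c
```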

Page 6: Nearest Neighbors Algorithm

NN – Risk Analysis
• The optimal classifier is: $c^*(x) = 1$ if $p(x) \ge 1/2$, and $0$ otherwise.
• $r(x) = \min\{p(x), 1 - p(x)\}$ is its probability of error at $x$.
• The loss of the optimal classifier is (Bayes Risk): $R^* = \mathbb{E}_{x \sim D}[r(x)]$.
• We will prove that the (asymptotic) risk of the NN rule satisfies: $R_{\mathrm{NN}} \le 2R^*$.
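As a quick numeric sanity check of these quantities (our own example, not from the slides):

\[
p(x) = 0.3 \;\Rightarrow\; r(x) = \min\{0.3,\ 0.7\} = 0.3,
\qquad
2\,p(x)\,(1 - p(x)) = 0.42 \;\le\; 2\,r(x) = 0.6,
\]

so at such a point the NN error derived on the next slide indeed stays below twice the Bayes error.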

Page 7: Nearest Neighbors Algorithm

NN vs. Bayes Risk Proof
• We split into 2 cases: a Simple Case and a General Case.
• Simple Case:
• There exists exactly one $x_i$ such that $x_i = x$, i.e., the nearest neighbor of $x$ is $x$ itself.
• Proof: the algorithm errs exactly when the label of $x$ and the label of its nearest neighbor disagree.
• Thus we get that the expected error is: $p(x)(1 - p(x)) + (1 - p(x))\,p(x) = 2\,p(x)(1 - p(x)) \le 2\,r(x)$.
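Spelling out the final inequality (standard algebra, writing $p$ for $p(x)$):

\[
2p(1 - p) \;=\; 2\,\min\{p, 1-p\}\cdot\max\{p, 1-p\} \;\le\; 2\,\min\{p, 1-p\} \;=\; 2\,r(x).
\]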

Page 8: Nearest Neighbors Algorithm

NN vs. Bayes Risk Proof Cont.
• General Case:
• The nearest neighbor of $x$ converges to $x$.
• The classification of the neighborhood of $x$ is close to that of $x$.
• Proof. Simplifications:
• $D(x)$ is non-zero.
• $p(x)$ is continuous.
• Theorem: for every $\delta > 0$, $d(x, \mathrm{NN}(x)) \le \delta$ with probability 1 as $m \to \infty$.
• Proof:
• Let $B_\delta(x)$ be a sphere with radius $\delta$ and center $x$.
• The probability that all $m$ sample points miss $B_\delta(x)$ is $(1 - \Pr[B_\delta(x)])^m \to 0$.
• Theorem: $\lim_{m \to \infty} \Pr[A_m] \le 2R^*$, where
• $A_m$ = {the event that the NN algorithm made a mistake with a sample space of m points}.
• Proof:
• $\Pr[A_m] \to \mathbb{E}_x[2\,p(x)(1 - p(x))] \le 2R^*$.
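A compact restatement of the convergence step, using the standard bound $1 - z \le e^{-z}$:

\[
\Pr\bigl[\text{all } m \text{ samples miss } B_\delta(x)\bigr]
= \bigl(1 - \Pr[B_\delta(x)]\bigr)^m
\;\le\; e^{-m \Pr[B_\delta(x)]} \;\longrightarrow\; 0,
\]

which is exactly where the simplification $D(x) \ne 0$ (hence $\Pr[B_\delta(x)] > 0$) is used.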

Page 9: Nearest Neighbors Algorithm

KNN – Risk Analysis
• We go on to the case of $k$ points; the gain here is that we won't lose the factor of 2 of the Bayes Risk.
• Denote:
• $l$ = number of the $k$ nearest neighbors of $x$ that are labeled 1.
• The estimation is: $\hat{p}(x) = l / k$.
• Our conditions are:
• $k \to \infty$ and $k/m \to 0$.
• We want to prove that:
1. $\hat{p}(x)$ concentrates around $p(x)$;
2. the resulting risk converges to the Bayes Risk $R^*$.

Page 10: Nearest Neighbors Algorithm

KNN vs. Bayes Risk Proof
• Same as before, we split into 2 cases.
• Simple Case:
• All $k$ nearest neighbors are identical to $x$, so condition (1) is satisfied.
• Proof:
• By the Chernoff bound: $\Pr[\,|l/k - p(x)| \ge \varepsilon\,] \le 2e^{-2\varepsilon^2 k}$.
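In the form used here (the Hoeffding version of the bound for $k$ independent labels, each equal to 1 with probability $p(x)$; the constants are the textbook ones):

\[
\Pr\Bigl[\Bigl|\frac{l}{k} - p(x)\Bigr| \ge \varepsilon\Bigr] \;\le\; 2\,e^{-2\varepsilon^2 k},
\]

so for any fixed $\varepsilon$ the estimate $l/k$ concentrates around $p(x)$ as $k \to \infty$.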

Page 11: Nearest Neighbors Algorithm

KNN vs. Bayes Risk Proof
• Proof for the General Case:
• Define $B_\delta(x)$ same as before.
• $\Pr[B_\delta(x)] > 0$.
• The expected number of points that fall in $B_\delta(x)$ is $m \cdot \Pr[B_\delta(x)]$.
• Let $N$ = number of points that fall in $B_\delta(x)$.
• Since $k/m \to 0$, $\Pr[N \le k - 1] \to 0$, so eventually all $k$ nearest neighbors lie inside $B_\delta(x)$.

Page 12: Nearest Neighbors Algorithm

Drawbacks
• No bound on the number of samples (m): the quality of the nearest neighbor rule depends on the actual distribution.
• For example: for any fixed m one can choose a distribution that a sample of m points cannot cover, on which the probability of error is at least a constant.
• Determining the distance function: distance between points should not be affected by different scales. Two ways to normalize each coordinate (sketched below):
• Assuming a normal distribution: $x' = (x - \mu)/\sigma$ (zero mean, unit variance).
• Assuming a uniform distribution: $x' = (x - \min)/(\max - \min)$ (range $[0, 1]$).
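A small sketch of the two normalizations (NumPy; the function names are ours):

```python
import numpy as np

def normalize_gaussian(X):
    """Per-coordinate z-score: suited to roughly normal coordinates."""
    return (X - X.mean(axis=0)) / X.std(axis=0)

def normalize_uniform(X):
    """Per-coordinate min-max scaling to [0,1]: suited to roughly uniform data."""
    mn, mx = X.min(axis=0), X.max(axis=0)
    return (X - mn) / (mx - mn)
```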


Page 13: Nearest Neighbors Algorithm

Locality Sensitive Hashing (LSH)
• Trivial algorithm for KNN: for every query point, go over all sample points and compute the distance. This takes linear time per query.
• We want to pre-process the sample set so that search time would be sub-linear.
• We can look at the following problem: given $x$ and $R$, find all $y$ such that $d(x, y) \le R$.

Page 14: Nearest Neighbors Algorithm

Locality Sensitive Hashing (LSH)
• A Locality Sensitive Hashing family is a set $H$ of hash functions s.t. for any $p, q$:
• If $d(p, q) \le R$ then $\Pr_{h \in H}[h(p) = h(q)] \ge p_1$
• If $d(p, q) \ge cR$ then $\Pr_{h \in H}[h(p) = h(q)] \le p_2$
for some probabilities $p_1 > p_2$.
• Example (bit sampling over the Hamming cube; a code sketch follows):
• $X = \{0, 1\}^d$ with the Hamming distance.
• $H = \{h_i : h_i(x) = x_i,\ 1 \le i \le d\}$, i.e., each function reads a single coordinate.
• $\Pr_{h \in H}[h(p) = h(q)] = 1 - d(p, q)/d$.
• If $d(p, q) \le R$ then $\Pr[h(p) = h(q)] \ge 1 - R/d$, so we have $p_1 = 1 - R/d > p_2 = 1 - cR/d$ as required.
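Drawing one function from this bit-sampling family is a one-liner (our own minimal code):

```python
import random

def draw_bit_sampling_hash(d, rng=random.Random(0)):
    """Draw h uniformly from H = {h_i : h_i(x) = x_i}: h reads one random bit."""
    i = rng.randrange(d)
    return lambda x: x[i]  # x is a sequence of d bits
```

Two points collide under a random $h$ with probability equal to the fraction of coordinates on which they agree, which is exactly $1 - d(p, q)/d$.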

Page 15: Nearest Neighbors Algorithm

Implementing KNN using LSH
• Step 1: Amplification:
• Use functions of the form $g(x) = (h_1(x), \ldots, h_k(x))$ where $h_1, \ldots, h_k$ are randomly selected from $H$. Then:
• If $d(p, q) \le R$ then $\Pr[g(p) = g(q)] \ge p_1^k$
• If $d(p, q) \ge cR$ then $\Pr[g(p) = g(q)] \le p_2^k$
• $k$ is chosen s.t. $p_2^k \le 1/n$. Thus: the expected number of “far” points that collide with a query in a single table is at most 1.
• Denote: $P_1 = p_1^k$, the collision probability of a “close” pair under one $g$ (a code sketch of $g$ follows).
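A sketch of the amplification step, reusing the bit-sampling family above (the AND-construction; the names are ours):

```python
import random

def draw_amplified_hash(d, k, rng=random.Random(0)):
    """g(x) = (h_1(x), ..., h_k(x)): two points collide only if all k agree."""
    idx = [rng.randrange(d) for _ in range(k)]
    return lambda x: tuple(x[i] for i in idx)
```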

Page 16: Nearest Neighbors Algorithm

Implementing KNN using LSH Cont.
• Step 2: Combination
• Pick $L$ functions $g_1, \ldots, g_L$ (use $L$ hash tables).
• For each $i$: a “close” pair collides under $g_i$ with probability at least $p_1^k$.
• Probability of no-match for any of the $L$ functions: $(1 - p_1^k)^L$.
• For a given $\delta$, choose $L$ such that $(1 - p_1^k)^L \le \delta$; then we have: a “close” point is retrieved with probability at least $1 - \delta$ (see the calculation below).
• For “far” points, the probability to hit is $p_2^k \le 1/n$, so the probability of hitting a given “far” point in any of the tables is bounded by $L/n$.
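The choice of $L$ follows from the standard bound $1 - z \le e^{-z}$:

\[
(1 - p_1^k)^L \;\le\; e^{-L\,p_1^k} \;\le\; \delta
\quad\text{whenever}\quad
L \;\ge\; \frac{\ln(1/\delta)}{p_1^k},
\]

so $L = \lceil \ln(1/\delta)\,/\,p_1^k \rceil$ tables suffice.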

Page 17: Nearest Neighbors Algorithm

Implementing KNN using LSH Cont.
• We are given an LSH family $H$ and a sample set.
• Pre-processing:
• Pick $L$ functions $g_1, \ldots, g_L$ (use $L$ hash tables).
• Insert each sample $x$ into each table $i$, keyed by $g_i(x)$.
• Finding nearest neighbors of $q$:
• For each $i$, calculate $g_i(q)$ and search in the $i$-th table.
• Thus obtain the candidate set $P$.
• Check the distance between $q$ and each point in $P$ (a full sketch follows below).
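Putting the two steps together over the Hamming cube (the class structure is ours; the scheme itself is the one described above, with bit sampling standing in for the abstract family $H$):

```python
import random

class LSHIndex:
    """L hash tables, each keyed by an amplified hash g_i of k random bits."""

    def __init__(self, points, k, L, rng=random.Random(0)):
        d = len(points[0])
        self.gs = [[rng.randrange(d) for _ in range(k)] for _ in range(L)]
        self.tables = [{} for _ in range(L)]
        for x in points:  # pre-processing: insert x into every table
            for g, table in zip(self.gs, self.tables):
                key = tuple(x[i] for i in g)
                table.setdefault(key, []).append(x)

    def near_neighbors(self, q, R):
        """Collect candidates from all L tables, then filter by true distance."""
        candidates = set()
        for g, table in zip(self.gs, self.tables):
            key = tuple(q[i] for i in g)
            candidates.update(table.get(key, []))
        return [x for x in candidates
                if sum(a != b for a, b in zip(x, q)) <= R]
```

Points are bit tuples, e.g. `LSHIndex([(0, 1, 1, 0), (1, 1, 1, 0)], k=2, L=4)`.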

Page 18: Nearest Neighbors Algorithm

Implementing KNN using LSH Cont.
• Complexity:
• Space complexity: $L$ tables, each containing the $n$ samples. Therefore: $O(nL)$.
• Search time complexity:
• $O(L)$ queries to hash tables.
• We assume lookup time is constant.
• For each sample retrieved, we check if it is “close”.
• Expected number of “far” points is at most $L$ (at most one per table, since $p_2^k \le 1/n$). Therefore rejecting “far” samples is $O(L)$.
• Time for processing “close” samples: $O(kL)$,
• where $k$ is the number of desired neighbors.

Page 19: Nearest Neighbors Algorithm

Extension for Bounded Values
• Sample space is $X = \{0, 1, \ldots, s\}^d$.
• We use $L_1$ as the distance metric.
• Use unary encoding:
• Represent each coordinate by a block of $s$ bits.
• A value $t$ is represented by $t$ consecutive 1s followed by $s - t$ zeros.
• Example: $s = 8$, $x = \langle 5, 7 \rangle$. Representation of x: 1111100011111110.
• Hamming distance in this representation is the same as the $L_1$ distance in the original representation (see the sketch below).
• Problems with real values can be reduced to this solution by quantization.
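A sketch of the encoding, checked against the slide's own example:

```python
def unary_encode(x, s):
    """Encode each coordinate t in [0, s] as t ones followed by s - t zeros."""
    return tuple(bit for t in x for bit in [1] * t + [0] * (s - t))

def hamming(a, b):
    return sum(u != v for u, v in zip(a, b))

# Slide's example: s = 8, x = <5,7>  ->  1111100011111110
assert unary_encode((5, 7), 8) == tuple(int(b) for b in "1111100011111110")
# Hamming distance of the encodings equals L1 distance of the originals:
assert hamming(unary_encode((5, 7), 8), unary_encode((3, 7), 8)) == abs(5 - 3)
```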


Page 20: Nearest Neighbors Algorithm

Extension for Real Numbers
• Sample space is $X = [0, 1]^d$ with the $L_1$ metric.
• Assume $R \ll 1$.
• Pick a coordinate $i$ and a threshold $s \in [0, 1)$ randomly and uniformly.
• Hash function is: $h_{i,s}(x) = 1$ if $x_i \ge s$, else 0.
• For $p, q$: $\Pr[h_{i,s}(p) \ne h_{i,s}(q)] = \frac{1}{d}\sum_i |p_i - q_i| = \|p - q\|_1 / d$.
• Therefore: $\Pr[h(p) = h(q)] = 1 - \|p - q\|_1 / d$.
• If $R$ is small then: $1 - \|p - q\|_1 / d \approx e^{-\|p - q\|_1 / d}$.

Page 21: Nearest Neighbors Algorithm

Extension for Real Numbers Cont.
• Therefore: $\Pr[g(p) = g(q)] = (1 - \|p - q\|_1/d)^k \approx e^{-k \|p - q\|_1 / d}$.
• So we get a separation between $e^{-kR/d}$ (for “close” points) and $e^{-ckR/d}$ (for “far” points) given a big enough constant $c$.
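A minimal sketch of one hash from such a family (the random-coordinate, random-threshold construction; it matches the collision probabilities above, though the lecture's exact construction may differ in detail):

```python
import random

def draw_threshold_hash(d, rng=random.Random(0)):
    """One hash over [0,1]^d: compare a random coordinate to a random threshold.
    Pr[h(p) != h(q)] = E_i |p_i - q_i| = ||p - q||_1 / d."""
    i = rng.randrange(d)
    s = rng.random()
    return lambda x: 1 if x[i] >= s else 0
```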