seminar report (final)


SEMINAR REPORT

ON

Closest pair: Using Divide and Conquer

SESSION 2014-2015

DEPARTMENT OF

Computer Science and Engineering

SILIGURI INSTITUTE OF TECHNOLOGY

(AFFILIATED TO WBUT)

SUBMITTED BY:-

ARUNEEL DAS

Roll No: - 119002075

Year: - 3rd (6th semester)

Under guidance of:-

Mr. KAUSHIK NATH (Assist. Professor)


Preface

This report describes a program I wrote in C. The "closest" program takes a set of points in two dimensions and finds the distance between the closest pair of points in the set. The algorithm used in this program is given in Introduction to Algorithms by Thomas H. Cormen, Charles E. Leiserson, and Ronald L. Rivest. This report was prepared for a seminar under the guidance of Mr. KAUSHIK NATH at SILIGURI INSTITUTE OF TECHNOLOGY.


Contents

Seminar report: Closest Pair Algorithm

Preface

Acknowledgement

Description

Introduction

History

Algorithm

Brute Force Algorithm

Divide & Conquer Algorithm

Implementation

Code: BRUTE FORCE

Code: DIVIDE & CONQUER

Result

Output: BRUTE FORCE

Output: DIVIDE & CONQUER

Conclusion

Bibliography


Description

This program solves the problem of finding the closest pair of points in a set of points. The set consists of points in R^2, each defined by an x and a y coordinate. The "closest pair" is the pair of points in the set with the smallest Euclidean distance, where the Euclidean distance between points p1 = (x1, y1) and p2 = (x2, y2) is sqrt((x1 - x2)^2 + (y1 - y2)^2). If the set contains two identical points, the closest pair distance is zero. As noted in Introduction to Algorithms, "this problem has applications in traffic control systems. A system for controlling air or sea traffic might need to know which the two closest vehicles are in order to detect potential collisions."


Introduction

The Closest-Pair problem is considered an "easy" Closest-Point problem, in the sense that a number of other geometric problems (e.g. nearest neighbors and minimum spanning trees) find the closest pair as part of their solution. This problem and its generalizations arise in areas such as statistics, pattern recognition and molecular biology. At present, many algorithms with optimal time complexity are known for solving the Closest-Pair problem in any dimension k ≥ 2. The Closest-Pair problem is also one of the first non-trivial computational problems to be solved efficiently using the divide-and-conquer strategy, and it has since become a classic.


History

An algorithm with optimal time complexity O(n lg n) for solving the Closest-Pair problem in the planar case appeared for the first time in 1975, in a classic computational geometry paper by Ian Shamos. This algorithm was based on Voronoi polygons. The first optimal algorithm for solving the Closest-Pair problem in any dimension k ≥ 2 is due to Jon Bentley and Ian Shamos. Using a divide-and-conquer approach to first solve the problem in the plane, those authors were able to generalize the planar process to higher dimensions by exploiting a sparsity condition induced over the set of points in k-dimensional space.

For the planar case, the original procedure and other versions of the divide-and-conquer algorithm usually compute at least seven pairwise comparisons for each point in the central slab, within the combine step. In 1998, Zhou, Xiong, and Zhu presented an improved version of the planar procedure, where at most four pairwise comparisons need to be considered in the combine step for each point lying on the left side (alternatively, on the right side) of the central slab. In the same article, Zhou et al. introduced the "complexity of computing distances", which measures "the number of Euclidean distances to compute by a closest-pair algorithm". The core idea behind this definition is that, since the Euclidean distance is usually more expensive than other basic operations, it may be possible to achieve significant efficiency improvements by reducing this complexity measure. More recently, Ge, Wang, and Zhu used some sophisticated geometric arguments to show that it is always possible to discard one of the four pairwise comparisons in the combine step, thus significantly reducing the complexity of computing distances, and presented an enhanced version of the Closest-Pair algorithm accordingly.

In 2007, Jiang and Gillespie presented another version of the Closest-Pair divide-and-conquer algorithm which reduced the complexity of computing distances by a logarithmic factor. However, after some algorithmic experimentation, the authors found that, despite this reduction, the new algorithm was "the slowest among the four algorithms" [7] included in the comparative study. The experimental results also showed that the fastest among the four algorithms was in fact a procedure named Basic-2, where two pairwise comparisons are required in the combine step for each point that lies in the central slab and which, therefore, has a relatively high complexity of computing distances. The authors conclude that the simpler design of the combine step, and the consequent correct imbalance in trading expensive operations for cheaper ones, are the main factors explaining the success of the Basic-2 algorithm.


Algorithm

The most obvious way to compute the closest pair distance of a set of points is to compute the distance for every pair and keep the smallest distance. This brute force algorithm runs in O(n^2) time for a set of n points. The divide and conquer algorithm used here requires only O(n log n) time to compute the same closest pair distance.

Brute Force Algorithm

A straightforward solution is to check the distances between all pairs and take the minimum among them. This solution requires n(n - 1)/2 distance computations and n(n - 1)/2 - 1 comparisons. The straightforward solution using induction would proceed by removing a point, solving the problem for n - 1 points, and then considering the extra point. However, if the only information obtained from the solution of the (n - 1) case is the minimum distance, then the distances from the additional point to all other n - 1 points must be checked. As a result, the total number of distance computations T(n) satisfies the recurrence relation T(n) = T(n - 1) + n - 1, where T(2) = 1, whose solution is T(n) = O(n^2).
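Unrolling the recurrence makes the quadratic bound explicit. The short derivation below uses only the recurrence and base case stated above, and it recovers the same count of n(n - 1)/2 distance computations:

```latex
\begin{aligned}
T(n) &= T(n-1) + (n-1) \\
     &= T(n-2) + (n-2) + (n-1) \\
     &\;\;\vdots \\
     &= T(2) + 2 + 3 + \cdots + (n-1) \\
     &= 1 + 2 + 3 + \cdots + (n-1) \\
     &= \frac{n(n-1)}{2} = O(n^2).
\end{aligned}
```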

Algorithm Description of Brute Force Strategy:-

The closest pair of points can be computed in O(n^2) time by performing a brute-force search. To do that, one could compute the distances between all the n(n − 1)/2 pairs of points, then pick the pair with the smallest distance, as illustrated below.

minDist = infinity
for i = 1 to length(P) - 1
    for j = i + 1 to length(P)
        let p = P[i], q = P[j]
        if dist(p, q) < minDist:
            minDist = dist(p, q)
            closestPair = (p, q)
return closestPair


bruteForceClosestPair of P(1), P(2), ... P(N)
if N < 2 then
    return ∞
else
    minDistance ← |P(1) - P(2)|
    minPoints ← { P(1), P(2) }
    foreach i ∈ [1, N-1]
        foreach j ∈ [i+1, N]
            if |P(i) - P(j)| < minDistance then
                minDistance ← |P(i) - P(j)|
                minPoints ← { P(i), P(j) }
            endif
        endfor
    endfor
    return minDistance, minPoints
endif

Divide and Conquer Algorithm

This algorithm begins by taking the set of points P and sorting it in two ways. The array X consists of the points of P sorted by x coordinate, and the array Y consists of the points of P sorted by y coordinate. We use presorting, as described later, to avoid re-sorting X and Y with each recursive call. The idea of the algorithm is to recursively divide P into smaller and smaller sets until some base case is reached, compute this base case, and then combine the solutions. The base case used in my program is to compute the answer by the "brute force" method (compare all pairs) when the set is of size BASE_CASE_SIZE or smaller. When the base case does not apply, "the recursive invocation carries out the divide-and-conquer paradigm as follows."

Divide: Divide the set P of points into two smaller sets PL and PR such that all points in PL are on or to the left of some vertical line l and all points in PR are on or to the right of l. The array X is divided into the sorted arrays XL and XR, and Y is divided into the sorted arrays YL and YR, containing the sorted points of PL and PR respectively.

[Figure: an example divide of P by the vertical line l]


Conquer: After the set of points has been divided, the algorithm makes two recursive calls to find the closest pair of points in PL and in PR. The first recursive call receives PL, XL and YL; the second receives PR, XR and YR. The results of the recursive calls are then compared, and the smaller of the two closest-pair distances is stored as delta. In the PVM implementation, a recursive procedure call may be replaced with a process spawn in some cases.
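The divide step has to split the y-sorted array Y into YL and YR without sorting again, so that each recursive call receives its points already in y order. A minimal sketch of one way to do this in linear time is shown here. It assumes the dividing line is given by its x value xm and that the x coordinates are distinct; the names pt and split_by_line are hypothetical.

```c
#include <stddef.h>

/* Hypothetical point type, for illustration only. */
typedef struct { double x, y; } pt;

/* Split the y-sorted array Y (n points) around the vertical line x = xm.
 * Points with x <= xm are copied to YL and the rest to YR. Because Y is
 * scanned in order, YL and YR both stay sorted by y.
 * YL and YR must each have room for n points.
 * Returns the number of points written to YL; YR receives n minus that. */
size_t split_by_line(const pt *Y, size_t n, double xm, pt *YL, pt *YR)
{
    size_t nl = 0, nr = 0;
    for (size_t i = 0; i < n; i++) {
        if (Y[i].x <= xm)
            YL[nl++] = Y[i];
        else
            YR[nr++] = Y[i];
    }
    return nl;
}
```

A single pass over Y costs O(n), which is why presorting once and splitting at every level is cheaper than sorting each half again.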

Combine: The closest pair distance of a given set is often the delta found by the two recursive calls; however, we must also check the points that lie near the dividing line l. It is left to the reader to verify that we only need to consider points falling in the strip within distance delta of the dividing line l, as illustrated by the shaded region. The points in this 2*delta wide strip are stored in an array Y', sorted by y coordinate. For every point in Y', we check the distance to the next seven points in Y'. The smallest distance found in this manner is tracked as delta'. Finally, if delta' is less than delta, then the strip did contain a pair of points closer than delta apart, and delta' is returned instead of delta.


Algorithm Description of Divide and Conquer Strategy:-

Divide the set into two equal sized parts by the line l, and recursively compute the minimal distance in each part.
a) Let d be the minimum of the two minimal distances. It takes O(1) time.
b) Eliminate points that lie farther than d from l. It takes O(n) time.
c) Sort the remaining points according to their y-coordinates. This step is a sort that takes O(n log n) time.
d) Scan the remaining points in y order and compute the distances of each point to its five neighbors. It takes O(n) time.
e) If any of these distances is less than d then update d. It takes O(1) time.
Steps a) to e) define the merging process, which must be repeated log n times because this is a divide and conquer algorithm. A sketch of the algorithm based on the recursive divide & conquer approach is given below.

closestPair of (xP, yP)
    where xP is P(1) .. P(N) sorted by x coordinate, and
          yP is P(1) .. P(N) sorted by y coordinate (ascending order)
if N ≤ 3 then
    return closest points of xP using brute-force algorithm
else
    xL ← points of xP from 1 to ⌈N/2⌉
    xR ← points of xP from ⌈N/2⌉+1 to N
    xm ← xP(⌈N/2⌉)x
    yL ← { p ∈ yP : px ≤ xm }
    yR ← { p ∈ yP : px > xm }
    (dL, pairL) ← closestPair of (xL, yL)
    (dR, pairR) ← closestPair of (xR, yR)
    (dmin, pairMin) ← (dR, pairR)
    if dL < dR then
        (dmin, pairMin) ← (dL, pairL)
    endif
    yS ← { p ∈ yP : |xm - px| < dmin }
    nS ← number of points in yS
    (closest, closestPair) ← (dmin, pairMin)
    for i from 1 to nS - 1
        k ← i + 1
        while k ≤ nS and yS(k)y - yS(i)y < dmin
            if |yS(k) - yS(i)| < closest then
                (closest, closestPair) ← (|yS(k) - yS(i)|, {yS(k), yS(i)})
            endif
            k ← k + 1
        endwhile
    endfor
    return closest, closestPair
endif


Closest Pair Analysis

It takes O(n log n) steps to sort according to the x coordinates, but this is done only once. We then solve two subproblems of size n/2. Eliminating the points outside of the strip can be done in O(n) steps. It then takes O(n log n) steps to sort according to the y coordinates. Finally, it takes O(n) steps to scan the strip and to compare each point to a constant number of its neighbors in the order. Overall, to solve a problem of size n, we solve two subproblems of size n/2 and use O(n log n) steps for combining the solutions (plus O(n log n) steps at the beginning for sorting by x coordinate). We obtain the following recurrence relation:

T(n) = 2T(n/2) + O(n log n), T(2) = 1

The solution of this recurrence relation is T(n) = O(n log^2 n). This is asymptotically better than a quadratic algorithm, but we still want to do better than that. So, now we try to find an O(n log n) algorithm. The key idea here is to strengthen the induction hypothesis. The reason we have to spend O(n log n) time in the combining step is the sorting of the y coordinates. Although we know how to solve the sorting problem directly, doing so takes too long. Can we somehow solve the sorting problem at the same time as we solve the closest-pair problem? In other words, we would like to strengthen the induction hypothesis for the closest-pair problem to include sorting.

Induction Hypothesis: Given a set of fewer than n points in the plane, we know how to find the closest distance and how to output the set sorted according to the points' y coordinates.

We have already seen how to find the minimal distance if the points are sorted in each step according to their y coordinates. Hence, the only thing we need to do to extend this hypothesis is to sort the set of n points when the two subsets (of size n/2) are already sorted. But this sorting is exactly merge sort. The main advantage of this approach is that we do not have to sort every time we combine the solutions; we only have to merge. Since merging can be done in O(n) steps, the recurrence relation becomes T(n) = 2T(n/2) + O(n), where T(2) = 1, which implies that T(n) = O(n log n).

Let T(n) be the time required to solve the problem for n points:

Divide: O(1)

Conquer: 2T(n/2)

Combine: O(n)

The precise form of the recurrence is T(n) = T(⌈n/2⌉) + T(⌊n/2⌋) + O(n). The final recurrence is T(n) = 2T(n/2) + O(n), which solves to T(n) = O(n log n).
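The O(n) combine cost in this analysis depends on merging two halves that are already sorted by y, rather than sorting them again. A minimal sketch of such a merge in C is given here, under the assumption that both halves live in one array and that a scratch array of the same length is available. The names pt and merge_by_y are hypothetical and are not taken from the program in this report.

```c
#include <stddef.h>

/* Hypothetical point type, for illustration only. */
typedef struct { double x, y; } pt;

/* Merge a[lo..mid] and a[mid+1..hi], each already sorted by y, so that
 * a[lo..hi] ends up sorted by y. tmp is scratch space with at least
 * hi + 1 elements. Every element is copied twice, so the cost is linear
 * in hi - lo + 1. */
void merge_by_y(pt *a, pt *tmp, size_t lo, size_t mid, size_t hi)
{
    size_t i = lo, j = mid + 1, k = lo;

    /* Repeatedly take the smaller head of the two sorted halves. */
    while (i <= mid && j <= hi)
        tmp[k++] = (a[j].y < a[i].y) ? a[j++] : a[i++];

    /* Copy whatever is left in either half. */
    while (i <= mid)
        tmp[k++] = a[i++];
    while (j <= hi)
        tmp[k++] = a[j++];

    for (k = lo; k <= hi; k++)
        a[k] = tmp[k];
}
```

Calling this once per recursive invocation, after the two halves return, keeps the array sorted by y at every level and gives the recurrence T(n) = 2T(n/2) + O(n).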


Implementation

Both algorithms have been implemented in C. The code for each is given below.

CODE: BRUTE FORCE

// This is a brute force implementation of the closest pair problem.
// The time complexity is O(n^2).

#include <stdio.h>
#include <math.h>
#include <stdlib.h>
#include <assert.h>
#include <time.h>

#define MAX 32767   // larger than any distance between the input points
#define NP 10       // number of points read from the input file

// structure defined to represent a point with X and Y coordinate.
typedef struct pnt
{
    double x;
    double y;
} point;

// global declarations.
point p1, p2;
double shortestDistance = MAX;

// function to find closest pair by brute force method.
void bruteforce(point Points[])
{
    int i, j;
    double d;   // must be a double: an int would truncate the distance

    for (i = 0; i < NP - 1; i++)
    {
        for (j = i + 1; j < NP; j++)
        {
            // finding Euclidean distance.
            d = sqrt(pow(Points[i].x - Points[j].x, 2) +
                     pow(Points[i].y - Points[j].y, 2));
            if (d < shortestDistance)
            {
                shortestDistance = d;
                p1 = Points[i];
                p2 = Points[j];
            }
        }
    }
    printf("\n\nShortest distance: %f", shortestDistance);
    printf("\n\nShortest points: point1: (%f , %f) and point2: (%f , %f)",
           p1.x, p1.y, p2.x, p2.y);
}

// main function
int main()
{
    int i, c = 0;
    double *DATA;
    point pts[NP];
    FILE *fp;
    clock_t start, end;
    double TIME;

    fp = fopen("InputData.txt", "r");
    assert(fp);
    DATA = (double *)calloc(2 * NP, sizeof(double));
    assert(DATA);
    for (i = 0; i < 2 * NP; i++)
    {
        // stop if the file does not hold 2*NP numbers.
        if (fscanf(fp, "%lf", &DATA[i]) != 1)
        {
            fprintf(stderr, "InputData.txt must contain %d numbers\n", 2 * NP);
            return 1;
        }
    }
    for (i = 0; i < NP; i++)
    {
        pts[i].x = DATA[c++];
        pts[i].y = DATA[c++];
    }
    printf("\nThe points are: \n");
    for (i = 0; i < NP; i++)
    {
        printf("\n(%f , %f)", pts[i].x, pts[i].y); // printing the points on console.
    }
    start = clock();
    bruteforce(pts); // call to closest pair function.
    end = clock();
    TIME = (double)(end - start) / CLOCKS_PER_SEC;
    printf("\n\nTime taken is: %f", TIME);
    free(DATA);
    fclose(fp);
    return 0;
}

CODE: DIVIDE & CONQUER

// This is a divide and conquer implementation of the closest pair problem.
// The time complexity is O(n log n) when the combine step merges in linear
// time. This version re-sorts by y in the combine step instead, which costs
// O(n log^2 n) on average.

#include <stdio.h>
#include <stdlib.h>
#include <math.h>
#include <assert.h>
#include <time.h>

#define MAX 32767
#define NP 10

// structure defined to represent a point with X and Y coordinate.
typedef struct pnt
{
    double x;
    double y;
} point;

// global declarations.
point p1, p2;
double shortestDistance = MAX;

// prototypes, so the sort functions can call the partition functions below.
int partitionByX(point A[], int p, int r);
int partitionByY(point A[], int p, int r);

// function defined to sort the array wrt X-coordinate in O(nlogn) time.
void quicksortByX(point A[], int p, int r)
{
    int q;
    if (p < r)
    {
        q = partitionByX(A, p, r);
        quicksortByX(A, p, q - 1);
        quicksortByX(A, q + 1, r);
    }
}

int partitionByX(point A[], int p, int r)
{
    int s, q;
    double z;
    point temp;
    z = A[p].x;
    q = p;
    for (s = p + 1; s <= r; s++)
    {
        if (A[s].x < z)
        {
            q++;
            temp = A[q];
            A[q] = A[s];
            A[s] = temp;
        }
    }
    temp = A[p];
    A[p] = A[q];
    A[q] = temp;
    return q;
}

// function defined to sort the array wrt Y-coordinate in O(nlogn) time.
void quicksortByY(point A[], int p, int r)
{
    int q;
    if (p < r)
    {
        q = partitionByY(A, p, r);
        quicksortByY(A, p, q - 1);
        quicksortByY(A, q + 1, r);
    }
}

int partitionByY(point A[], int p, int r)
{
    int s, q;
    double z;
    point temp;
    z = A[p].y;
    q = p;
    for (s = p + 1; s <= r; s++)
    {
        if (A[s].y < z)
        {
            q++;
            temp = A[q];
            A[q] = A[s];
            A[s] = temp;
        }
    }
    temp = A[p];
    A[p] = A[q];
    A[q] = temp;
    return q;
}

// function to calculate minimum.
double minimum(double d1, double d2)
{
    if (d1 < d2)
        return d1;
    else
        return d2;
}

// copy pointsByX(low to mid) and pointsByX(mid+1 to high) back, so that
// pointsByY[low to high] are sorted by y-coordinate
void merge(point PointsByX[], point PointsByY[], int lowBound, int mid, int highBound)
{
    int i;
    (void)mid; // unused: the range is re-sorted as a whole, not merged
    for (i = lowBound; i <= highBound; i++)
    {
        PointsByY[i] = PointsByX[i];
    }
    // Only sort pointsByY from lowBound to highBound.
    // Need not sort the entire array because the later calculation only
    // uses part of the array.
    quicksortByY(PointsByY, lowBound, highBound);
}

// closest function.
double closest(point PointsByX[], point PointsByY[], point temp[], int lowBound, int highBound)
{
    if (highBound <= lowBound) // terminating condition for divide and conquer.
        return MAX;
    int mid = (lowBound + highBound) / 2; // middle index
    point median = PointsByX[mid];        // middle point.
    double d1 = closest(PointsByX, PointsByY, temp, lowBound, mid);      // left sub problem.
    double d2 = closest(PointsByX, PointsByY, temp, mid + 1, highBound); // right sub problem.
    double d = minimum(d1, d2);

    // PointsByY is sorted by y-coordinate only from lowBound to highBound.
    merge(PointsByX, PointsByY, lowBound, mid, highBound);

    // temp[0 to k-1] holds the points within d of the dividing line,
    // sorted by y-coordinate
    int k = 0;
    int i, j;
    for (i = lowBound; i <= highBound; i++)
    {
        // fabs, not abs: abs() would truncate the double to an int.
        // The strip is measured along x, from the vertical dividing line.
        if (fabs(PointsByY[i].x - median.x) < d)
        {
            temp[k] = PointsByY[i];
            k++;
        }
    }

    // compare each point to its neighbors with y-coordinate closer than d
    for (i = 0; i < k; i++)
    {
        for (j = i + 1; (j < k) && (temp[j].y - temp[i].y < d); j++)
        {
            double distance = sqrt(pow((temp[i].x - temp[j].x), 2) +
                                   pow((temp[i].y - temp[j].y), 2));
            if (distance < d)
                d = distance;
            if (distance < shortestDistance)
            {
                shortestDistance = distance;
                p1 = temp[i];
                p2 = temp[j];
            }
        }
    }
    return d; // closest distance within PointsByX[lowBound to highBound]
}

// function to find closest pair
void closestpair(point Points[])
{
    int i;
    int N = NP;

    if (N <= 1)
        return;

    // sort by x-coordinate
    point PointsByX[NP];
    for (i = 0; i < N; i++)
    {
        PointsByX[i] = Points[i]; // copy the points array as it is into the pointsByX array
    }
    quicksortByX(PointsByX, 0, N - 1); // call to quick sort to sort it wrt to X-coordinate.

    // check for identical points
    for (i = 0; i < N - 1; i++)
    {
        if ((PointsByX[i].x == PointsByX[i + 1].x) && (PointsByX[i].y == PointsByX[i + 1].y))
        {
            shortestDistance = 0.0;
            p1 = PointsByX[i];
            p2 = PointsByX[i + 1];
            printf("\n\nShortest distance: %f", shortestDistance);
            printf("\n\nShortest points: point1: (%f , %f) and point2: (%f , %f)",
                   p1.x, p1.y, p2.x, p2.y);
            return;
        }
    }

    //displayPoints(pointsByX);
    point PointsByY[N];
    for (i = 0; i < N; i++)
        PointsByY[i] = PointsByX[i];

    // temporary array
    point temp[N];

    printf("\n\nShortest distance: %f", closest(PointsByX, PointsByY, temp, 0, N - 1));
    printf("\n\nShortest points: point1: (%f , %f) and point2: (%f , %f)",
           p1.x, p1.y, p2.x, p2.y);
}

// main function
int main()
{
    int i;
    point pts[NP];
    FILE *fp;
    clock_t start, end;
    double TIME;

    fp = fopen("D:\\close.txt", "w");
    assert(fp);

    /* point pts, PointsByX, PointsByY;
       pts = malloc(sizeof(point) * NP);       //array of points
       PointsByX = malloc(sizeof(point) * NP); //array to hold points sorted by X coordinate.
       PointsByY = malloc(sizeof(point) * NP); //array to hold points sorted by Y coordinate.
    */

    for (i = 0; i < NP; i++)
    {
        // randomly generates X and Y coordinates.
        pts[i].x = 100 * (double)rand() / RAND_MAX;
        pts[i].y = 100 * (double)rand() / RAND_MAX;
    }
    printf("\nThe points are: \n");
    for (i = 0; i < NP; i++)
    {
        printf("\n(%f , %f)", pts[i].x, pts[i].y);    // printing the points on console.
        fprintf(fp, "%f %f\n", pts[i].x, pts[i].y);   // printing into file.
    }
    start = clock();
    closestpair(pts); // call to closest pair function.
    end = clock();
    TIME = (double)(end - start) / CLOCKS_PER_SEC;
    printf("\n\nTime taken is: %f", TIME);
    fclose(fp);
    return 0;
}


Results

OUTPUT: BRUTE FORCE

OUTPUT: DIVIDE & CONQUER


Conclusion

A naive algorithm that finds the distances between all pairs of points and selects the minimum requires O(dn^2) time. It turns out that the problem can be solved in O(n log n) time in a Euclidean space of fixed dimension d.


Bibliography

Introduction To Algorithms, A Creative Approach -- Udi Manber Pg. 295

Introduction To Algorithms (3ed) -- CLRS Pg. 1039

The Algorithm Design Manual (2ed) -- Steven S Skiena Pg. 595

Algorithms Design Techniques and Analysis -- M H Alsuwaiyel Pg. 209

Algorithm Design -- Kleinberg and Tardos Pg. 243

Algorithms -- Robert Sedgewick Pg. 369

www.saurabhschool.com
