comparison of various classification techniques for satellite...

International Journal Of Scientific & Engineering Research, Volume 4, Issue 1, January-2013 1 ISSN 2229-5518

1

1

Comparison of Various Classification Techniques for Satellite Data

1Manoj Pandya,

1Astha Baxi,

1M.B. Potdar,

1M.H. Kalubarme,

2Bijendra Agarwal

Abstract - Computer interpretation of remote sensing data is referred to as quantitative analysis because of its ability to identify pixels

based upon their numerical properties and owing to its ability for counting pixels for area estimates. It is also generally called classification,

which is a method by which labels may be attached to pixels in view of their spectral character. This labeling is implemented by a computer

by having trained it beforehand to recognize pixels with spectral similarities. Clearly the image data for quantitative analysis must be

available in digital form. This is an advantage with image data types, such as that from Landsat, SPOT, IRS, etc, as against more

traditional aerial photographs. The latter require digitization before quantitative analysis can be performed. Various classification

techniques adopted in this study include unsupervised classifiers like K-Means, ISODATA and supervised classifiers like Minimum

Distance, Maximum Likelihood Classifier, Parallelpiped and Enhanced Seeded region Growing Technique.

Index Terms – Satellite Data, Classification, supervised, unsupervised, K-Means, ISODATA, MXL, Prallelpiped, Seeded Region

Growing

—————————— ——————————

1. INTRODUCTION

atellite imagery is a source of large amount of information.

A two dimensional image that is recorded by satellite

sensors is the mapping of the three dimensional visual

world. The captured two dimensional signals are sampled

and quantized to yield digital images. An Image is worth a

thousand words. Satellite image interpreters from various

domains extract Information by marking polygon features on

image. Various classification methods facilitate to

automatically classify objects from image. There are several

methods which can be used to extract features.

2. IMAGE SEGMENTATION

Segmentation of an image is defined by a set of regions that

are connected and non-overlapping, so that each pixel in a

segment in the image acquires a unique region label that

indicates the region it belongs to. Segmentation is one of the

most important elements in automated image analysis, mainly

because at this step the objects or other entities of interest

are extracted from an image for subsequent processing,

such as description and recognition.

3 CLASSIFICATION METHODS

There are basically two kinds of classification methods.

Unsupervised and supervised.

Unsupervised Classifiers:

These kinds of classifiers don’t require training site or

training resources to classify objects.

K-MEANS

In statistics and data mining, k-means clustering is a

method of cluster analysis which aims to partition n

observations into k clusters in which each observation belongs

to the cluster with the nearest mean.

It is an iterative procedure.

• In first step, it assigns an arbitrary initial cluster

vector.

• The second step classifies each pixel to the closest

cluster.

• In the third step the new cluster mean vectors are

calculated based on all the pixels in one cluster.

• The second and third steps are repeated until the

"change" between the iteration is small. The "change" can be

defined in several different ways; either by measuring the

distances the mean cluster vector has changed from one

iteration to another or by the percentage of pixels that have

changed between iterations.

• The objective of the k-means algorithm is to minimize

the within cluster variability. The objective function (which is

to be minimized) is:

Sums of squares distances (errors) between each pixel and

its assigned cluster centre.

S

————————————————

Manoj Pandya, Astha Baxi, M.B. Potdar & M.H. Kalubarme are currently affiliated with Bhaskaracharya Institute for Space Applications and Geo-Informatics (BISAG), Gandhinagar, Gujarat, India PIN 382007 E-mail: [email protected]

Dr. Bijendra Agarwal is Director, VJMS College, VADU, Kadi, Mahesana, Gujarat, India


1

2

1

3

1

4

Where, C(x) is the mean of the cluster that pixel x is

assigned to.

Minimizing the SSdistances is equivalent to minimizing the

Mean Squared Error (MSE). The MSE is a measure of the

within cluster variability.

ISO Data (Iterative Self-Organizing) Classifier

ISO Data stands for Iterative Self-Organizing Data Analysis

Techniques. This algorithm allows the number of clusters to

be automatically adjusted during the iteration by merging

similar clusters and splitting clusters.

Clusters are merged if either the number of members (pixel)

in a cluster is less than a certain threshold or if the centres of

two clusters are closer than a certain threshold. Clusters are

split into two different clusters if the cluster standard

deviation exceeds a predefined value and the number of

members (pixels) is twice the threshold for the minimum

number of members. [1]

The ISODATA algorithm is similar to the k-means

algorithm with the distinct difference that the ISODATA

algorithm allows for different number of clusters while the k-

means assumes that the number of clusters is known a priori.

• Each pixel is compared to each cluster mean and

assigned to the cluster whose mean is closest in Euclidean

distance.

• A new cluster center is computed by averaging the

locations of all the pixels assigned to that cluster.

• The Sum of Squared Errors (SSE) computes the

cumulative squared difference (in the various bands) of each

pixel from its cluster center for each cluster individually, and

then sums these measures over all the clusters.

• The algorithm will stop either when the # iteration

threshold is reached or the max % of unchanged pixel

threshold is reached

Supervised Classifiers:

These kinds of classifiers require adequate training sites or

training signatures for each class of a given image.

Minimum Distance Classifier

This classifier is also referred to as central classifier. This

classifier is the simplest classifier.

• Analyst first computes mean of each training class.

• Next the distance (Euclidian) of each pixel from the

mean is calculated.

• The pixel is assigned to that class whose distance is

nearest to the mean (i.e. the Euclidian distance is minimum

between the pixel and the mean).

The distance is defined as an index of similarity so that the

minimum distance is identical to the maximum similarity.

Figure 1 shows the concept of a minimum distance classifier.

The following distances are often used in this procedure. [7]

Figure 1: Minimum Distance Classifier

It is used to classify unknown image data to classes which

minimize the distance between the image data and the class in

multi-feature space. In the Euclidean plane, if p = (p1, p2) and q = (q1, q2) then the

distance is given by

Fig: 1 Schema for feature tables using pre-defined data types

Parallel Piped Classifier:

The parallelepiped classifier (often termed multi-level slicing)

divides each axis of multi-spectral feature space, as shown in

an example in Figure 2

Figure 2: Schematic concept of parallel piped classifier in three dimensional feature spaces

The decision region for each class is defined on the basis of a

lowest and highest value on each axis. The accuracy of

classification depends on the selection of the lowest and

highest values in consideration of the population statistics of

each class. In the two-dimensional feature space this forms a

rectangular box. We can have as many boxes as number of

classes. All the data points which fall within the box are

labeled to that class. In this respect, it is most important that

the distribution of population of each class is well understood.

[2]


1

5

1

6

1

7

1

8

Maximum Likelihood Classifier (MXL):

The maximum likelihood classifier is one of the most popular

methods of classification in remote sensing, in which a pixel

with the maximum likelihood is classified into the

corresponding class. The likelihood Lk is defined as the

posterior probability of a pixel belonging to class k. [4]

Lk = P(k)*P(X/k) / P(i)*P(X/i)

where P(k) : prior probability of class k

P(X/k) : conditional probability to observe X from class k, or

probability density function

Usually P(k) are assumed to be equal to each other and

P(i)*P(X/i) is also common to all classes.

Therefore Lk depends on P(X/k) or the probability density

function.

For mathematical reasons, a multivariate normal distribution

is applied as the probability density function. In the case of

normal distributions, the likelihood can be expressed as

follows. [5]

Where,

n: number of bands

X: image data of n bands

Lk(X) : likelihood of X belonging to class k

Xk : mean vector of class k

: variance-covariance matrix of class k

is determinant of

In the case where the variance-covariance matrix is

symmetric, the likelihood is the same as the Euclidian

distance.

Figure 3: concept of the maximum likelihood method

Seeded Region Growing Technique:

Seed point selection is based on some user criterion like

pixels in a certain gray-level range, pixels evenly spaced on a

grid, etc. An image is segmented into regions with respect to a

set of q seeds. Given the set of seeds, S1, S2, . . . , Sq, each step

of SRG involves one additional pixel to one of the seed sets.

Moreover, these initial seeds are further replaced by the

centroids of these generated homogeneous regions, R1, R2, ... ,

Rq, by involving the additional pixels step by step. The pixels

in the same region are labeled by the same symbol and the

pixels in variant regions are labeled by different symbols. All

these labeled pixels are called the allocated pixels, and the

others are called the unallocated pixels. Let H be the set of all

unallocated pixels which are adjacent to at least one of the

labeled regions. [14]

Where, N(x, y) is the second-order neighborhood of the

pixel (x, y) as shown in Figure 1. For the unlabeled pixel (x, y)

є H, N(x, y) meets just one of the labeled image region Ri and

define (x, y) є {1, 2, ..., q} to be that index such that N(x, y) ∩

R(x, y) . (x, y, Ri) is defined as the difference between the

testing pixel at (x, y) and its adjacent labeled region Ri. (x, y,

Ri) is calculated as

Where g(x, y) indicates the values of the three color

components of the testing pixel (x, y), g (Xci , Y ci ) represents

2


the average values of three colors components of the

homogeneous region Ri, with g(Xci , Y ci ) the centroid of Ri.

Figure 1 identification of pixels around the seed pixel

If N(x, y) meets two or more of the labeled regions, (x, y)

takes a value of i such that N(x, y) meets Ri and (x, y, Ri) is

minimized.

This seeded region growing procedure is repeated until all

pixels in the image have been allocated to the corresponding

regions.

Enhanced Seeded Region Growing:

This technique was specifically designed by Baxi et al., 2012,

for mobile applications along with mahalanobis distance

classifier [16]. SRG is also very attractive for semantic image

segmentation by involving the high-level knowledge of image

components in the seed selection procedure.

The FEATURE TABLE stores a collection of features.

The location information with metadata is considered as seed

points which can be super-imposed on the geog-referenced

image using Application Programming Interface (API). These

seed points generate training sites for SRG technique

Enhanced Classifier:

The distance of pixel from the mean of given class is

calculated by variance and co-variance methods. It differs

from Euclidean distance in that it takes into account the

correlations of the data set and is multivariate effect size.

where X : vector of image data (n bands)

X = [ x1, x2, .... xn]

Xk : mean of the kth class

Xk = [ m1, m2, .... mn]

(

9

(

10


k : variance matrix

k : variance-covariance matrix

4 COMPARISONS OF CLASSIFIERS

Figure 1 (a) FCC LISS 3 Image of Surendranagar district India with Cotton

Seeds marked in yellow circles

Figure 1 (b) Paddy Seeds in yellow circles

CLASSIFIED IMAGE

Figure 3 Enhanced SRG for both classes

ACCURACY ASSESSMENT

Kappa coefficient is used to evaluate the accuracy. The

original intent of Cohen's Kappa was to measure the degree of

agreement or disagreement of two or more people observing

the same phenomenon. Cohen's kappa measures the

agreement between two raters who each classify N items into

C mutually exclusive categories. The equation for K is:

K =Pr(a) -Pr(e)/1 - Pr(e)

Where Pr(a) is the relative observed agreement among

raters or the total agreement probability, and Pr(e) is the

hypothetical probability of chance agreement, using the

observed data to calculate the probabilities of each observer

randomly saying each category. If the raters are in complete

agreement then K = 1. If there is no agreement among the

raster, then K ≤ 0. The better performing classifiers should

then have the higher K.

Table: Comparison of various classifiers


Figure: Classifier vs. Kappa co-efficient chart

5 CONCLUSIONS

Various Classification techniques for satellite images have

different accuracy. Hybrid method of SRG and Mahalanobis

technique leads to better accuracy.

The extension to the ESRG classifier is Artificial Neural

Network (ANN) and Fuzzy logic. Object oriented

classification also leads to precisely identify objects like trees,

car, buses etc.

ACKNOWLEDGEMENTS

The authors would like to thank to the Director, BISAG, T .P.

Singh for his inspiration and motivation.

REFERENCES

[1] http://www.yale.edu/ceo/Projects/swap/landcover/Unsupervise

d_classification.htm

[2] Parallel Piped Classifier “http://stlab.iis.u-

tokyo.ac.jp/~wataru/lecture/rsgis/rsnote/cp11/11-4-1.gif”

[3] Variance

“www.stat.yale.edu/~pollard/Courses/241.fall97/Variance.pdf”

[4] Maximum Likelihood http://stlab.iis.u-

tokyo.ac.jp/~wataru/lecture/rsgis/rsnote/cp11/cp11-7.htm

[5] http://edndoc.esri.com/arcobjects/9.2/net/shared/geoprocessing/

spatial_analyst_tools/how_maximum_likelihood_classification

_works.htm

[6] Mahabolis_Dist.txt

http://research.cs.tamu.edu/prism/lectures/iss/iss_l12.pdf

[7] Joseph, George (2003) Fundamentals of Remote Sensing ISBN:

81 7371 457 6

[8] Ahmed Rekik, Mourad Zribi, et al. An Optimal Unsupervised

Satellite image Segmentation Approach Based on Pearson

System and k-Means Clustering Algorithm Initialization

[9] Tobler, W. R. “A classification of map projections”, Annals of the

Association of American Geographers, Vol. 52, pp. 167-175.

[10] Sergio Bernabé, Antonio Plaza, A New Tool for Classification of

Satellite Images Available from Google Maps: Efficient

Implementation in Graphics Processing Units

[11] Piscataway, NJ: IEEE, Knowledge-based semi-supervised

satellite image classification

[12] Aykut AKGÜNa,, A.Hüsnü ERONAT, et at. Comparing

Different Satellite Image Classification Methods: An

Application In Ayvalik District,Western Turkey

[13] Frank Y. Shih, Shouxian Cheng,Computer Vision Laboratory,

College of Computing Sciences, New Jersey Institute of

Technology, Newark, NJ 07102, USA,Automatic seeded region

growing for color image segmentation (2005)

[14] Jianping Fan a, Guihua Zeng Frank Y. Shih, et al. , Seeded

region growing: an extensive and comparative study

[15] http://en.wikipedia.org/wiki/Region_growing

[16] Astha Baxi, Manoj Pandya, M .H. Kalubarme and M .B.

Potdar,2012. Supervised Classification of Satellite Imagery

using Enhanced Seeded Region Growing Technique

[17] Jianping Fan, Guihua Zeng , Mathurin Body , Mohand-Said

Hacid, “Seeded region growing: an extensive and

comparative study”, Elsevier B.V. , Pattern Recognition Letters

26 , 2005

[18] Rolf Adams and Leanne Bischo, “Seeded region growing”, IEEE

transactions on pattern analysis and machine intelligence, vol.

16, no. 6, June 1994

INSTITUTE OF TECHNOLOGY, NIRMA UNIVERSITY, AHMEDABAD – 382 481, 08-10 DECEMBER, 2011

Supervised Classification of Satellite Imagery using

Enhanced Seeded Region Growing Technique

Astha Baxi, Manoj Pandya, M .H. Kalubarme and M .B. Potdar

Abstract - The image classifications techniques have been

practiced by remote sensing experts following certain methods

like unsupervised and supervised. Supervised classification

requires precise human intervention to extract features.

Enhanced Seeded region growing technique is an image

segmentation method; where the image pixel is seeded by

latitude and longitude recorded during ground truth data

collection using GPS. The Enhanced seeded region growing

technique generates clusters based upon 8 nearest neighbor

pixel connections. Pattern recognition standard software is

trained for the spectral signatures of the corresponding pixels.

Then the supervised classification algorithm can be used.

The system can leverage the potential of Location based

services (LBS) and Information Communication Technology

(ICT) to dynamically pull the latitude and longitude from the

server using web services and gateway protocols. This method

requires less effort to extract features from the image. This

scheme is applied on satellite imagery covering surendranagar

district in Gujarat, India.

Index Terms - Image segmentation, Location Based

Services, Satellite Image Classification, Seeded Region Growing

Technique, Supervised Classification.

I. INTRODUCTION

In the study of digital image processing, classification

techniques have been considered as basic approaches for

feature extraction and analysis. The classical methods of

unsupervised classification comprising K-Means and

ISODATA undergo limitations in terms of accuracy. The

methods of supervised classification encompassing

Maximum Likelihood (MXL), Parallelepiped Classifier,

Minimum Distance classifier etc. require precise human

intervention in identification of training sites or signature

marking for the system.

An Enhanced Seeded Region Growing (SRG) technique

identifies training sites from the corresponding location of the

site from the ground truth data. The location comprising

latitude, longitude and feature metadata can be superimposed

on the geo-referenced satellite image. The geographical

location is denoted as a seed pixel. This seed pixel is

epitomized as a training signature. The algorithm classifies

the image accordingly. The system requires less human

intervention compared to the conventional supervised

classification methods.

As the tone of a particular object depends upon the

reflection of the sunlight; it may vary on different days. A

satellite travels a path in a day. For different path, the

signature may be different. Thereafter, the signature can be

used to classify the images from the same path only.

The paper initially explains the seeded region growing

technique, and then an enhanced algorithm based on seeded

region growing technique is proposed in section III. The

algorithm is implemented on remote sensing data of site

covering surendanagar district in Gujarat, India. The result of

the output is discussed in section IV. Applications and

limitations of proposed algorithm are described in further

sections.

II. SEEDED REGION GROWING TECHNIQUE

Seeded region growing is a simple region-based image

segmentation method. It is also classified as a pixel-based

image segmentation method since it involves the selection of

initial seed points. This approach to segmentation examines

neighboring pixels of initial “seed points” and determines

whether the pixel neighbors should be added to the region.

The seed points are the pixels representing the signature of a

class. The process is iterated on, in the same manner as

common data clustering algorithms.

In conventional SRG technique the first step is to select a

set of seed points as described in [1]. Seed point selection is

based on some user criterion like pixels in a certain gray-level

range, pixels evenly spaced on a grid, etc. An image is

segmented into regions with respect to a set of q seeds. Given

the set of seeds, S1, S2, . . . , Sq, each step of SRG involves

one additional pixel to one of the seed sets. Moreover, these

initial seeds are further replaced by the centroids of these

generated homogeneous regions, R1, R2, ... , Rq, by

involving the additional pixels step by step. The pixels in the

same region are labeled by the same symbol and the pixels in

variant regions are labeled by different symbols. All these

labeled pixels are called the allocated pixels, and the others

are called the unallocated pixels. Let H be the set of all

unallocated pixels which are adjacent to at least one of the

labeled regions.

Where, N(x, y) is the second-order neighborhood of the

pixel (x, y) as shown in Fig 1. For the unlabeled pixel (x, y) є

H, N(x, y) meets just one of the labeled image region Ri and

define (x, y) є {1, 2, ..., q} to be that index such that N(x, y)

∩ R (x, y) . (x, y, Ri) is defined as the difference between

the testing pixel at (x, y) and its adjacent labeled region Ri.

(x, y, Ri) is calculated as

1

978-1-4577-2168-7/11/$26.00 ©2011 IEEE

INTERNATIONAL CONFERENCE ON CURRENT TRENDS IN TECHNOLOGY, „NUiCONE – 2011‟

Where g(x, y) indicates the values of the three color

components of the testing pixel (x, y), g (Xci , Y ci ) represents

the average values of three colors components of the

homogeneous region Ri, with g(Xci , Y ci ) the centroid of Ri.

Fig. 1 Identification of pixels around the seed pixel

If N(x, y) meets two or more of the labeled regions, (x, y)

takes a value of i such that N(x, y) meets Ri and (x, y, Ri) is

minimized.

This seeded region growing procedure is repeated

until all pixels in the image have been allocated to the

corresponding regions. The definition of Equation (1) and (3)

ensures that the final classification will divide the image into

a set of regions as homogeneous as possible on the basis of

the given constraints. SRG algorithm is robust, rapid and free

of tuning parameters. It can correctly separate the regions that

have the same properties we define. The concept is simple

and we only need a small numbers of seed point to represent

the property we want. We can also choose multiple criteria at

the same time. It performs well with respect to noise.

However, SRG algorithm requires the mechanism to

automate the seed selection procedure.

III. METHODOLOGY

Use of GEO ICT can be converged in image classification

process to generate seed points. The GPS enabled mobile

devices can be used to collect feature details and position of

the location. The location details are transmitted to the

centralized server at the time of ground survey in the form of

GEO SMS using GSM modem or Gateway services

protocols. The server fetches corresponding information and

stores in the database. The location information with

metadata is considered as seed points which can be super-

imposed on the geog-referenced image using Application

Programming Interface (API). These seed points generate

training sites for SRG technique. All images taken from the

same path on that particular day can be classified using this

method. Once the signature data is collected, the image can

be classified by the following steps:

Fig. 2 Use of GEO ICT in the seed pixel based image data classification

1. Select the area of interest (AOI) from the image,

which is to be classified. The entire image can also be treated

as AOI.

2. The starting pixel of AOI is denoted as St (X, Y). If

the width of AOI is say W and height is H then define an

array list of size A[W, H] and initialize it as A[ i, j]=0 for

every i ≤ W, j ≤ H and i, j ≥ 0. 3. Now plot all the signature sites (training sites) on the

image and store its equivalent pixel‟s intensity values.

4. Find the mean of these intensity values. Say, I.

5. Define a stack- single dimension array list, Say S of

some size. Here stack size is defined to overcome the stack

overflow problem. For better performance, machines having

higher memory are recommended as it is required to limit the

stack size.

6. Define a threshold value T where the intensity

difference between the given pixel and „I' should not be

greater than T. Here, the threshold value should be given

according the clarity of the image to be classified. If the

image is clear i.e. every prominent object is visibly

distinguishable on image then we can reduce the value of T

and vice versa.

7. Now start with the pixel St (X, Y). The intensity of

St (X, Y) is say, ǐ.

If St (X, Y) is not labelled and If |I- ǐ| ≤ T then mark A [i, j]

as labelled. Find its 8-connected neighbouring pixels and

2

INSTITUTE OF TECHNOLOGY, NIRMA UNIVERSITY, AHMEDABAD – 382 481, 08-10 DECEMBER, 2011

push them in S if S is not full. If S is full then leave the

neighbouring pixels unlabelled.

If St (X, Y) is not labelled and |I- ǐ| > T then take the next

available pixel in AOI, say (X+1, Y) and repeat step 7.

8. If S is not empty then pop a pixel and find its

intensity. Continue the same process as described in step 7

until S is empty. If S is empty then go to the next available

pixel in AOI, say (X+1, Y) and repeat the same procedure

described in step 7.

9. Continue this procedure until the entire AOI is

covered.

Here, the efficiency of this algorithm depends upon factors

like Signature sites, clarity of the image to be classified, T,

size of S and the time difference between the ground truth

collection and satellite acquisition of the data.

IV. EXPERIMENT AND RESULTS

The algorithm is tested on LISS-3 imagery of an area

covering parts of surendranagar and Ahmedabad district in

Gujarat, India. The size of image is 1000 x 500. It is a

multispectral false colour composite (FCC) image consisting

of 3 bands which are NIR, RED and GREEN. The parameters

of the algorithm are as follows. T=10%, size of S=16 and

three signature sites for each cotton and paddy are taken

whose locations are given in table 1. Here, the average tone

of all the three pixels is taken as a seed pixel. So there is one

seed point for each class. Table I Location of seed pixels

Ground

Cover

Type

Latitude Longitude

Cotton 1 22o42‟19.34”N 70o53‟52.54”E

Cotton 2. 22o40‟26.21”N 70o58‟55.18”E

Cotton 3. 22o38‟15.13”N 70o55‟59.48”E

Paddy 1. 22o39‟44.86”N 71o4‟52.46”E

Paddy 2. 22o39‟19.89”N

71o1‟46.68”E

Paddy 3. 22o40‟4.36”N

70o57‟34.48”E

From the knowledge about the area we can say that there

are mainly 2 types of crops i.e. cotton and paddy grown in

this area. Fig. 3.a is the FCC image of the area. Here, RED

clusters show cotton crop while Maroon clusters display

paddy crop. The centre of circles marked as yellow is taken

as seed pixels. Fig. 3.b shows the seed point locations for

Paddy. First the algorithm is applied to classify only one class

i.e. cotton whose output is shown in fig. 4.a. In the image

green colour represents Cotton while all the other pixels

displayed in black represent unclassified area. Fig. 4.b

represents the output of parallel piped classification method

implemented on the same image using the same training site.

The output is compared with the ground truth data in table 2.

Fig. 3 (a) FCC LISS-3 Image of Surendranagar district in Gujarat, India

Fig. 3 (b) Paddy Seeds

Fig. 4 (a) Result of Enhanced SRG for single class

Fig. 4 (b) Result of parallel Pipe classification techniques

3

INTERNATIONAL CONFERENCE ON CURRENT TRENDS IN TECHNOLOGY, „NUiCONE – 2011‟

Table II Comparison of Area of cotton crop

Classifica

tion

Method

Area

(sq

Km)

Actual

Area

(sq

Km)

Diffe

rence

(sq

Km)

%

Area

%

Differe

nce

Enhanced

Seeded

Reg

Growing

25.36 29.8

4.44

5.100

6711

4

14.8993

2886

Same algorithm is also used to classify multiple classes of

cotton and paddy. Fig. 5 shows the output in which Red

colour represents the cotton class while green represents the

paddy. The efficiency of the algorithm here is calculated

using the kappa co-efficient. Confusion matrix is prepared

and displayed in Table 3.The derived Kappa co-efficient

value is K=0.89. From table 2 and 3 we can say that the

efficiency of the algorithm varies between 85 to 90%

depending upon the number of signature sites per class.

Fig. 5 Enhanced SRG for 2 classes

Table III Confusion Matrix

Class Cotton Paddy

Cotton 46168 1537

Paddy 2562 28020

V. APPLICATION

The proposed method is used to identify a particular type

of feature from an image. Different types of signatures are

generated for different types of features in an image. The

algorithm requires few seed points; hence it is suitable to

serve interactive, real time and on-the-fly image classification

concepts. Once the image is classified, physical dimensions

like length, perimeter, area etc. can be derived. Same

signatures can also be used to classify other satellite images

having same temporal period and satellite path. The same

method can be replicated to extract multiple types of classes

like soil, water bodies, wasteland, land use etc.

VI. LIMITATION

The efficiency of this algorithm depends upon the

signature data. Hence it is required to collect the ground truth

data from at least two to three sites. Sufficient amount of seed

points improve the efficiency of the algorithm.

VII. CONCLUSION

The proposed algorithm requires less signature sites i.e.

training sites per class compared to Maximum Likelihood

(MXL) and Artificial Neural Network (ANN). The efficiency

is better than parallel piped classification technique and

comparable with MXL and ANN. It is faster than ANN.

VIII. ACKNOWLEDGEMENT

The authors would like to thank Mr. T .P. Singh, The Director, BISAG for his inspiration and motivation. The

authors would also like to thank to Mr. Apoorva Dalvadi,

BISAG for providing required resource datasets.

IX. REFERENCES

[1] Jianping Fan, Guihua Zeng , Mathurin Body , Mohand-Said

Hacid, “Seeded region growing: an extensive and comparative

study”, Elsevier B.V. , Pattern Recognition Letters 26 , 2005

[2] Rolf Adams and Leanne Bischo, “Seeded region growing”, IEEE

transactions on pattern analysis and machine intelligence, vol. 16,

no. 6, June 1994

[3] Sanghoon Lee, “Change detection in land-cover pattern using

region growing segmentation and fuzzy classification”, IEEE

[4] Robert M. Haralick, M.C. Zhang, and Roger W. Ehrich ,

“Dynamic programming approach for context classification using

the Markov random field”

[5] Bryant, J. 1979. “On the clustering of multi-dimensional pictorial

data”. Pattern recognition, vol. 11, pp. 115-125

[6] Deng, Y., Manjunath, B.S., 2001. “Unsupervised segmentation of

color–texture regions in image and video”. IEEE Trans. Pattern

Anal. Machine Intell. 23 (8), 800–810.

[7] Fan, J., Yau, D.K.Y., Elmagarmid, A.K., Aref, W.G., 2001a.

“Automatic image segmentation by integrating color-based

extraction and seeded region growing”. IEEE Trans. Image

Process. 10 (10), 1454–1466.

[8] Qiyao Yu and David A. Clausi, Senior Member, IEEE, "IRGS:

Image segmentation using edge penalties and region growing".

ieee transactions on pattern analysis and machine intelligence,

vol. 30, no. 12, December 2008

[9] A.K. Jain and R.C. Dubes, Algorithms for clustering data.

Prentice Hall, 1988.

[10] John A. Richards · Xiuping Jia, "Remote sensing digital image

analysis an introduction", 4th Edition.

[11] http://en.wikipedia.org/wiki/Region_growing

4

International Journal of Scientific & Engineering Research, Volume 3, Issue 9, September-2012 1 ISSN 2229-5518

Decision Support System for Industrial Estates using Geo Spatial Technology

1Krunal Patel,

1Manoj Pandya,

2Dr. N.N. Jani

Abstract – To meet the need of various administrative operations such as estate infrastructure planning and management, allotment and

regulation of the industrial plots, establishing new estates at various locations, Geo Spatial Technology is the key solution. Geo spatial

technology covers spatial dimension which facilitates to visualize outlook of industrial estates at dynamic geographical scale. The

integration of domain Knowledge with Geo-spatial datasets and technology leads to successful implementation of the system. System

should have capability to process and render enterprise level decisions and provide aid to plan, regulate and control land use.

Index Terms – Geo Database, Decision Support System, DSS, Oracle, Estate Management, GIS

1. INTRODUCTION

ndustrial estates are specific areas zoned for industrial

activity in which infrastructure such as roads, power, and

other utility services is provided to facilitate the growth of

industries and to minimize impacts on the environment [1].

The effective use of Geo ICT leads to better planning and

monitoring of geographical entities. To leverage the potential

of Geo ICT, a Web based geo-spatial industrial estate

management system is developed that allows Selection of any

estate of Gujarat State, Display of estate details like – plot

details, infrastructure facility, Utilities, Amenities, Allotment

status of plot, Property Account details, Overlay of revenue

maps and satellite image on the estate map, Customized

query and analysis functions for the Planning, Land and

Engineering divisions of GIDC, Linkage with the Oracle

database to provide up-to-date data, on the land transactions

and interactive decisions through report generation

mechanism.

2 RATIONALE OF THE MODEL

Industrial Estates have led to the development of large

urban regions especially in the States of Gujarat and

Maharashtra, wherein large-scale city/ town development has

taken place [2]. Effective development can be achieved by

implementing Geographical Information System (GIS). The

Maps improve decision-making capabilities. At user interface

level, the most essential for the stakeholders is that MIS and

GIS should be incorporated into a single application so that

the user has a single interface with which to interact. Spatial

dimension allows harnessing various analyses like visibility,

buffer, intersection, union etc. Effective decisions can be

carried out by leveraging the potential of GIS. The system

should be efficient, scalable and cost-effective.

3 DATABASE PLATFORMS

Database is a foundation for various kinds of research

activity and projects. In this paper a scenario of Gujarat

Industrial Development Corporation (GIDC) is considered

where department’s industrial estate database is stored on

Oracle database. There are several database platforms which

can be scaled for enterprise solutions. The geo database of

industrial estate is warehoused in Microsoft® SQL Server 2005

as the solution has been developed with Microsoft DOT NET

2005 which has better performance with native platform. The

GIS system used in this system is an indigenous and low cost.

Fig: 1 A synoptic view of industrial Estate System

I

————————————————

Manoj Pandya and Krunal Patel are currently working as a senior project scientist in Bhaskaracharya Institute for Space Applications and Geo-Informatics (BISAG), Gandhinagar, Gujarat, India PIN 382007 E-mail: [email protected], [email protected]

Dr. N.N. Jani is affiliated with S.K. patel Institute, Gandhinagar, Gujarat, India, E-mail: [email protected]


4 METHODOLOGIES

Spatial database creation is a vital step where interactive

decisions are required like finding the associatively of

Industrial Plots, buffer analysis and hence propose site

suitability based on custom criteria. Spatial entities are

digitized on satellite imagery of cartosat data having 2.5 meter

resolution. Cadastral maps have also been co-registered with

satellite imagery.

Fig: 2 Workflow of the System

Spatial database:

To construct spatial data, cartosat satellite imagery having

resolution of 2.5 meters is used for creating base maps

including industrial plots. Departmental data is linked with

the spatial database to make a dynamic seamless information

matrix. Spatial dataset like Location of Industrial estates,

Layout of Industrial estates, Plot-wise details like Unsold

notes; Taxation; Maintenance etc., Layout of Utilities like

Industrial Zones, Water Supply System, Waste Water

Treatment System Plant, Solid Waste Management, Drainage,

Power Supply etc., Infrastructure facilities, Super imposing

Revenue map, Amenities etc. are derived.

5 DECISION SUPPORT SYSTEM (DSS)

DSS is knowledge based system that serves the

management, operations, and planning levels of an

organization and helps to make decisions, which may be

rapidly changing and not easily specified in advance [3].

Web Interface has authentication condition to allow specific

users to access the portal. The portal has in built query builder

to process user’s custom criteria. In this scenario, bridging

between spatial and aspatial information has lead to an

enterprise level solution by facilitating certain tools like Daily

land data creation, MIS report generation (Monthly,

Quarterly, Vacant Plot List, etc.), Auto notice generation,

Updation of regional office transactions to central database

server, Receipt and pay order generation, On-demand auto-

generation of Offer letter (Lease deed, Scrutiny report, etc.),

Selective dispatch of generated reports (eMail and/or hard

copy), Report Generation etc.

Satellite Imagery and geographical entities are created at

Bhaskaracharya Institute for Space Applications and Geo-

Informatics (BISAG).

Fig: 3 Digitized layout map of Viramgam GIDC estate overlaid on

satellite imagery

Web interface has facility to super impose multiple

amenities, selecting interactive criteria based plots etc. on

industrial plots as shown in below figure.

Fig 4: An example of Criteria for Outstanding amount >= 500000

By executing the criteria of Outstanding Amount >= ‘50000’,

plot fields which satisfy the criteria will be highlighted with

yellow color.

Fig 5: Super-imposing BORE and E.S.R. on Plot of Industrial Estates


By super-imposing BORE location and E.S.R. location, it is

easy to find the source of water. Nearest source of water

supply using buffer analysis can be derived easily.

6 APPLICATIONS

The proposed methodology is capable to converge and

extract datasets from the given data warehouse. Real time

Decisions can be taken by implementing GIS engine which

considers spatial dimension. The system provides aid to plan

to regulate and control land use in the vicinity of the estate

providing buffer zones around the estates. Decision makers

can recommend land use controls around the estates for

controlling and minimizing adverse environmental impacts,

Recommending necessary effluent treatment and waste

disposal facilities and other needed abatement infrastructures

needed to be commonly used by all industries of the estate,

Monitoring of the various planning, allotment details of the

aspect etc.

7 LIMITATIONS

Huge data in the form of spatial layers is process oriented.

The proposed methodology can be optimized by introducing

distributed computing. Parallel Processing of tasks can lead to

improve the efficiency to produce spatial and aspatial output

from complex user defined queries.

8 CONCLUSIONS

The proposed solution is scalable to enterprise level.

Industrial Estates are crucial for financial growth of any

nation. Geo spatial technology can not only be useful for

better planning but also useful for better management of

assets in industries. Convergence of MIS and GIS technologies

can provide near real time information resulting in efficient

and effective decision support system (DSS) to help multiple

areas like Health, Defense, and Disaster etc.

ACKNOWLEDGEMENTS

The authors would like to thank to the Director, BISAG, T .P.

Singh and for his inspiration and motivation and Manager

S&A, GIDC, Mr. H V Bagtharia for providing valuable

technical support.

REFERENCES

[1] http://www1.ifc.org/wps/wcm/connect/72b3340048855c0c8ad4d

a6a6515bb18/industrialestates_PPAH.pdf?MOD=AJPERES&CA

CHEID=72b3340048855c0c8ad4da6a6515bb18.

[2] http://cg.gov.in/opportunities/Industrial%20Estates.pdf

[3] http://en.wikipedia.org/wiki/Decision_support_system

[4] Hettige M, Martin P, Singh M, Wheeler D (1995). The

Industrial Pollution Projection System, Policy Research

Department, Policy Research Working Paper 1431. World Bank.

[5] Smith VK, Andrès JE (1996). Environmental and Trade Policies:

Some Methodological Lessons. Environ. Develop Econ., 1(1):

19-40.

[6] Townroe P.M. (1972) Some behavioural considerations in the

industrial location decision Reional. Studies 6, 261-272

[7] Smith, D.M. (1976) Industrial Location: An Economic

Geographical Analysis. New York

[8] Visser, E.J. (2009), The complementary dynamic effects of

clusters and networks for Industry Innovation

[9] Weber, A. (1999), Theory of the Location of Industries, Chicago,

IL, University of Chicago Press.

[10] World Commission on Environment and Development 1987.

Our Common Future. Oxford: Oxford University Press.

[11] United Nations Environment Program 198 1. Guidelines for

Assessing Industrial Impact and Environmental Criteria for the

siting of Industry.

[12] Mohan. M.. Singh. M.P. and T.S. Panwar 1993. Industrial

Growth and Management in Urban Areas. In. Heptulla. N.

(ed.) Environmental Protection in Developing Countries. New

Delhi: Oxford & IBH Publishing Co. PVT. Ltd. pp 29-34.

INTERNATIONAL JOURNAL OF COMPUTER APPLICATIONS IN ENGINEERING, TECHNOLOGY AND SCIENCES (IJ-CA-ETS)

ISSN: 0974-3588 | JULY’11- DEC’11 | Volume: 4 Issue 2 |

SUPERIMPOSITION OF ADAPTED GEOGRAPHICAL DATASETS ON GOOGLE MAP

1 MANOJ PANDYA, 2 JAY DAVE, 3 NIRAV SUTHAR, 4 PARTH BHAGAT, 5 DR. BIJENDRA AGRAWAL

1, 2, 3, 4 Bhaskaracharya Institute for Space Applications and Geo-Informatics, Gandhinagar, Gujarat 382007, India

[email protected] 5 VJKM Institute, Vadu, Kadi, Mahesana, Gujarat, India

ABSTRACT: The map navigation has become an essential means nowadays. In the world of web based GIS Google has started these services since February 2005. Google Maps APIs let the user put Google Maps in the web page. One has to undergo problems in certain scenario when the data is in ESRI© shapefile format and having different map projection. As far as vector data are concerned, industry has adopted the primitive file format that is provided by ESRI© since 1994. Google Maps APIs takes XML file as input which is then deployed on Google Maps with the help of API classes. So, the user has to first convert the shape file into XML file. To resolve certain problems, software has been developed. The application takes ESRI standardized shape file as an input (i.e. Combined file with extension *.shp, *.shx & *.dbf) file & gives the output as a well defined XML file. This XML file then can be placed in the web application & selecting that file from the web application the resulting XML file will be plotted onto Google Maps window with the help of Google Maps APIs. Keywords: Google API, Map Projection, Superimposition vector data, geo-referenced data overlay, GIS. 1. INTRODUCTION Google maps outfit geo-referenced satellite imagery and several vector data layers like road, river, city etc. In certain scenario, it is essential to superimpose user defined data layers like administrative boundary; cadastral layers etc. over Google maps. Different organizations stockpile data in different map projections and datum. Each map projection has certain parameters like central meridian, false easting, false northing, scale factor etc. If the vector geometry vertices points are superimposed without run time transformation, the vector may be superimposed with shifting in the edges. On the fly map projection transformation is required to convert current map projection parameters to the equivalent Latitude and Longitude 2. SYSTEM ARCHITECTURE In general practice, Google serves Map to the client browser using Asynchronous JavaScript And XML (AJAX). Custom information in the form of XML or KML can be used to pin point user defined location on Google Map. XML and KML consists well defined file format. Their specification is an open standard. The enterprise architecture uses database servers like Oracle©, SQL© Server etc. The spatial data can be stored as binary object or spatial geometry in the enterprise Database server.

Figure 1. Map projection Architecture

3. MAP PROJECTION: A map projection is any method of representing the surface of a sphere or other three-dimensional body on a plane. Map projections can be constructed to preserve one or more properties like area, shape, distance etc. though not all of them simultaneously. At the same time, geographical entities can be projected in a single map projection. To overlay different projection geographical entities, map projections and datum should be same. To overcome this problem, ‘on-the fly’ map-projection can be calculated and then overlaid on the reference map layer. 3.1. LOCATION BASED SERVICES (LBS) LBS include services to identify a location of a person or object. LBS utilize Google Maps as the Google API renders services in customized and

Page 133



simplified fashion. Hence Google Map services are accepted in industries.

Organizations and institutions are accustomed with conventional shape file format. Every organizations and institutions store geographic information in their own map projection. The issue comes when the geometry is superimposed on Google maps as the co-ordinate reference System (CRS) is geographical where the location is measure in latitude and longitude rather than the measurements in easting and northing for local projection system. If the geographical entities overlaid without applying on-the-fly map projection, shifting in map extent is raised that creates trouble while super-imposition process. The proposed solution is to convert map projection using mathematical model. 4. MAP PROJECTION MATH-MODEL Mathematical formulae can be derived to transform one type of map projection system in to the other one For Google maps, geographic map projection is used. The mathematical model converts Transverse Mercator projection coordinates ( N , E ) to their geographic equivalents. Projection Symbols used in the Math-Model:

Map Projections can be modeled in the form of mathematical formulae and equations. Universal Transverse Mercator (UTM) or Transvers Mercator (TM) projections are generally used in organizations. The conversion equations are divided into two parts I. Projection Constats II. Transverse Mercator (TM) to Geographic

I. Projection Constants: Several additional parameters need to be computed before transformations can be undertaken. It is a function of various parameters. These parameters are constant for a projection.

mo is obtained by evaluating m using ∅0 The conversion of Transverse Mercator projection

coordinates (N, E) to geographic coordinates ( , ) is achieved in several steps. First determine N’, m’, n, G, σ , ∅’ II. Transverse Mercator (TM) to Geographic

It is required to determine σ’,ν’, ψ’, t’, E’ and using following formulae:

Page 134



The latitude of the computation point is then computed using:

The longitude of the computation point is determined using:

Hence the value of latitude & longitude is derived. 5. XML AS TRANSITIONAL STORAGE Extensible Markup Language (XML) can be used to store vertices of geographical datasets as intermediate storage. This is required as the quantum of vertices is

very large. The sample XML file which contains tag for marker, polyline and polygon. This XML file can be overlaid on Google Map. Hence XML can store geometry & attributes that can be parsed to display geometry on Google map. <markers> <marker lat="43.65654" lng="-79.90138" label="Marker One"> <infowindow><![CDATA[ Some stuff to display in the<br>First Info Window ]]></infowindow> </marker> <marker lat="23.65654" lng="72.90138" label="Marker two"> <infowindow><![CDATA[ Some stuff to display in the<br>Second Info Window ]]></infowindow> </marker> <marker lat="24.65654" lng="74.90138" label="Marker Third"> <infowindow><![CDATA[ Some stuff to display in the<br>Third Info Window ]]></infowindow> </marker> <line colour="#FF0000" width="4" html="You clicked the red polyline"> <point lat="43.65654" lng="-79.90138" /> <point lat="43.91892" lng="-78.89231" /> <point lat="43.82589" lng="-79.10040" /> </line> <line colour="#10aa00" width="8" html="You clicked the green polyline"> <point lat="43.9078" lng="-79.0264" /> <point lat="44.1037" lng="-79.6294" /> <point lat="43.5908" lng="-79.2567" /> <point lat="44.2248" lng="-79.2567" /> <point lat="43.7119" lng="-79.6294" /> <point lat="43.9078" lng="-79.0264" /> </line> <line colour="#00dfff" width="2" html="You clicked the yellow polygon"> <point lat="23.2478" lng="73.0264" /> <point lat="24.1027" lng="72.6294" /> <point lat="23.2908" lng="75.2567" /> <point lat="24.2448" lng="70.2567" /> <point lat="23.4119" lng="62.6294" /> <point lat="23.9048" lng="71.0264" /> </line> </markers> 6. APPLICATION Super-imposition of geographical datasets in custom projections can become easier to overlay on Google maps. Without any physical transformations, the geographical layers can be overlaid and thematic

Page 135



maps can be generated. Following figure shows that organization’s data in red border color can be overlaid on Google map.

Figure 2. Overlaid geographical datsets on Google

Map (using XML datasets) 7. CONCLUSION The mathematical model provides leverage to integrate geographical datasets on a same platform from various map projections. The mathematical model can be evolved for numerous kinds of co-ordinate reference systems to carry out thematic maps for various geo-spatial analyses. 8. REFERENCES

[1] http://mathworld.wolfram.com/MercatorProjection.html [2] http://en.wikipedia.org/wiki/Mercator_projection [3] Tobler, W. R. (1962). “A classification of map projections”, Annals of the Association of American Geographers, Vol. 52, pp. 167-175. [4] http://www.gpsy.com/gpsinfo/geotoutm/ [5] Pickles, J., ed. (1995) Ground Truth: The social Implications of Geographic Information Systems, New York: Guilford Press. [6] Berthon, S., and Robinson, A. (1991) The Shape of the World: The Mapping and Discovery of the Earth. Chicago: Rand McNally. [7] http://mysite.du.edu/~jcalvert/math/mercator.htm [8] http://surveying.wb.psu.edu/sur162/UTM/UTM.htm [9] http://gisremote.blogspot.com/2008/02/about-map-projection-what-is-map.html [10] C. P. Lo & Albert K.W. Yeung (2004) “Concepts and Techniques of Geographic Information Systems” [11] David Salomon (2006) “Transformations and Projections in Computer Graphics” [12] EARL F. BURKHOLDER (2008) “THE 3-D GLOBAL SPATIAL DATA MODEL- Foundation of the Spatial Data Infrastructure”

Page 136

International Journal of Computer Applications (0975 – 8887)

Volume 52– No.19, August 2012

28

Distributed Commuting Augmented Shortest Path Finding for Geo Spatial Datasets

ABSTRACT Geo Spatial information is a large collection of datasets referring

to the real world entities. Geo Spatial information has evolved in

the last decade which led to produce a vast platform in

Government Administration, Scientific Analysis and other

various sectors especially in disaster management (DM), site

suitability of Check-dams for Irrigation Department etc. It is

required to obtain imperative geographically analyzed solutions

like finding shortest path between sources (i.e. location having

stock of food packets, clinical remedial and therapeutic kits, etc.)

to the destination (i.e. location where disaster emerges).

Departmental data (i.e. village Maps) covers detailed spatial and

attribute information compared to the readily available sources.

Hence custom solutions based on Information Technology are

required to be constructed, processed efficiently and quickly to

outfit seamless performance that facilitates in mission critical

incidences.

Keywords Geo Database, Distributed Computing, Dijkstra, Shortest Path,

GIS, Hadoop, Shapefile

1. INTRODUCTION

Accidents are common disasters that cause much damage to

people’s lives. Due to population growth and the accompanying

development of extensive infrastructures has greatly increased

people's financial condition. This has lead to use large amount of

vehicles over recent years. In case of accidents, it is a critical

situation to get information about the location of the incidence, to

reach at that place and supply necessary therapeutic kit and

clinical remedies. Crucial need is to find out shortest path

between the location of incidence and the nearest location

providing clinical remedies or hospitalization. Available sources

do not cover complete geo spatial details of road network

especially in rural area. Even though detailed information is

available with the department, it is difficult to utilize it in the

time of disaster due to poor processing performance as the server

becomes loaded with handling multiple concurrent requests.

Vital issue is to make a timely and effective decision.

2. DISTRIBUTED DATABASES

A distributed database is a database in which storage devices

are not all attached to a common processing unit such as the

C.P.U. It may be stored in multiple computers located in the

same physical location, or may be dispersed over a network of

interconnected computers. A database may consist two or more

data files located at different sites on a computer network.

Different users can access it without interfering with one another.

However, the DBMS periodically synchronize the scattered

databases to make sure that they all have consistent data.

3. GIS NETWORK MODEL

In GIS systems, networks are modeled as points (For e.g:

street intersections, switches, water valves) and lines (For e.g:

Streets, transmission lines, pipes). Network topological

relationships define how lines connect with each other at nodes.

For the purpose of network analysis it is also useful to define

rules about how flows can move through a network.

A network is defined as a graph G = (N, A) consisting of a set

N of nodes and a set A of arcs with associated numerical values,

such as the number of nodes, n=|N|, the number of arcs, m=|A|,

and the length of an arc connecting nodes i and j, denoted as l (i,

j).

Figure 1: A typical Road Network

As shown in the above figure 1, each vertex is marked with

black color. The segment length or weight is marked with red

color near the center of the segment.

A road network as shown in above figure 1 can be interpreted in

the form of attributes that is compatible to dijkstra as shown in

the below table.

Table 1: Input tabular information for dijkstra Fnode_ Tnode_ Length

1 2 4

2 1 4

2 3 4

3 2 4

2 5 5

5 2 5

5 4 8

4 5 8

4 6 6

6 4 6

3 4 8

4 3 8

Manoj Pandya BISAG

Gandhinagar, Gujarat India

Abdul Zummarwala BISAG


Prashant Chauhan BISAG




29

It is required to pre-process shortest path from one node to other

node with all possible permutation and combinations. The graph

is non-directional. Hence both combinations of Fnode_ and

Tnode_ are to be considered as shown in the following table.

Table 2: Dijkstra processed path for each Fnode_ and

Tnode_

Fnode_ Tnode_ path

1 2 1,2

2 1 2,1

1 9 1,2,3,9

9 1 9,3,2,1

1 6 1,2,5,4,6

6 1 6,4,5,2,1

1 4 1,2,5,4

4 1 4,5,2,1

6 9 6,7,8,9

9 6 9,8,7,6

5 8 5,2,3,9,8

8 5 8,9,3,2,5

4. SHORTEST PATH

In graph theory, the shortest path problem is the problem of

finding a path between two vertices (or nodes) in a graph such

that the sum of the weights of its constituent edges is minimized.

An example is finding the quickest way to get from one location

to another on a road map; in this case, the vertices represent

locations and the edges represent segments of road and are

weighted by the time needed to travel that segment.

For undirected graphs, the shortest path problem can be

formally defined as follows. Given a weighted graph (that is, a

set V of vertices, a set E of edges, and a real-valued weight

function f : E → R), and elements v and v' of V, find a path P (a

sequence of edges) from v to a v' of V so that

is minimal among all paths connecting v to v'.

Shortest paths from one (source) node to all other nodes on a

network are normally referred as one-to-all shortest paths.

Shortest paths from one source node to a subset of the nodes on a

network can be defined as one-to-some shortest paths. Shortest

paths from every node to every other node on a network are

normally called all-to-all shortest paths. When the goal is to

obtain a one-to-one shortest path or one-to-some shortest paths,

the Dijkstra algorithm offers some advantages because it can be

terminated as soon as the shortest path distance to the destination

node is obtained. However, there are other algorithms that can

also be used for finding shortest path in various scenarios.

Dijkstra's algorithm: solves the single-source shortest path

problems.

Bellman–Ford algorithm: solves the single-source problem

if edge weights may be negative.

A* search algorithm: solves for single pair shortest path

using heuristics to try to speed up the search.

Floyd–Warshall algorithm: solves all pairs shortest paths.

Johnson's algorithm: solves all pairs shortest paths, and may

be faster than Floyd–Warshall on sparse graphs.

Dijkstra’s algorithm to find out a shortest path between source

and destination is commonly practiced and easy to implement.

Dijkstra’s algorithm is used in this paper to demonstrate a

prototype model as a code snippet.

Code Snippet:

-- Run the algorithm until we decide that we are finished

WHILE 1 = 1

BEGIN

-- Reset the variable, so we can detect getting no records in the

next step.

SELECT @FromNode = NULL

-- Select the Id and current estimate for a node not done, with

the lowest estimate.

SELECT TOP 1 @FromNode = Id, @CurrentEstimate =

Estimate

FROM #Nodes WHERE Done = 0 AND Estimate <

9999999.999

ORDER BY Estimate

UPDATE #Nodes SET Done = 1 WHERE Id =

@FromNode

-- Update the estimates to all neighbour node of this one (all the

nodes

--there are edges to from this node). Only update the estimate if

the new

-- proposal (to go via the current node) is better (lower).

UPDATE #Nodes

SET Estimate = @CurrentEstimate +

e.Weight, Predecessor = @FromNode

FROM #Nodes n INNER JOIN dbo.Edge e ON n.Id =

e.ToNode

WHERE Done = 0 AND e.FromNode = @FromNode AND

(@CurrentEstimate + e.Weight) < n.Estimate

END;

The code can be compiled as a dynamic link library (dll) to

integrate in a program to solve shortest path for given datasets.

The executed algorithm stores each vertex as a path in a defined

format.

5. DATA MODEL

Spatial dataset can be stored either in normalized or binary

format. Enterprise geo database like Microsoft SQL Server,

Oracle etc. can be used to process large amount of data.

The Dijsktra can’t be directly implemented in GIS datasets.

The Spatial data format is not known to Dijkstra as it can

understand the data in the form of FROM_NODE, TO_NODE

and WEIGHT or LENGTH of the segment. It is required to

calculate each node and edge (segment) length in GIS dataset and

store in the tabular data table.

At higher hierarchical level, there is a great density of roads

covering national Highway, State Highway, Village Road, ODR,

MDR etc. Dijsktra performs all possible permutations and

combinations for finding the shortest path.

For each request, dijkstra algorithm has to process multiple

node and edges that can affect the performance of the system

when considering the concurrent access of multiple users.

6. PRE-PROCESSING NETWORK

ROUTES

The task of finding shortest path for large network is a time

consuming process. It is memory and process intensive. In web

based interface, multiple concurrent users accessing various

combinations of nodes require high end servers. In case of

enterprise environment, single server can’t serve concurrent

Eq. 1



30

requests at the optimal level. To optimize server, there are

several optimization methods available.

Divide and Conquer: In this technique, the task is divided

into several fragments. These fragments are processed in

different process units and after the completion of task, they are

integrated. The information can be either in database or in file

format.

Database is fragmented into several parts by taking ratio of

number of records to the number of processors in a machine. A

file is divided by automated tool (Apache Hadoop) that handles

the consistency and integrity of information.[6]

7. VIRTUALIZATION

It is the creation of a virtual (rather than actual) version of a

hardware platform, operating system (OS), storage device, or

network resources. Virtualization reduces system integration and

administration costs by maintaining a common software baseline

across multiple computers in an organization. A virtual machine

is subjectively a complete machine (or very close), but

objectively merely a set of files and running programs on an

actual, physical machine. [10]

Figure 2: A diagram showing the layout for virtualization.

8. TASK DISTRIBUTION USING HADOOP The Apache Hadoop is an open source software framework

that supports data-intensive distributed applications. [5] It

enables applications to work with thousands of computational

independent computers and petabytes of data. Hadoop works on

file based architecture. Hadoop provides a distributed file system

(HDFS) that stores data on the compute nodes, providing very

high aggregate bandwidth across the cluster. Both Map/Reduce

and the distributed file system are designed so that node failures

are automatically handled by the framework.

Map/Reduce is a programming paradigm that was made

popular by Google where in a task is divided in to small portions

and distributed to a large number of nodes for processing (map),

and the results are then summarized in to the final answer

(reduce). [1] Google and Yahoo use this for their search engine

technology. By the execution of shortest path algorithm using

hadoop map/ reduce processing mechanism, the hardware

resources (CPU) are fully utilized. That is capable to handle large

size of files with HDFS. Redundancy and replication of files are

maintained by Hadoop automatically as soon as the task

executes. [6]

Figure 3: A synoptic view of Hadoop processing mechanism

9. EXPERIMENTATION RESULTS

The edges cross the count of more than 1 lac records for

complete dataset. The dataset corresponds to the road network of

Gujarat state, India provided by BISAG. Shortest Path algorithm

is executed with HDFS.

Figure 4: Snapshot of system showing hadoop map-reduce

functionalities.

The output is obtained using Distributed processing over Hadoop

framework.

Table 3: length, Fnode_, Tnode_ as input and Length,

Path as output

Sr FNODE_ TNODE_

Length

(m) Path

1 21626 1 753286.4

21626,21640,21930,

...,18,22,3,2,1

2 21626 135 736610.7

21626,21640,21930,

...,281,232,135

3 21626 205 727586.5

21626,21640,21930,

...,198,206,205

Table 3 represents the processed nodes and their respective

length and shortest path for possible permutation and

combinations.

GIS map is the graphical representation of entities and objects.

The derived output as shown in the figure 5 is linked with GIS

dataset. The red colored segments correspond to the shortest path

between two node points as show in the below figure.



31

Figure 5: The map highlighted with red colour represents the

shortest path between given two points. [3]

Figure: 6 Nodes vs. Time Chart

It is inferred from the results that the division of tasks to multiple

nodes decreases the computation time upto a certain extent as the

division into tasks and collection of results leads to overhead

upon the hadoop framework. Both the type of applications

declined to yield better results upon further division and increase

in number of nodes.[13]

Table: 4 Time taken to find shortest path by single &

multiple threads.

Number of

Nodes (Single

Threaded)

Time (Single

Threaded)

Number

of Nodes

(Multithre

aded)

Time

(Multithre

aded)

1 73 1 54

2 64 2 47

3 53 3 43

4 44 4 35

5 36 5 30

6 24 6 25

7 16 7 22

8 14 8 11

9 12 9 10

10 10 10 11

10.APPLICATIONS

Distributed Computing using Hadoop can be used in

various areas like rendering 3D imagery, computing the

energy in a system in a molecular model, data mining and

in scientific analysis.

11.LIMITATIONS

Distributed Computing using Hadoop is suitable for

large sized files especially for scientific data analysis but

for small sized files the performance is negligible as

compared to the time required for the setup of distributed

environment.

12.CONCLUSIONS

Applications where Mission critical solutions are

required with real time immediate response, Distributed

Computing with Apache Hadoop is suitable platform.

Hadoop can also be used with personal computers and low

end servers.

13.ACKNOWLEDGEMENTS

The authors would like to thank to the Director, BISAG, T

.P. Singh for his inspiration and motivation.

REFERENCES

[1] Lam, Chuck (July 28, 2010). Hadoop in Action (1st Ed.).

Manning Publications. ISBN 1-935182-19-6.

[2] J. Dean and S. Ghemawat. MapReduce: Simplified Data

Processing on Large Clusters. OSDI ’04, pages 137–150.

[3] Courtesy form BISAG

[4] Z. Mahmood and R. Hill (eds.), Cloud Computing for

Enterprise Architectures, Computer Communications and

Networks, DOI 10.1007/978-1-4471-2236-4_2, © Springer-

Verlag London Limited 2011

[5] VerticaInputFormat. http://www.vertica.com/mapreduce

[6] Tom White. Hadoop:The Definitive Guide[M]. United

States of America: O’Reilly Media, Inc. 2009.

[7] Jeffrey Dean, Sanjay Ghemawat. MapReduce:Simplied data

processing on large clusters[C]. Proceedings of the 6th

Symposium on Operating System Design and

Implementation. New York: ACM Press. 2004:137-150.

[8] Raghavendra, C. S., Kumar, V. K. P. and Hariri S.,

“Reliability analysis in Distributed systems”, IEEE

Transactions on Computers, Volume 37, Issue 3, Pg. 352 –

358, March 1988.

[9] Kin-Sun-Wah and McAlister D.F, Reliability optimization

of computer communication network, IEEE Trans. On

Reliability, Vol.37, No. 2, Pp.275- 287 (1998) Dccember.

[10] GPFS in the Cloud: Storage Virtualization with NPIV on

IBM System p and IBM System Storage DS5300.

[11] Bo Peng, B. C. (2009). Implementation Issues of A Cloud

Computing Platform. IEEE , 8.

[12] R. Agrawal and J.C. Shafer , "Parallel Mining of

Association Rules," Distributed Systems Online March

2004.

[13] D.W. Cheung , et al., "Efficient Mining of Association

Rules in Distributed Databases,"IEEE Trans. Knowledge

and Data Eng., vol. 8, no. 6, 1996,pp.911-922.

International Journal Of Scientific & Engineering Research, Volume 3, Issue 3, March‐2012 1 ISSN 2229‐5518

Geo-processing using Oracle Spatial Geo Database

1Manoj Pandya, 1Pooja Nair, 1Parthi Gandhi, 1Shubhada Pareek, 2Dr. Bijendra Agarwal

Abstract - In the last two decades, Geographic Information Systems (GIS) has been widely developed and is applied to various fields, which include resources management, environment management, prevention of disaster, planning area, education and national defense etc in many sections and subjects. However traditional GIS application systems can’t interact with each other and was considered as an isolated island of information facing problem of interoperability. This is because of different data models adopted by different GIS applications. The spatial data interoperability concept brought forward a huge amount of geospatial information for efficacious management, sharing and increase in value usage. OGC standards are developed in a unique consensus process supported by the OGC's industry, government and academic members to enable geo-processing technologies to interoperate, or "plug and play".

Index Terms – Geo Database, Geo processing, Oracle Spatial, OGC, 3D Rendering, Extrude, GIS

1. INTRODUCTION ithin an enterprise, various software packages are used and in many cases they can’t talk with each other. This

situation is ubiquitous and also exists between different departments. An Open GIS Consortium (OGC) is an international industry consortium of more than 400 companies, government agencies and universities participating in a consensus process to develop publicly available interface standards. Within a local government or enterprise, the spatial data is centrally stored. Interoperable metadata enables organizations to use the right tool for the job while eliminating complicated data transfers and multiple copies of the same data throughout the enterprise or department. The application server is a mediation system, this model uses oracle application server as the mediation system, and through the application server the application client sends WMS or WFS request and get the map server for background application. The three‐tier structure model exposes a GIS portal which is an online GIS for outside of the department. Any client can request the server if it accords with WMS or WFS specification.

2 DATABASE PLATFORMS There are several database platforms which provide the

spatial support. Oracle® is matured to support OGC standards. However there are other database platforms

available like Microsoft® SQL Server 2008, IBM DB2 etc. Open Source database platforms are also available in the realm of proprietary products. MySQL and PostGres with PostGIS extension can server large scale Geo‐Database.

3 ORACLE SPATIAL DATA MODEL Oracle® offers a spatial data model that provides basic

enterprise access to geospatial information. This spatial data model provides a standard structure for point, line, and area features. Oracle Spatial 10g XE (Express Edition) is used in this prototype model. This database platform is free to use upto 4 GB of data. However enterprise version has no limit. Basic DDL (CREATE, ALTER, DROP) and DML (INSERT, UPDATE, DELETE) statements can be used for Database Management. With Spatial, the geometric description of a spatial object is stored in a single row, in a single column of object type SDO_GEOMETRY in a user‐defined table.

Fig: 1 Schema for feature tables using pre-defined data types

The GEOMETRY_COLUMNS table describes the available feature tables and their Geometry properties.

W

———————————————— • Manoj Pandya is currently working as a senior project scientist in

Bhaskaracharya Institute for Space Applications and Geo-Informatics (BISAG), Gandhinagar, Gujarat, India PIN 382007 E-mail: [email protected]

• Pooja Nair, Parthi Gandhi, Shubhada Pareek are pursuing project at BISAG

• Dr. Bijendra Agarwal is Head Of Department (HOD), VJMS College, VADU, Kadi, Mahesana, Gujarat, India


GEOMETRY_COLUMNS table provides information on the feature table, spatial reference, geometry type, and co‐ordinate dimension for each Geometry column in the database. CREATE TABLE GEOMETRY_COLUMNS ( F_TABLE_CATALOG CHARACTER VARYING NOT NULL, F_TABLE_SCHEMA CHARACTER VARYING NOT NULL, F_TABLE_NAME CHARACTER VARYING NOT NULL, F_GEOMETRY_COLUMN CHARACTER VARYING NOT NULL, ……………….. ……………….. ……………….. GEOMETRY_TYPE INTEGER, COORD_DIMENSION INTEGER, MAX_PPR INTEGER, SRID INTEGER NOT NULL REFERENCES SPATIAL_REF_SYS, CONSTRAINT GC_PK PRIMARY KEY (F_TABLE_CATALOG, F_TABLE_SCHEMA, F_TABLE_NAME, F_GEOMETRY_COLUMN) ) Example 1: Creation of Geometry Table The SPATIAL_REF_SYS table describes the coordinate system and transformations for Geometry. CREATE TABLE SPATIAL_REF_SYS ( SRID INTEGER NOT NULL PRIMARY KEY, AUTH_NAME CHARACTER VARYING, AUTH_SRID INTEGER, SRTEXT CHARACTER VARYING(2048) ) Example 2: Creation of Spatial Reference System The FEATURE TABLE stores a collection of features.

Metadata is required to store ancillary information about Geo database. Metadata is data about data. It allows common information to be stored at a common level. The attributes like owner name, version number, description of layer, remarks, annotation etc are stored in metadata. CREATE TABLE ANNOTATION_TEXT_METADATA AS { F_TABLE_CATALOG AS CHARACTER VARYING NOT NULL, ………………… …………………

………………… A_TEXT_DEFAULT_MAP_BASE_SCALE AS CHARACTER VARYING, A_TEXT_DEFAULT_ATTRIBUTES AS CHARACTER VARYING} Example 3: Creation of Metadata The geometry metadata describing the dimensions, lower and upper bounds and tolerance in each dimension is stored in a global table owned by MDSYS (which users should never directly update). Each Spatial user has the following views available in the schema associated with that user: USER_SDO_GEOM_METADATA contains metadata information for all spatial tables owned by the user (schema). This is the only view that you can update, and it is the one in which Spatial users must insert metadata related to spatial tables. ALL_SDO_GEOM_METADATA contains meta‐data information for all spatial tables on which the user has SELECT permission. Spatial users are responsible for populating these views. For each spatial column, you must insert an appropriate row into the USER_SDO_GEOM_METADATA view. Metadata prototype is given below. INSERT INTO spatial_ref_sys VALUES (101, ʹPOSCʹ, 32214, ʹPROJCS[ʺUTM_ZONE_14Nʺ, GEOGCS[ʺWorld Geodetic System 72ʺ, DATUM[ʺWGS_72ʺ, ELLIPSOID[ʺNWL_10Dʺ, 6378135, 298.26]], PRIMEM[ʺGreenwichʺ, 0], UNIT[ʺMeterʺ,1.0]], PROJECTION[ʺTransverse_Mercatorʺ], PARAMETER[ʺFalse_Eastingʺ, 500000.0], PARAMETER[ʺFalse_Northingʺ, 0.0], PARAMETER[ʺCentral_Meridianʺ, ‐99.0], PARAMETER[ʺScale_Factorʺ, 0.9996], PARAMETER[ʺLatitude_of_originʺ, 0.0], UNIT[ʺMeterʺ, 1.0]]ʹ); Example 4: Inserting Map Projection parameters

4 RATIONALE OF THE MODEL

This model is scalable to obtain certain objectives. The model implements centralization enterprise spatial data. If database is centralized, it will be more convenient for data manager to update, maintain and distribute. The model enables organizations to use the right tool for the job while eliminating complicated data transfers and multiple copies of the same data throughout the enterprise. The model also helps to implement data sharing between enterprises, especially as data sources for local emergency department. Because emergency department can access and maintain the all data they need. After all, data redundancy can be reduced.


Fig: 2 Data translation component model

The oracle spatial data server can be established at centralized server. Different users can access Geo Spatial information through web browser. The application server is the main tier in internet model, which includes GIS application server, GIS application manager server and application connector. The application in this scenario is developed in Java using NetBeans 6.8 Interactive Development Environment (IDE).

5 GEO PROCESSING Geo‐processing is a GIS operation used to manipulate

spatial data. A typical geo‐processing operation takes an input dataset, performs an operation on that dataset, and returns the result of the operation as an output dataset. Common geo‐processing operations include geographic feature overlay, feature selection and analysis, topology processing, raster processing, and data conversion. Geo‐processing allows for definition, management, and analysis of information used to form decisions. Gujarat state from India is used in this example. Proposed System is developed in JAVA platform. JAVA

SERVER PAGE (JSP) serves the solution in web interface. The Code snippet is given as below for Buffer Analysis using Geo Database. CODE SNIPPET: // ASSIGN VARIABLES int sDragx=Integer.parseInt(startDragx); int sDragy=Integer.parseInt(startDragy); int eDragx=Integer.parseInt(endDragx); int eDragy=Integer.parseInt(endDragy); … BufferedImage buffimage1=null; JGeometry gmtry1,gmtry2; STRUCT dbobj1,dbobj2; Graphics2D g11,g22; Shape sing_sh=null,sh_rect=null;

ResultSet rs1=null,rsbuff=null; Statement st1=null; Connection con=newconn(); //INITIALZING MAP st1=con.createStatement(rs1.TYPE_SCROLL_SENSITIVE,rs1.CONCUR_UPDATABLE);

double boundingbox[]=boundingBox(ʺTABLE_GUJDISTRICTʺ);

buffimage1 = new BufferedImage(800,600,BufferedImage.TYPE_INT_RGB);

g11 = (Graphics2D)buffimage1.getGraphics(); g22 = (Graphics2D)g11;

g22.setRenderingHint(RenderingHints.KEY_ANTIALIASING, RenderingHints.VALUE_ANTIALIAS_ON);

g22.setPaint(new Color(48,0,0)); g22.fillRect(0,0,800,600); //FIND BOUND BOX VALUES OF A LAYER double

boundingbox[]=boundingBox(ʺTABLE_GUJDISTRICTʺ); //CALCULATE SCALE FACTOR sx = clientwidth / mapw; sy = clientheight / maph; // EXECUTE BUFFER STATEMENT rsbuff= st1.executeQuery(ʺselect

sdo_geom.sdo_buffer(mdys.sdo_geometry(2003,null,null,mdsys.sdo_elem_info_array(1,1003,3), mdsys.sdo_ordinate_array(ʺ + Math.min(sDragx, eDragx) + ʺ,ʺ + Math.min(sDragy, eDragy) + ʺ,ʺ + (Math.min(sDragx, eDragx)+Math.abs(sDragx ‐ eDragx)) + ʺ,ʺ + (Math.min(sDragy, eDragy) +

Math.abs(sDragy ‐ eDragy)) + ʺ)),ʺ + dist + ʺ,0.005) from ʺ + currenttable);

//DRAW GEOMETRY ON MAP dbobj2 = (STRUCT)rsbuff.getObject(1); gmtry2 = JGeometry.load(dbobj2); sing_sh=gmtry2.createShape(); g11.setColor(Color.blue); g11.setStroke(new BasicStroke(1)); g11.draw(sing_sh); dbobj1 = (STRUCT) rs1.getObject(ʺGEOMʺ); gmtry1 = JGeometry.load(dbobj1); JGeometry geomrect = new

JGeometry(2003,gmtry1.getSRID(),arrrect,ordrect); sh_rect=geomrect.createShape();

Example 5: A prototype Code in Java Server Page to create Buffer using Geo database


Fig 3(a): Buffer Creation for give Area of Interest (AOI) Data source: BISAG

3 (b) Data inside Yellow Ring (buffered)

3 (c) Data inside Red

Ring (buffered)

3 (d) Union of Two Rings (buffered)

To perform geo‐processing in the GIS system, it requires

very efficient algorithm and good processing power. It is required to find the association between two objects

in Geo ‐processing. If two objects are A and B it can be associated like A touches B or A inside B or A equals B etc. Other GIS operations can be performed like difference between two geometries, Union, Intersection.

Table 1: Various operations based on Set theory

6 TOPOLOGICAL RELATIONSHIPS Topology is a major area of mathematics concerned with

properties that are preserved under continuous deformations of objects, such as deformations that involve stretching, but no tearing or gluing. It emerged through the development of concepts from geometry and set theory, such as space, dimension, and transformation.

Fig 4 Possible association combinations between objects A and B

7 METHODOLOGY Use of oracle Geo Database can be utilized by converging

Information Communication Technology (ICT). The Geo Database communicates with Web based GIS Map engine. The


GIS map engine talks with client with authentication from Java Application Programming Interface (API).

Fig: 5 Interface to access Oracle Spatial Geometry

8 3D RENDERING Oracle spatial can extrude a Two‐Dimensional Geometry to

Three Dimension. Two‐dimensional footprints of buildings can be extruded using the EXTRUDE function in the SDO_UTIL package to erect a building on the two‐dimensional footprint by specifying the ground height and the top height for each vertex of the two‐dimensional geometry.

Fig 6(a) Example of a two-dimensional solid with the top heights and

ground heights specified and Fig 6(b) the extruded solid object SDO_UTIL.EXTRUDE ( geometry SDO_GEOMETRY, groundheights SDO_NUMBER_ARRAY, topheights SDO_NUMBER_ARRAY,

result_to_be_validated VARCHAR2 tolerance NUMBER ) RETURN SDO_GEOMETRY

Example 6: Extrude schema The parameters can be interpreted as shown below: geometry: This specifies the input two‐dimensional SDO_GEOMETRY object that needs to be extruded. groundheights: This is an array of numbers, one each for each vertex for use as the ground height (minimum z value). If only one number is specified, then all vertices get the same value (that is specified here). topheights: This is an array of numbers, one each for each vertex for use as the top height (minimum z value). If only one number is specified, then all vertices get the same value (that is specified here). result_to_be_validated: This is a character string that can be set to either ʹTRUEʹ or ʹFALSEʹ. This string informs Oracle whether to validate the resulting geometry. tolerance: This specifies the tolerance to use to validate the geometry. A prototype example extruding a Polygon to a Three‐Dimensional Solid is given below. SELECT SDO_UTIL.EXTRUDE( SDO_GEOMETRY ‐‐ first argument to validate is geometry ( 2003, ‐‐ 2‐D Polygon NULL, NULL, SDO_ELEM_INFO_ARRAY(1, 1003, 1 ‐‐ A polygon element ), SDO_ORDINATE_ARRAY (0,0, 2,0, 2,2, 0,2, 0,0) ‐‐ vertices of polygon ), SDO_NUMBER_ARRAY(‐1), ‐‐ Just 1 ground height value applied to all vertices SDO_NUMBER_ARRAY(1), ‐‐ Just 1 top height value applied to all vertices ʹFALSEʹ, ‐‐ No need to validate 0.5 ‐‐ Tolerance value ) EXTRUDED_GEOM FROM DUAL; Example 7: Extruding a sample data using SQL statement EXTRUDED_GEOM(SDO_GTYPE, SDO_SRID, SDO_POINT(X, Y, Z), SDO_ELEM_INFO, SDO_ORDINA ‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐SDO_GEOMETRY( 3008, ‐‐ 3‐Dimensional Solid Type NULL, NULL, SDO_ELEM_INFO_ARRAY( 1, 1007, 1, ‐‐ Solid Element


1, 1006, 6, ‐‐ 1 Outer Composite Surface made up of 6 polygons 1, 1003, 1, ‐‐ First polygon element starts at offset 1 in SDO_ORDINATES array 16, 1003, 1, ‐‐ second polygon element starts at offset 16 31, 1003, 1, ‐‐ third polygon element starts at offset 31 46, 1003, 1, ‐‐ fourth polygon element starts at offset 46 61, 1003, 1, ‐‐ fifth polygon element starts at offset 61 76, 1003, 1), ‐‐ sixth polygon element starts at offset 76 SDO_ORDINATE_ARRAY( ‐‐ ordinates storing the vertices of the polygons 0, 0, ‐1, 0, 2, ‐1, 2, 2, ‐1, 2, 0, ‐1, 0, 0, ‐1, 0, 0, 1, 2, 0, 1, 2, 2, 1, 0, 2, 1, 0, 0, 1, 0, 0, ‐1, 2, 0, ‐1, 2, 0, 1, 0, 0, 1, 0, 0, ‐1, 2, 0, ‐1, 2, 2, ‐1, 2, 2, 1, 2, 0, 1, 2, 0, ‐1, 2, 2, ‐1, 0, 2, ‐1, 0, 2, 1, 2, 2, 1,2, 2, ‐1, 0, 2, ‐1, 0, 0, ‐1, 0, 0, 1, 0, 2, 1, 0, 2, ‐1)) Example 8: output of Extruded sample Following example shows the satellite image of a building in New York City, USA taken from Google map. The Building ground height and top height are measured in the form of latitude, longitude and altitude with Global Positioning System (GPS). The building corner points are marked and respective co‐ordinates are extruded using SDO_UTIL.EXTRUDE method.

Fig: 7(a) Building in a satellite image with edges marked with dark spots. The extruded geometry vertices returns an array that is plotted as an object as shown in the below figure.

Fig 7(b): Extruded building

9 APPLICATIONS The proposed method is used to derive new datasets from

the data warehouse. The geo‐processing kind of tasks that require very efficient algorithms can be effectively carried out by leveraging the potential of Oracle spatial geo database. 3D Elevation Model, Triangulation Irregular Network (TIN) can also be generated from geo database. Moreover, centralized data‐centric and secured applications can be developed and scaled at enterprise level.

10 LIMITATIONS The geo‐processing task and extrude task are memory

intensive. The proposed methodology can be optimized by introducing cloud based storage and routing mechanism. Parallel Processing can also optimize the performance to render the output in terms of geographical maps.

11 CONCLUSIONS The proposed solution is scalable to enterprise level. The

Spatial Data Infrastructure (SDI) can be established that can be helpful to derive spatial data and attributes from various datasets of different organizations in multiple combinations that can provide near real time information resulting in efficient and effective decision support system (DSS) to help multiple areas like Health, Defense, Disaster etc.

ACKNOWLEDGEMENTS

The authors would like to thank Mr. T .P. Singh, The Director of BISAG for his inspiration and motivation.

REFERENCES [1] Oracle Spatial User guide and Reference Release 2002,

a96630.pdf [2] OpenGIS Coordinate Transformation Service Implementation

Specification from http://www.opengeospatial.org/standards. [3] http://www.opengeospatial.org/ogc [4] http://en.wikipedia.org/wiki/Geoprocessing [5] http://en.wikipedia.org/wiki/Topology [6] Z.L. Li, R.L. Zhao, and J. Chen, ʺA generic algebra for spatial

relations,ʺ Progressing in Natural Science, vol. 12, no. 7, pp. 528‐536, July 2002.Oracle® Spatial User’s Guide and Reference Release 9.2

[7] Ravi Kothuri, Albert Godfrind, and Euro Beinat, Pro Oracle® Spatial for Oracle Database 11g, Apress, 2007

[8] C. Murray, “Oracle spatial Topology and Network Data Models,” Oracle Corporation, pp. 193‐251, 2005

[9] http://wikihelp.autodesk.com/Infr._Map_Server/enu/2012/Help/0000‐Administ0/0104‐Data_Mod104/0182‐Data_Mod182

International Journal of Applied Information Systems (IJAIS) – ISSN : 2249-0868 Foundation of Computer Science FCS, New York, USA Volume – No. , January 2013 – www.ijais.org

1

Detection of DDoS Attack in semantic web

Prashant Chauhan

BISAG

Gandhinagar

Gujarat, India

[email protected]

Abdul Jhummarwala

BISAG

Gandhinagar

Gujarat, India

[email protected]

Manoj Pandya

BISAG

Gandhinagar

Gujarat, India

[email protected]

ABSTRACT

We are currently living in an era where information is very

crucial and critical measure of success. Various forms of

computing are also available and now a days it is an era of

distributed computing where all the task and information are

shared among participated nodes. This type of computing can

have homogeneous or heterogeneous platform and hardware.

The concept of cloud computing and virtualization has gained

much momentum and has become a more popular phrase in

information technology. Many organizations have started

implementing these new technologies to further reduce costs

through improved machine utilization, reduced administration

time and infrastructure costs.

Cloud computing also faces challenges. One of such problem

is DDoS attack so in this paper we will focus on DDoS attack

and how to overcome from it using honeypot. For this here

open source tools and software are used.

Typical DDoS solution mechanism is single host oriented and

in this paper focused on distributed host oriented solution that

meets scalability.

General Terms

Cloud Security, Network Monitoring, Distributed Log

Processing.

Keywords

Cloud Security, DDoS, Honeypot, Hadoop.

1. INTRODUCTION In cloud computing hardware, software etc. are available to

the end user in form of services. The user can subscribe for

the required service and he will be charged on the go. Means

he needs not to pay while resources are not in use. In cloud

security we have basic three characteristics confidentiality,

integrity and availability.

Availability is one of important aspect in cloud security

stating availability of service while user want to consume but

due to some security threats availability can not be achieved.

DDoS attack is one such threat which is distributed form of

Denial of Service attack in which service is consumed by

attacker and legitimate user can not use the service.

We can find solution against DDoS attack but they are based

on single host and lacks performance so here Hadoop system

for distributed processing is used.

2. Cloud Computing In cloud computing every resource is available to end user in

form of service and user pay only for what he has used. This

kind of freedom for the user leads cloud services more

popular. Cloud computing services are separated by layer

shown in fig 1.

Fig 1 Layers of Cloud Computing

1. Application Layer

This layer provides Software as a Service (SaaS).

Applications are provided on demand basis in this layer.

Highly scalable internet based applications are hosted on the

cloud and offered as a service to the end user. Google Docs,

SalesForce.com, Acrobat.com are some popular SaaS.

2. Platform Layer

This layer provides Platform as a Service (PaaS). In this layer

platforms used to design, develop, build and test application

are provided by the cloud infrastructure. Some popular PaaS

providers are Google App Engine, Azure Service Platform.

3. Infrastructure Layer

This layer provides Infrastructure as a Service (IaaS). In this

layer hardware and network equipment are provided as

service. It also provides storage and database management

and computing capabilities on demand. Some well known


2

IaaS providers are Amazon Web Services, Go Grid and 3

Tera.

3. Cloud Security Cloud security has three basic aspects called confidentiality,

Integrity and Availability. This are also called basic security

goals. Fig 2 shows cloud security goals and availability is

focused in this research paper. DDoS attack is an issue of

availability of any cloud service.

A. Confidentiality

Confidentiality is the prevention of the intentional or

unintentional unauthorized disclosure of contents. Sometime

it is referred as Secrecy or Privacy in other terms. It means

that Confidentiality is the prevention of the intentional or

unintentional unauthorized disclosure of contents. Only

authorized person can read, write, view, print and know about

the content is available.

To ensure confidentiality network security protocols, Data

Encryption services, Authentication services are used.

Fig 2 Cloud Security Goals

B. Integrity

Integrity is the guarantee that message sent is received

without changing and the message is not intentionally or

unintentionally altered.

To ensure integrity Firewalls, Intrusion Detection Systems

etc. are used.

C. Availability

Availability is the element that creates reliability and stability

in the system. It means that the data is available whenever

asked.

To ensure Availability redundant disks and network system

and Backup utility are used.

4. Distributed Denial of Service Attack

4.1 Definition and Purpose A denial-of-service attack (DoS attack) is an attempt to make

a computer resource unavailable to its intended users.

Distributed denial-of-service attack (DDoS attack) is a kind of

DoS attack where attackers are distributed and targeting a

victim. It generally consists of the concerted efforts of a

person, or multiple people to prevent an Internet site or

service from functioning efficiently or at all, temporarily or

indefinitely.

Motives of attackers to perform Denial of Service attack could

be of different nature; an attacker might want to show his

power by attacking a large, popular Web site that will permit

him or her to be recognised in the underground community.

Another possibility of motive is a political one, for example, a

political party can ask an attacker to do a DoS attack on the

Web site of a concurrent political party. Most common motive

is the commercial one; a company can ask an attacker to

attack a concurrent; during the time where the commercial

website is unavailable it will lose lots of money and worst, the

trust of user and they might want to go shopping on another

website.

4.2 How DDoS Attack works DDoS attack is performed by infected machines called bots

and group of bots is called botnet. This bots (Zombie) are

controlled by an attacker by installing malicious code or

software which acts as per command passed by attacker. Bots

are ready to attack anytime upon receiving command from the

attacker. Many types of agents have scanning capability that

permit to identify open port of a range of machines. When the

scanning is finished, the agent takes the list of machines with

open port and launches vulnerability-specific scanning to

detect machines with un-patched vulnerability. If the agent

found a machine with vulnerability, it could launch an attack

to install another agent in the machine.

Fig 3 shows DDoS attack architecture explaining its working

mechanism.

Fig 3 DDoS attack Architecture

For controlling agents, the attackers use Command and

Control (C&C) server, currently, the majority of botnet use

Internet Relay Chat (IRC). Other possibilities exist as Peer-to-

Peer (P2P) C&C, Instant Messaging (IM) C&C or even Web-

based C&C server. When the agent is connected to the C&C it

can ask for updates as IP address for other C&Cs, software

update or exploit software. The agent could also ask for

orders, if the agent is newly installed it could ask order for

protect himself, this could be by asking the C&C the location

of the latest anti-virus and prevent it to detect the agent by

stopping the service.

A DoS attack’s main characteristic is that an attacker attempts

to prevent one or more legitimate users of a service from the

use of the required resources. There are two general forms of

DoS attacks: those that crash services and those that flood

services. Therefore, he attempts (1) to inhibit legitimate

network traffic by flooding the network with useless traffic.


3

(2) to deny access to a service by disrupting connections

between two parties, (3) to block the access of a particular

individual to a service, or (4) to disrupt the specific system or

service itself. Here http flooding as a DDoS attack is

considered and its solution is proposed.

5. Hadoop MapReduce

5.1 Introduction MapReduce, developed by Google, is a software paradigm for

processing a large data set in a distributed parallel way. Since

Google’s MapReduce and Google file system (GFS) are

proprietary, an open-source MapReduce software project,

Hadoop, was launched to provide similar capabilities of the

Google’s MapReduce platform by using thousands of cluster

nodes. Hadoop distributed file system (HDFS) is also an

important component of Hadoop, that corresponds to GFS.

Hadoop consists of two core components: the job

management framework that handles the map and reduces

tasks and the Hadoop Distributed File System (HDFS).

Hadoop's job management framework is highly reliable and

available, using techniques such as replication and automated

restart of failed tasks. The framework has optimization for

heterogeneous environments and workloads, e.g., speculative

(redundant) execution that reduces delays due to stragglers.

Fig 4 Hadoop Multinode Cluster Architecture

Fig 4 shows Hadoop multinode cluster architecture which

works in distributed manner for MapReduce problem.

5.2 DDoS Attack and Hadoop DDoS attack is critical and find easily but conventional

solution are single host oriented. We know that for that we

can use honeypot system to overcome from it. Some network

monitoring tools like wireshark, nmap are available but they

just sniff the network and generate log files in huge size. So to

handle this gigantic file we need a special file system like

HDFS and MapReduce framework for processing log

effectively and efficiently. So Hadoop is a well suited solution

here.

6. MapReduce for Counter-based DDoS

Detection method In this method simple MapReduce algorithm to detect DDoS

with URL counting is implemented. This algorithm needs

three input parameters of time interval, threshold and

unbalanced ratio, which can be loaded through the

configuration property or the distributed cache mechanism of

MapReduce. Time interval limits monitoring duration of the

page request. Threshold indicates the permitted frequency of

the page request to the server against the previous normal

status, which determines whether the server should be

alarmed or not. The unbalanced ratio variable denotes the

anomaly ratio of response per page request between a specific

client and a server. This value is used for picking out attackers

from the clients.

The map function filters non-HTTP GET packets and

generates key values of server IP address, masked timestamp,

and client IP address. The masked timestamp with time

interval is used for counting the number of requests from a

specific client to the specific URL within the same time

duration. The reduce function summarizes the number of URL

requests, page requests, and server responses between a client

and a server. Finally, the algorithm aggregates values per

server. When total requests for a specific server exceeds the

threshold, the MapReduce job emits records whose response

ratio against requests is greater than unbalanced ratio,

marking them as attackers. While this algorithm has the low

computational complexity and could be easily converted to

the MapReduce implementation, it needs a prerequisite to

know the threshold value from historical monitoring data in

advance.

Fig 5 MapReduce for counter-based DDoS detection

7. Result As far as scalability is concerned, the performance of the

counter based DDoS detection method is used by varying

cluster nodes. Figure 6 shows statistical graph of the detection

job with ten worker nodes was completed within 25 minutes

for 500GB and 47 minutes for 1TB, which is over 8 times

faster than one node's and 2.9 times faster than 3 nodes'

respectively. From evaluation results the increased

performance enhancement is observed as the volume of input

traffic becomes large.


4

Fig 6 Completion time of a counter-based DDoS detection

job regarding various # of Hadoop datanodes and traffic

sizes

8. Conclusion Scalability of the detection of DDoS attack is focused in this

paper and focused a Hadoop based DDoS detection model.

The approach of Hadoop based solution solves scalability

issues by parallel data processing for DDoS attack detection

for a huge volume of traffic. Other approaches are single host

based and suggesting memory increment for efficiency but in

this paper suggested a Hadoop based solution which solves

scalability issue by parallel data processing.

This kind of MapReduce job is preferable for Hadoop as it is

Computation oriented framework.

9. REFERENCES [1] NathalieWeiler, Eleventh IEEE International Workshops

on Enabling Technologies,2002, “Honeypots for

Distributed Denial of Service Attacks”

[2] Yuqing Mai, Radhika Upadrashta and Xiao Su, ITCC’04

“J-Honeypot: A Java-Based Network Deception Tool

with Monitoring and Intrusion Detection“

[3] Yeonhee Lee, Wonchul Kang, and Youngseok Lee, 51-

63, TMA'11“A Hadoop-based Packet Trace Processing

Tool”

[4] Yun Yang, Hongli Yang and JiaMi, IEEE, 2011, “Design

of Distributed Honeypot System Based on Intrusion

Tracking”

[5] Anup Suresh Talwalkar, 2011, “HadoopT - Breaking the

Scalability Limits of Hadoop”

[6] Flavien Flandrin, 2010, “A Methodology To Evaluate

Rate-Based Intrusion Prevention System Against

Distributed Denial Of Service”

comparison of various classification techniques for satellite...

Documents