[ieee third international conference on semantics, knowledge and grid (skg 2007) - xi'an, shan...

An Improved Approach and Application for Spatial Data Mining

Ying Xia and Xiao-Xi Fu

Sino-Korea Chongqing GIS Research Center, College of Computer Science, Chongqing Univ. of Posts & Telecom, 400065 Chongqing, P.R. China

Abstract

The cloud approach can fully account for uncertainty in the process of spatial data mining. However, it has some disadvantages in applications, especially in spatial decision support systems. In this paper, we propose an improved fuzzy approach that remedies these problems. It uses a check mechanism to control the errors caused by the fuzziness and randomness that exist in a spatial decision support system. It also improves on the cloud approach by letting different factors have different influence on the final evaluation result, which makes the result more precise and closer to the real situation. Finally, the performance of the proposed method is examined by experiments. The comparative results show that it needs less runtime and achieves higher accuracy than the original cloud approach.

1. Introduction

The spatial decision support system is a very important application of spatial data mining and is widely used in practice. In city layout, for example, the government needs to select the sites of bus stations. In spatial load forecasting, the decision-maker needs to evaluate the economic and environmental expansion of a small area [10]. The decision obtained from a spatial decision support system is essential in investment decision-making, because such an investment is long-term and cannot be readjusted often. A suitable decision can reduce cost and improve operating efficiency [1]. On the other hand, an unsuitable decision will lead to many serious problems that are beyond remedy [1].

In spatial data mining, uncertainty cannot be avoided when we want to extract useful knowledge from a GIS (geographical information system) database. The uncertainty includes fuzziness and randomness. In a spatial decision support system, both exist not only among the influential factors but also within each factor. Different partitions of the factors will lead to different evaluation results.

In recent years, several works on spatial decision support systems have used fuzzy logic to improve the handling of uncertainty. One approach, based on the analytic hierarchy process and fuzzy synthetic evaluation, makes all possible pairwise comparisons among the factors [8]. Another integrates quantitative data and qualitative ratings [4]. However, all of these methods need a precise function to describe a fuzzy set, so the fuzzy set loses its fuzziness in the subsequent processing. The evaluation results they produce are therefore unreliable [6].

In this paper, we present an improved fuzzy approach that deals with the problem of lost fuzziness. It uses a cloud, rather than a precise function, to describe a fuzzy set. Therefore, it can keep the fuzziness of an attribute during the evaluation process.

However, the cloud approach has some deficiencies, so we extend the traditional cloud approach and propose the improved fuzzy approach. First, the cloud approach has no way to control the quality of an attribute's partitions. We introduce a check mechanism and a precision parameter. They can find improper partitions, which would lead to a mistaken evaluation result, and amend them in time.

Second, the cloud approach does not consider that each factor has a different influence on the final evaluation result [3], which does not accord with the real situation. We therefore add a weight parameter to the evaluation process and calculate the final result from the precision parameter and the weight parameter. As a result, the improved fuzzy approach is closer to the real situation in the decision process, and the evaluation result is optimized.

* This research was supported by the MIC (Ministry of Information and Communication), Korea, under the ITRC (Information Technology Research Center) support program supervised by the IITA (Institute of Information Technology Assessment), and by the Natural Science Foundation of Chongqing, 2005BB2059.

Third International Conference on Semantics, Knowledge and Grid

0-7695-3007-9/07 $25.00 © 2007 IEEE. DOI 10.1109/SKG.2007.28

The remainder of this paper is organized as follows. Section 2 reviews the concepts of the cloud approach. The improved fuzzy approach and its process are presented in Section 3. Section 4 gives a case study. The simulation experiments are described in Section 5. Finally, conclusions are drawn in Section 6.

2. Related Works

In this section, the cloud approach is briefly reviewed, and its advantages and disadvantages are introduced.

The cloud approach has many parts, the most important of which are the cloud model and concept generalization. The cloud model is an uncertainty model that transforms between a linguistic term of a qualitative concept and its numerical representation [11, 12].

The cloud model describes a qualitative concept with three numerical characteristics: expected value (Ex), entropy (En) and hyper-entropy (He) [5]. The expected value (Ex) is the central value of the qualitative concept in the universe of discourse. The entropy (En) expresses the degree of fuzziness of the qualitative concept. The hyper-entropy (He) is the entropy of the entropy (En). It reflects the randomness of the degree of membership in a qualitative concept: the larger the value of He, the more random the degree of membership [11]. Figure 1 shows the cloud description with the three numerical characteristics.

Fig 1. The cloud description with the three digital characteristics
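
The paper does not spell out how cloud drops are produced from (Ex, En, He). As a minimal sketch, assuming the forward normal cloud generator that is commonly described in the cloud-model literature (an assumption, not something stated in this paper), the role of the three characteristics can be illustrated as follows. The function name and the sample values are hypothetical.

```python
import math
import random


def forward_normal_cloud(ex, en, he, n_drops, seed=None):
    """Generate n_drops cloud drops (x, membership) for a normal cloud (Ex, En, He).

    Assumed procedure (standard forward normal cloud generator):
      1. sample an entropy En' from N(En, He^2)
      2. sample a drop x from N(Ex, En'^2)
      3. membership = exp(-(x - Ex)^2 / (2 * En'^2))
    """
    rng = random.Random(seed)
    drops = []
    for _ in range(n_drops):
        # Each drop gets its own entropy sample, so a larger He makes the
        # membership degree of a given x more random.
        en_sample = rng.gauss(en, he)
        if en_sample == 0.0:
            # Degenerate sample with no spread: the drop sits exactly on Ex.
            drops.append((ex, 1.0))
            continue
        x = rng.gauss(ex, abs(en_sample))
        membership = math.exp(-((x - ex) ** 2) / (2.0 * en_sample ** 2))
        drops.append((x, membership))
    return drops


# Hypothetical concept centred at 50 with En = 5 and He = 0.5.
drops = forward_normal_cloud(50.0, 5.0, 0.5, 200, seed=1)
```

With He = 0 every drop lies exactly on the normal membership curve; increasing He thickens the cloud around that curve, which is the behaviour the text attributes to the hyper-entropy.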

Concept generalization is an algorithm that divides attribute data into several concept layers [1]. Attribute generalization is produced by overlaying cloud models. Figure 2 shows several cloud models that express an attribute generalization. Concept generalization changes the irregular data distribution into regular data distributions. Then, according to the attribute generalization, the weights of the attributes are obtained by statistics. Finally, the evaluation result is calculated [10].

Fig 2. The cloud models of data distribution by cloud transform

The cloud approach provides an effective way to integrate randomness and fuzziness by using a formalized, computerized language [10]. It is more appropriate when more than one kind of uncertainty is present at the same time [3].

But it has deficiencies. It does not consider that each factor has a different influence on the final evaluation result [3]. In fact, some factors are more important than others. The cloud theory pays no attention to this situation and treats all factors as equivalent.

3. The Improved Fuzzy Approach and Process

In this section, the improved fuzzy approach is presented. It aims to find an optimized result that is close to the real situation.

We extend the cloud approach to remedy its deficiencies. In a spatial decision support system, uncertainties are always present and influence each other. However, the original cloud approach cannot check the quality of a partition. It also does not consider that different factors have different influence on the final evaluation result: all factors are treated as equally important to the result. This does not accord with the real situation and may lead to an erroneous result. Therefore, we use a check mechanism to obtain an optimized partition of each attribute. We then design a parameter to handle the different weights that factors have in the real situation.

3.1. Improved Fuzzy Approach Framework

We propose an improved fuzzy approach for spatial decision support systems. Figure 3 shows the components of this approach.

Fig 3. Framework of improved fuzzy approach

Pre-process Data. The data in the spatial database are rearranged by attribute. We can then use them to draw each attribute's distribution curve.

Partition Attributes with Cloud. The irregular data distributions are changed into regular distributions by the cloud transform. Each attribute is partitioned into mutually overlapping basic cloud models. Every cloud model has three numerical characteristics, which are prepared for the next component.

Check the Partition. This component checks the quality of the partition produced by the cloud transform. The He of every cloud model is compared with a threshold that is set by the user. Because of the uncertainty in the partition process, the partitions of an attribute cannot be absolutely precise; we can only obtain an optimized partition.

Therefore, we set a threshold γ, which limits the randomness of a rank. If He > γ, the cloud model is too discrete, which means that the rank does not accord with the mathematical rule, so we need to divide the rank again or combine it with other ranks. If He < γ, the cloud model is convergent: its curve resembles the normal distribution curve and accords with the mathematical rule. When the He of every cloud model is less than γ, the ranks are appropriate.
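
The check against γ can be sketched as below. This is only an illustration under stated assumptions: the function names are hypothetical, the text does not say how the case He = γ is treated (here it is flagged, since the text requires every He to be less than γ), and the two γ values in the usage lines are invented for the example. The He values are the four ranks reported for the shopping centre's distance in Section 4.

```python
def ranks_to_repartition(hyper_entropies, gamma):
    """Return the indices of ranks that fail the check, i.e. whose He is not below gamma."""
    return [i for i, he in enumerate(hyper_entropies) if not he < gamma]


def partition_is_acceptable(hyper_entropies, gamma):
    """A partition passes the check when every rank has He below gamma."""
    return not ranks_to_repartition(hyper_entropies, gamma)


# He of ranks A, B, C, D for the shopping centre's distance (Section 4).
he_values = [0.027, 0.03, 0.013, 0.021]

# Hypothetical thresholds, chosen only to show both outcomes.
print(partition_is_acceptable(he_values, 0.05))   # every rank passes
print(ranks_to_repartition(he_values, 0.025))     # ranks A and B would be re-divided
```

How a failing rank is re-divided or merged with a neighbour is left to the cloud transform and is not modelled here.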

Precision Parameter α. The parameter α describes the degree of accuracy of the rank of an influential factor. He is obtained from the cloud model above. A larger He means more divergent information and a less accurate partition. $He_i$ is the degree of inaccuracy of rank i of an attribute, and $\alpha_i$ is the degree of precision of that rank. $He_i$ is a percentage, and $\alpha_i$ and $He_i$ describe opposite concepts, so when $He_i$ increases, $\alpha_i$ decreases. The constant 100 scales the value while keeping this trend. We design formula (1) as follows:

$$\alpha_i = \frac{1}{100 \times He_i} \qquad (1)$$

When $\alpha_i$ is larger, the partition of this rank is more convergent, its accuracy is greater, and its quality is higher. The quality of the partition influences the quality of the final result.
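
As a quick numerical check of formula (1), the sketch below applies it to the four He values that the case study in Section 4 reports for the shopping centre's distance. The function name and the rejection of non-positive He are assumptions made for the illustration.

```python
def precision(he):
    """Formula (1): alpha = 1 / (100 * He)."""
    if he <= 0:
        # Assumption: a non-positive He has no meaning in formula (1).
        raise ValueError("He must be positive")
    return 1.0 / (100.0 * he)


# He of ranks A-D for the shopping centre's distance (Section 4).
he_values = {"A": 0.027, "B": 0.03, "C": 0.013, "D": 0.021}
alphas = {rank: round(precision(he), 2) for rank, he in he_values.items()}
print(alphas)  # {'A': 0.37, 'B': 0.33, 'C': 0.77, 'D': 0.48}
```

The rounded values agree with the first row of the matrix K in the case study, and the rank with the smallest He (rank C) receives the largest precision.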

Weight Parameter β. This parameter describes the weight of a factor in the whole system. The original cloud approach treats all factors in the system as having the same weight. In the real situation, however, different factors have different influence on the final evaluation result, so their weights in the system differ. For example, in the bus station decision system, population density carries more weight than the expansion of the place. The weight parameter is obtained from statistics and surveys.

The weight parameter β is used to adjust the weight of an attribute after the accuracy of the criterion has been obtained. In the improved fuzzy approach, the result is best when both the criterion and the weights obtained from statistics and surveys are optimized. This differs from the original cloud approach, which has only one weight per factor, obtained from statistics.

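To explain the evaluation process, we first give a definition. Let C and D be matrices with m rows and n columns. C ⊗ D means that each element of C is multiplied by the corresponding element of D. The result is the matrix L, where $1 \le i \le m$ and $1 \le j \le n$:

$$
L = C \otimes D =
\begin{pmatrix}
c_{11} & c_{12} & \cdots & c_{1n} \\
c_{21} & c_{22} & \cdots & c_{2n} \\
\vdots & \vdots & \ddots & \vdots \\
c_{m1} & c_{m2} & \cdots & c_{mn}
\end{pmatrix}
\otimes
\begin{pmatrix}
d_{11} & d_{12} & \cdots & d_{1n} \\
d_{21} & d_{22} & \cdots & d_{2n} \\
\vdots & \vdots & \ddots & \vdots \\
d_{m1} & d_{m2} & \cdots & d_{mn}
\end{pmatrix}
$$

$$
= (c_{ij} \times d_{ij})_{m \times n} =
\begin{pmatrix}
c_{11} \times d_{11} & c_{12} \times d_{12} & \cdots & c_{1n} \times d_{1n} \\
c_{21} \times d_{21} & c_{22} \times d_{22} & \cdots & c_{2n} \times d_{2n} \\
\vdots & \vdots & \ddots & \vdots \\
c_{m1} \times d_{m1} & c_{m2} \times d_{m2} & \cdots & c_{mn} \times d_{mn}
\end{pmatrix}
$$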
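
As a small worked illustration of this definition (the numbers are arbitrary and do not come from the paper):

$$
\begin{pmatrix} 1 & 2 \\ 3 & 4 \end{pmatrix}
\otimes
\begin{pmatrix} 5 & 6 \\ 7 & 8 \end{pmatrix}
=
\begin{pmatrix} 1 \times 5 & 2 \times 6 \\ 3 \times 7 & 4 \times 8 \end{pmatrix}
=
\begin{pmatrix} 5 & 12 \\ 21 & 32 \end{pmatrix}
$$

Note that this differs from the ordinary matrix product of the same two matrices, which would be $\begin{pmatrix} 19 & 22 \\ 43 & 50 \end{pmatrix}$; ⊗ never mixes elements from different positions.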

3.2. The Process of the Method

The steps of the method are as follows.

Step 1: A distribution function is obtained from the database according to the attribute values. Each attribute has its own function.

Step 2: Each attribute is generalized by the cloud model to produce higher-level concepts, so that every attribute is divided into several ranks. The hyper-entropy He of every cloud model is calculated.

Every cloud model is checked against γ. If He > γ, the ranks of this attribute are divided again. This step is repeated until every He is less than γ.

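Step 3: The α values of all attributes compose the matrix K. $\alpha_{11}$ is the degree of precision of the highest rank of attribute 1, and $\alpha_{1n}$ is the degree of precision of the lowest rank of attribute 1. Each element is obtained from formula (1):

$$\alpha_i = \frac{1}{100 \times He_i}$$

The β values of all attributes compose the matrix S. $\beta_{1n}$ is the weight of the lowest rank of attribute 1 with respect to the whole system.

The matrix W holds the weight of each rank within its attribute, obtained by statistics.

$$
K =
\begin{pmatrix}
\alpha_{11} & \alpha_{12} & \cdots & \alpha_{1n} \\
\alpha_{21} & \alpha_{22} & \cdots & \alpha_{2n} \\
\vdots & \vdots & \ddots & \vdots \\
\alpha_{m1} & \alpha_{m2} & \cdots & \alpha_{mn}
\end{pmatrix}
\qquad
S =
\begin{pmatrix}
\beta_{11} & \beta_{12} & \cdots & \beta_{1n} \\
\beta_{21} & \beta_{22} & \cdots & \beta_{2n} \\
\vdots & \vdots & \ddots & \vdots \\
\beta_{m1} & \beta_{m2} & \cdots & \beta_{mn}
\end{pmatrix}
\qquad
W =
\begin{pmatrix}
w_{11} & w_{12} & \cdots & w_{1n} \\
w_{21} & w_{22} & \cdots & w_{2n} \\
\vdots & \vdots & \ddots & \vdots \\
w_{m1} & w_{m2} & \cdots & w_{mn}
\end{pmatrix}
$$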

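Step 4: Calculation of the evaluation result. The final evaluation is calculated from the three weights of each attribute. The degree of precision α of an attribute and its weight β with respect to the whole set of attributes are two parameters in this process; they describe the precision weight of attribute i and its weight in the whole system. According to formula (2), we obtain the matrix R, which multiplies the accuracy weights by the statistical weights:

$$
R = W \otimes K = (w_{ij} \times \alpha_{ij})_{m \times n} =
\begin{pmatrix}
w_{11} \times \alpha_{11} & w_{12} \times \alpha_{12} & \cdots & w_{1n} \times \alpha_{1n} \\
w_{21} \times \alpha_{21} & w_{22} \times \alpha_{22} & \cdots & w_{2n} \times \alpha_{2n} \\
\vdots & \vdots & \ddots & \vdots \\
w_{m1} \times \alpha_{m1} & w_{m2} \times \alpha_{m2} & \cdots & w_{mn} \times \alpha_{mn}
\end{pmatrix}
=
\begin{pmatrix}
r_{11} & r_{12} & \cdots & r_{1n} \\
r_{21} & r_{22} & \cdots & r_{2n} \\
\vdots & \vdots & \ddots & \vdots \\
r_{m1} & r_{m2} & \cdots & r_{mn}
\end{pmatrix}
\qquad (2)
$$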

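S is the matrix of the weights with respect to the whole set of attributes. H is the element-wise product of S and R. H expresses that every rank of every attribute has a different weight in the final evaluation.

$$
H = S \otimes R = (\beta_{ij} \times r_{ij})_{m \times n} =
\begin{pmatrix}
\beta_{11} \times r_{11} & \beta_{12} \times r_{12} & \cdots & \beta_{1n} \times r_{1n} \\
\beta_{21} \times r_{21} & \beta_{22} \times r_{22} & \cdots & \beta_{2n} \times r_{2n} \\
\vdots & \vdots & \ddots & \vdots \\
\beta_{m1} \times r_{m1} & \beta_{m2} \times r_{m2} & \cdots & \beta_{mn} \times r_{mn}
\end{pmatrix}
=
\begin{pmatrix}
h_{11} & h_{12} & \cdots & h_{1n} \\
h_{21} & h_{22} & \cdots & h_{2n} \\
\vdots & \vdots & \ddots & \vdots \\
h_{m1} & h_{m2} & \cdots & h_{mn}
\end{pmatrix}
$$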

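P is the matrix of a site, in which each attribute takes exactly one rank. Formula (3) expresses that each attribute of a place belongs to one rank. T is the element-wise product of P and H:

$$T = P \otimes H \qquad (3)$$

E is the final result to be calculated. Q is a row vector whose elements are all 1, Q = (1 1 … 1), with as many elements as the dimension of T that it multiplies. E is therefore the sum of all elements of T:

$$
E = Q\,T\,Q^{T} =
\begin{pmatrix} 1 & 1 & \cdots & 1 \end{pmatrix}
\begin{pmatrix}
t_{11} & t_{12} & \cdots & t_{1n} \\
t_{21} & t_{22} & \cdots & t_{2n} \\
\vdots & \vdots & \ddots & \vdots \\
t_{m1} & t_{m2} & \cdots & t_{mn}
\end{pmatrix}
\begin{pmatrix} 1 \\ 1 \\ \vdots \\ 1 \end{pmatrix}
$$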
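
The chain R = W ⊗ K, H = S ⊗ R, T = P ⊗ H and E = Q T Qᵀ can be sketched in a few lines. This is a minimal illustration, not the authors' implementation (which was written in Visual C++): the function names are hypothetical, the 2 × 2 matrices are invented toy values, and sorting sites from the highest E to the lowest is an assumption, since the text only says that the sites are put in order.

```python
def hadamard(a, b):
    """Element-wise product of two equally sized matrices given as lists of rows."""
    if len(a) != len(b) or any(len(ra) != len(rb) for ra, rb in zip(a, b)):
        raise ValueError("matrices must have the same shape")
    return [[x * y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def site_score(w, k, s, p):
    """E = Q T Q^T with T = P (x) H, H = S (x) R and R = W (x) K."""
    r = hadamard(w, k)   # formula (2)
    h = hadamard(s, r)
    t = hadamard(p, h)   # formula (3)
    # Multiplying by all-ones vectors on both sides sums every element of T.
    return sum(sum(row) for row in t)


def rank_sites(w, k, s, sites):
    """Score every candidate site and sort by E, highest first (assumed order)."""
    scores = {name: site_score(w, k, s, p) for name, p in sites.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Toy data: 2 attributes (rows) x 2 ranks (columns), values invented.
W = [[0.6, 0.4], [0.3, 0.7]]
K = [[0.5, 0.25], [0.2, 0.8]]
S = [[0.5, 0.5], [0.5, 0.5]]
sites = {
    "site_1": [[1, 0], [0, 1]],  # attribute 1 in rank 1, attribute 2 in rank 2
    "site_2": [[0, 1], [1, 0]],
}
ranking = rank_sites(W, K, S, sites)
```

Because P has a single 1 per row, each site simply picks one element of H per attribute, so E is the sum of the selected rank weights.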

Step 4 is repeated until every site has been evaluated, and the sites are then ordered by their results.

4. Case Study

We simulate a government selecting bus station sites for city planning. The data are synthetic and come from QARCT [3], a source data generator. The distance to the shopping centre is taken as the example attribute in this case. We select some important influential factors in the spatial decision support system for bus station selection. The initial geographical database of a city is shown in Table 1.

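Table 1. The Initial Geographical Database

| No. | Shopping centre | Length of street | Park | … | Residential area |
|-----|-----------------|------------------|------|---|------------------|
| 1   | 6607            | 7710             | 935  | … | 3658             |
| 2   | 659             | 9520             | 1574 | … | 621              |
| …   | …               | …                | …    | … | …                |
| 600 | 2136            | 8762             | 7554 | … | 6542             |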

Step 1: All attributes are generalized by the cloud model. The records in the initial database are numeric. After generalization, each attribute is described by the linguistic values A-near, B-mid-near, C-mid-far and D-far. At the same time, the three numerical characteristics of every rank are obtained. The inaccuracy values (He) of the four ranks of the shopping centre's distance are 0.027, 0.03, 0.013 and 0.021, respectively.

Step 2: The matrix K holds the precision of each attribute's ranks, calculated by formula (1). The matrix S holds the weight of each rank of an attribute with respect to the whole set of attributes. W holds the weight of each rank within an attribute, obtained by statistics. Table 2 lists these three kinds of weights, which are obtained from the cloud models and from statistics.

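$$K_{11} = \frac{1}{100 \times He_A} = \frac{1}{100 \times 0.027} = 0.37$$

…

$$K_{14} = \frac{1}{100 \times He_D} = \frac{1}{100 \times 0.021} = 0.48$$

Table 2. The Three Kinds of Weights of Attributes

$$
K =
\begin{pmatrix}
0.37 & 0.33 & 0.77 & 0.48 \\
0.26 & 0.77 & 0.303 & 0.385 \\
\cdots & \cdots & \cdots & \cdots \\
0.714 & 0.5 & 0.33 & 0.77
\end{pmatrix}
$$

$$
S =
\begin{pmatrix}
0.172 & 0.31 & 0.296 & 0.222 \\
0.145 & 0.306 & 0.355 & 0.194 \\
\cdots & \cdots & \cdots & \cdots \\
0.179 & 0.324 & 0.267 & 0.23
\end{pmatrix}
$$

$$
W =
\begin{pmatrix}
0.051 & 0.092 & 0.088 & 0.066 \\
0.024 & 0.051 & 0.059 & 0.032 \\
\cdots & \cdots & \cdots & \cdots \\
0.0096 & 0.018 & 0.014 & 0.023
\end{pmatrix}
$$

$$
H = W \otimes K \otimes S = (W_{ij} \times K_{ij} \times S_{ij})_{m \times n} =
\begin{pmatrix}
0.0032 & 0.0094 & 0.02 & 0.007 \\
0.0009 & 0.012 & 0.0063 & 0.0024 \\
\cdots & \cdots & \cdots & \cdots \\
0.0012 & 0.003 & 0.0012 & 0.0041
\end{pmatrix}
$$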
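
As a check on the first row of H (the shopping centre attribute), each element is the product of the corresponding entries of W, K and S listed above; the index notation $h_{1j}$ follows Section 3.2:

$$
\begin{aligned}
h_{11} &= 0.051 \times 0.37 \times 0.172 \approx 0.0032 \\
h_{12} &= 0.092 \times 0.33 \times 0.31 \approx 0.0094 \\
h_{13} &= 0.088 \times 0.77 \times 0.296 \approx 0.0201 \\
h_{14} &= 0.066 \times 0.48 \times 0.222 \approx 0.0070
\end{aligned}
$$

These agree with the first row of H to the precision printed in the paper.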

Step 3: The matrix H holds the weight of each rank of all attributes. Q is a row vector whose elements are all 1. Table 3 describes a site, in which each attribute takes one rank. E is the final evaluation score of the site. The other candidate sites can be evaluated by the same method.

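Table 3. The Candidate Place Has the Different Influential Attributes

Columns A to D give the value of weight; the position of each √ follows the corresponding row of the matrix P.

| Attributes       | A | B | C | D |
|------------------|---|---|---|---|
| Shopping centre  | √ |   |   |   |
| Length of street |   |   |   | √ |
| …                | … | … | … | … |
| Residential area |   |   | √ |   |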

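$$
P =
\begin{pmatrix}
1 & 0 & 0 & 0 \\
0 & 0 & 0 & 1 \\
\cdots & \cdots & \cdots & \cdots \\
0 & 0 & 1 & 0
\end{pmatrix}
$$

$$
T = P \otimes H =
\begin{pmatrix}
0.0032 & 0 & 0 & 0 \\
0 & 0 & 0 & 0.0024 \\
\cdots & \cdots & \cdots & \cdots \\
0 & 0 & 0.0012 & 0
\end{pmatrix}
$$

$$
E = Q\,T\,Q^{T} =
\begin{pmatrix} 1 & 1 & \cdots & 1 \end{pmatrix}
\begin{pmatrix}
0.0032 & 0 & 0 & 0 \\
0 & 0 & 0 & 0.0024 \\
\cdots & \cdots & \cdots & \cdots \\
0 & 0 & 0.0012 & 0
\end{pmatrix}
\begin{pmatrix} 1 \\ 1 \\ \vdots \\ 1 \end{pmatrix}
= 0.034
$$

5. Performance Evaluation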

We carried out a series of simulation experiments to evaluate the improved fuzzy approach. All experiments were conducted on a computer with an Intel Pentium IV 1.8 GHz CPU and 768 MB of main memory, running MS Windows XP. Both approaches were implemented using Microsoft Visual C++ 6.0. The experimental data included statistical data and simulation data generated by QARCT [3], a generator that is popular for simulating spatial decision support systems.

[Figure 4: line chart. x-axis: Number of data (×1000), from 1 to 25; y-axis: Time (second), from 0 to 12; series: CA, IFA]

Fig 4. Time comparison between using IFA and CA

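Table 2 column groups: K is the precision of each rank by cloud model, S is the weight of each rank to the whole attributes, and W is the weight of each rank by statistics.

| Attribute        | K: A  | K: B | K: C  | K: D  | S: A  | S: B  | S: C  | S: D  | W: A   | W: B  | W: C  | W: D  |
|------------------|-------|------|-------|-------|-------|-------|-------|-------|--------|-------|-------|-------|
| Shopping centre  | 0.37  | 0.33 | 0.77  | 0.48  | 0.172 | 0.31  | 0.296 | 0.222 | 0.051  | 0.092 | 0.088 | 0.066 |
| Length of street | 0.26  | 0.77 | 0.303 | 0.385 | 0.145 | 0.306 | 0.355 | 0.194 | 0.024  | 0.051 | 0.059 | 0.032 |
| …                | …     | …    | …     | …     | …     | …     | …     | …     | …      | …     | …     | …     |
| Residential area | 0.714 | 0.5  | 0.33  | 0.77  | 0.179 | 0.324 | 0.267 | 0.23  | 0.0096 | 0.018 | 0.014 | 0.023 |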

[Figure 5: line chart. x-axis: Number of data (×1000), from 1 to 20; y-axis: Accuracy (%), from 0.88 to 1; series: CA, IFA]

Fig 5. Accuracy comparison between using IFA and CA

We compare the performance of the improved fuzzy approach (IFA) and the cloud approach (CA). Figure 4 compares their runtimes as the number of input data grows from 1000 to 25000. The improved fuzzy approach needs less time than the cloud approach. The reason is that the cloud approach generates too many candidates, whereas in our approach the check mechanism prunes the number of candidates. It can produce high-quality ranks directly, which reduces the amount of computation.

Figure 5 shows the accuracy comparison between the improved fuzzy approach and the cloud approach. The improved fuzzy approach is superior to the cloud approach in controlling accuracy, because its check mechanism controls the quality of the partitions of the factors. It can find improper partitions, which would lead to a mistaken evaluation result, and amend them in time. The cloud approach, however, does not judge the quality of these partitions, which lowers its accuracy.

From all the experimental results above, we can conclude that the improved fuzzy approach performs well in terms of runtime and accuracy.

6. Conclusion

In this paper, an improved fuzzy approach and its application are proposed for spatial data mining. In a spatial decision support system, the check mechanism controls the quality of the partitions produced by the cloud approach, so that optimized partitions can be obtained. The method fully considers that different attributes have different influence on the final result, which makes the final evaluation result more precise and closer to the real situation. We compare it with the cloud approach in simulation experiments, and the results show that the improved fuzzy approach performs better: it reduces the runtime and provides better accuracy.

7. References

[1] Chen-Tung Chen, “A fuzzy approach to select the location of the distribution center”, Fuzzy Sets and Systems 118, pp. 65-73, 2001.

[2] A. Casaca, F. Presutto, I. Rebelo, G. Pestana and A. Grilo, “An Airport Network for Mobiles Surveillance”, In Proc. of the 16th International Conference on Computer Communication, Beijing, China, 2004.

[3] Yi Du, “The Research And The Application Of Association Rules In Data Mining”, 2000, ISBN 7-121-00308-2, pp. 1703-1708.

[4] G.A. Spohrer, T.R. Kmak, “Qualitative analysis used in evaluating alternative plant location scenarios”, Indust. Eng., August, 52-56, 1984.

[5] Gabriel Pestana, Miguel Mira da Silva, “An Airport Decision Support System For Mobiles Surveillance & Alerting”, MobiDE’05, June 12, 2005.

[6] Deyi Li, Haijun Meng, Xuemei Shi, Xinzhou Wang, “On spatial data mining and knowledge discovery (SDMKD)”, Geomatics and Information Science of Wuhan University, 2001, 26(6): 491-499.

[7] Deyi Li, “The Cloud Control Method and Balancing Patterns of Triple Link Inverted Pendulum Systems”, Chinese Engineering Science, 1999, Vol 1, No 2, pp. 41-46.

[8] Yan-Li Lu, Jing-Yuan Han, Fa-Chao Li, “Fuzzy Synthetic Evaluation On Customer Loyalty Based On Analytic Hierarchy Process”, The Fourth International Conference On Machine Learning and Cybernetics, Guangzhou, 18-21 August 2005.

[9] Xie Cui-hua, Li Yun, “A Constructing Algorithm of Concept Lattice with Attribute Generalization Based on Cloud Models”, The Fifth International Conference on Computer and Information Technology, 2005.

[10] Xueming Yang, Jinsha Yuan, “Application of Uncertainty Reasoning Based on Cloud Theory in Spatial Load Forecasting”, The 6th World Congress on Intelligent Control and Automation, June 21-23, 2006.

[11] Guangwei Zhang, Jianchu Kang, “Towards a Trust Model with Uncertainty for e-commerce Systems”, IEEE International Conference on e-Business Engineering, 18-21 Oct. 2005, pp. 200-207.

[12] Lingbo Zhang, Fuchun Sun, “Cloud Model based Control of Flexible-Link Manipulators”, Neural Networks and Brain, ICNN&B '05, International Conference, Volume 2, 2005, pp. 1067-1072.
