[ieee third international conference on semantics, knowledge and grid (skg 2007) - xi'an, shan...
An Improved Approach and Application for Spatial Data Mining
Ying Xia and Xiao-Xi Fu
Sino-Korea Chongqing GIS Research Center, College of Computer Science, Chongqing Univ. of Posts & Telecom, 400065 Chongqing, P.R. China
[email protected], [email protected]
Abstract
The cloud approach can take full account of the uncertainty that arises in spatial data mining. However, it has some disadvantages in applications, especially in spatial decision support systems. In this paper, we propose an improved fuzzy approach that remedies these problems. It uses a check mechanism to control the errors caused by the fuzziness and randomness that exist in a spatial decision support system. It also improves on the cloud approach by letting different factors have different influence on the final evaluation result, which makes the result more precise and closer to the real situation. Finally, the performance of the proposed method is examined by experiments. The comparative results show that it needs less runtime and achieves higher accuracy than the original cloud approach.
1. Introduction
The spatial decision support system is an important application of spatial data mining and is widely used in practice. In city planning, for example, the government needs to select sites for bus stations. In spatial load forecasting, the decision-maker needs to evaluate the economic and environmental expansion of a small area [10]. The decision produced by a spatial decision support system is essential to investment decision-making, because the investment is long-term and cannot be readjusted often. A suitable decision can reduce cost and improve operating efficiency [1]. On the other hand, an unsuitable decision can lead to serious problems that are beyond remedy [1].
In spatial data mining, uncertainty cannot be avoided when we try to extract useful knowledge from a GIS (geographical information system) database. The uncertainty includes fuzziness and randomness. In a spatial decision support system, both exist not only among the influential factors but also within each factor. Different partitions of the factors lead to different evaluation results.
In recent years, several works applied to spatial decision support systems have used fuzzy logic to improve the handling of uncertainty. One approach, based on the analytic hierarchy process, makes all possible pairwise comparisons among the factors [8]. Another integrates quantitative data and qualitative ratings [4]. However, all of these methods need a precise function to describe a fuzzy set, so the fuzzy set loses its fuzziness in the subsequent processing. As a result, their evaluation results are unreliable [6].
In this paper, we present an improved fuzzy approach that deals with the problem of lost fuzziness. It uses a cloud, instead of a precise function, to describe a fuzzy set. Therefore, it preserves the fuzziness of each attribute during the evaluation process.
However, the cloud approach has some deficiencies, so we extend the traditional cloud approach into the improved fuzzy approach. First, the cloud approach has no way to control the quality of an attribute's partitions. We introduce a check mechanism and a precision parameter, which find improper partitions that could lead to a mistaken evaluation result and amend them in time.
Second, the cloud approach does not consider that each factor has a different influence on the final evaluation result [3], which does not accord with the real situation. We therefore add a weight parameter to the evaluation process and calculate the final result with both the precision parameter and the weight parameter. As a result, the improved fuzzy approach is closer to the real decision process and yields a better evaluation result.
* This research was supported by the MIC (Ministry of Information and Communication), Korea, under the ITRC (Information Technology Research Center) support program supervised by the IITA (Institute of Information Technology Assessment), and by the Natural Science Foundation of Chongqing (2005BB2059).
Third International Conference on Semantics, Knowledge and Grid
0-7695-3007-9/07 $25.00 © 2007 IEEE
DOI 10.1109/SKG.2007.28
The remainder of this paper is organized as follows. Section 2 reviews the concepts of the cloud approach. The improved fuzzy approach and its process are presented in Section 3. Section 4 gives a case study. The simulation experiments are described in Section 5. Finally, conclusions are drawn in Section 6.
2. Related Works
In this section, we briefly review the cloud approach and discuss its advantages and disadvantages.
The cloud approach has many parts, of which the most important are the cloud model and concept generalization. The cloud model is an uncertainty model that transforms between a linguistic term of a qualitative concept and its numerical representation [11, 12].
The cloud model describes a qualitative concept with three numerical characteristics: expected value (Ex), entropy (En) and hyper-entropy (He) [5]. Ex is the central value of the qualitative concept in the universe of discourse. En expresses the degree of fuzziness of the qualitative concept. He is the entropy of En. It reflects the randomness of the degree of membership in the qualitative concept: the larger He is, the more random the degree of membership [11]. Figure 1 shows a cloud described by the three numerical characteristics.
Fig 1. The cloud description with the three digital characteristics
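To make the roles of Ex, En and He concrete, the following sketch generates cloud drops with the forward normal cloud generator that is commonly used with the cloud model. The paper does not state which generator it uses, so the normal form, the function name `normal_cloud_drops` and the sample parameters are assumptions made for illustration only.

```python
import math
import random


def normal_cloud_drops(ex, en, he, n, seed=0):
    """Generate n cloud drops (x, membership) for a cloud (Ex, En, He).

    Assumption: the commonly used forward normal cloud generator; the
    paper itself does not spell out the generator.
    """
    rng = random.Random(seed)
    drops = []
    while len(drops) < n:
        # He controls how much the per-drop entropy En' scatters around En.
        en_prime = rng.gauss(en, he)
        if en_prime == 0.0:
            continue  # skip a degenerate draw to avoid division by zero
        # En' controls how far the drop x scatters around Ex.
        x = rng.gauss(ex, abs(en_prime))
        membership = math.exp(-((x - ex) ** 2) / (2.0 * en_prime ** 2))
        drops.append((x, membership))
    return drops


# Hypothetical concept with Ex = 25, En = 3, He = 0.3.
drops = normal_cloud_drops(ex=25.0, en=3.0, he=0.3, n=2000)
mean_x = sum(x for x, _ in drops) / len(drops)
print(f"{len(drops)} drops, mean x = {mean_x:.2f}")
```

In this sketch a larger He makes En' vary more from drop to drop, so the membership degrees of drops at the same distance from Ex become more scattered, which is the randomness that He is said to reflect.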
Concept generalization is an algorithm that divides attribute data into several concept layers [1]. By overlaying cloud models, it produces the attribute generalization. Figure 2 shows several cloud models that express an attribute generalization. Concept generalization turns irregular data distributions into regular ones. Then, according to the attribute generalization, the weights of the attributes are obtained by statistics. Finally, the evaluation result is calculated [10].
Fig 2. The cloud models of data distribution by cloud transform
The cloud approach provides an effective way to integrate randomness and fuzziness by using a formalized, computerized language [10]. It is more appropriate when more than one kind of uncertainty is present at the same time [3].
However, it has deficiencies. It does not consider that each factor has a different influence on the final evaluation result [3]. In fact, some factors are more important than others, but cloud theory pays no attention to this and treats all factors as equivalent.
3. The Improved Fuzzy Approach and Process
In this section, we present the improved fuzzy approach, which aims at an evaluation result that is optimized and close to the real situation.
We extend the cloud approach to remedy its deficiencies. In a spatial decision support system, uncertainties are always present and influence each other. However, the original cloud approach cannot check the quality of a partition. Nor does it consider that different factors have different influence on the final evaluation result: in that method, all factors are equally important to the result, which does not accord with the real situation and may lead to an erroneous result. Therefore, we use a check mechanism to obtain an optimized partition of each attribute, and we design a parameter that captures the different weights of the factors in the real situation.
3.1. Improved Fuzzy Approach Framework
We propose an improved fuzzy approach for spatial decision support systems. Figure 3 shows the framework of this approach; its parts are described below.
Fig 3. Framework of improved fuzzy approach
Pre-process Data. The data in the spatial database are rearranged by attribute, so that we can draw the distribution curve of each attribute.
Partition Attributes with Cloud. The cloud transform changes the irregular data distributions into regular ones. Each attribute is partitioned into overlapping basic cloud models. Every cloud model has three numerical characteristics, which are passed to the next part.
Check the Partition. This part checks the quality of the partition produced by the cloud transform. The He of every cloud model is compared with a threshold set by the user. Because of the uncertainty in the partition process, the partition of an attribute cannot be absolutely precise, so we only seek an optimized partition.
Therefore, we set a threshold γ, which is the limit on the randomness of a rank. If He > γ, the cloud model is too dispersed, which means that the rank does not accord with the mathematical rule, so we need to divide the rank again or combine it with other ranks. If He < γ, the cloud model is convergent: its curve resembles a normal distribution curve and accords with the mathematical rule. When the He of every cloud model is less than γ, the ranks are appropriate.
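The check can be sketched as a simple comparison of each rank's He with γ. The function name, the dictionary layout and the value γ = 0.05 are assumptions for illustration; the paper leaves γ to the user and does not say how He = γ is handled, so the sketch accepts only He < γ.

```python
def ranks_to_repartition(hyper_entropies, gamma):
    """Return the ranks whose cloud model fails the check (He >= gamma).

    Assumption: He == gamma is treated as failing, because the text only
    describes He < gamma as convergent.
    """
    return [rank for rank, he in hyper_entropies.items() if he >= gamma]


# Hypothetical He values for four ranks of one attribute, hypothetical gamma.
he_by_rank = {"A": 0.027, "B": 0.08, "C": 0.013, "D": 0.021}
failing = ranks_to_repartition(he_by_rank, gamma=0.05)
print(failing)  # ['B']
```

An empty list means that every rank of the attribute passes the check and the partition can be kept.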
Precision Parameter α. α describes the degree of accuracy of a rank of an affecting factor. It is derived from the He of the cloud models above. A larger He means more divergent information and a less accurate partition. $He_i$ is the degree of inaccuracy of rank i of an attribute, and $\alpha_i$ is the degree of precision of that rank. $He_i$ is a percentage, and $\alpha_i$ and $He_i$ describe opposite concepts, so $\alpha_i$ decreases as $He_i$ increases. The constant 100 scales $He_i$ while keeping this trend. We define formula (1) as follows:
\[
\alpha_i = \frac{1}{100 \times He_i} \qquad (1)
\]
The larger $\alpha_i$ is, the more convergent the partition for this rank, the more accurate the partition, and the higher its quality. The quality of the partition influences the quality of the final result.
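As a worked instance of formula (1), the sketch below computes α for the four He values that the case study in Section 4 reports for the distance to the shopping centre. The function name `precision` is an assumption, and the rounding to two decimals is only for display.

```python
def precision(he):
    """Precision parameter of a rank, formula (1): alpha = 1 / (100 * He)."""
    if he <= 0:
        raise ValueError("He must be positive")
    return 1.0 / (100.0 * he)


# He of ranks A-D of the shopping-centre distance (case study, Section 4).
he_values = [0.027, 0.03, 0.013, 0.021]
alphas = [round(precision(he), 2) for he in he_values]
print(alphas)  # [0.37, 0.33, 0.77, 0.48]
```

The rank with the smallest He (0.013) receives the largest α (0.77), which is the inverse relation that formula (1) is meant to express.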
Weight Parameter β. β describes the weight of a factor with respect to the whole system. The original cloud approach assumes that all factors in the system have the same weight. In the real situation, however, different factors have different influence on the final evaluation result, so their weights in the system differ. For example, in the bus station decision system, population density carries more weight than the expansion of the place. The weight parameter is obtained from statistics and surveys.
The weight parameter β is used to adjust the weight of an attribute after the accuracy of the criterion has been obtained. In the improved fuzzy approach, the result is best when both the criterion and the weights obtained from statistics and surveys are optimized. This differs from the original cloud approach, which has only one weight per factor, obtained from statistics.
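We first give a definition that is used to explain the evaluation process. C and D are matrices with m rows and n columns. C ⊗ D means that each element of C is multiplied by the corresponding element of D. The resulting matrix L is as follows (1 ≤ i ≤ m, 1 ≤ j ≤ n):
\[
L = C \otimes D =
\begin{pmatrix}
c_{11} & c_{12} & \cdots & c_{1n} \\
c_{21} & c_{22} & \cdots & c_{2n} \\
\vdots & \vdots & \ddots & \vdots \\
c_{m1} & c_{m2} & \cdots & c_{mn}
\end{pmatrix}
\otimes
\begin{pmatrix}
d_{11} & d_{12} & \cdots & d_{1n} \\
d_{21} & d_{22} & \cdots & d_{2n} \\
\vdots & \vdots & \ddots & \vdots \\
d_{m1} & d_{m2} & \cdots & d_{mn}
\end{pmatrix}
= (C_{ij} \times D_{ij})_{m \times n}
=
\begin{pmatrix}
c_{11} \times d_{11} & c_{12} \times d_{12} & \cdots & c_{1n} \times d_{1n} \\
c_{21} \times d_{21} & c_{22} \times d_{22} & \cdots & c_{2n} \times d_{2n} \\
\vdots & \vdots & \ddots & \vdots \\
c_{m1} \times d_{m1} & c_{m2} \times d_{m2} & \cdots & c_{mn} \times d_{mn}
\end{pmatrix}
\]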
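The element-wise product defined above can be sketched in a few lines. The function name `elementwise` and the list-of-rows representation are assumptions for illustration.

```python
def elementwise(c, d):
    """Element-wise product C ⊗ D of two matrices given as lists of rows."""
    if len(c) != len(d) or any(len(rc) != len(rd) for rc, rd in zip(c, d)):
        raise ValueError("C and D must have the same shape")
    return [[cij * dij for cij, dij in zip(rc, rd)] for rc, rd in zip(c, d)]


C = [[1, 2], [3, 4]]
D = [[5, 6], [7, 8]]
print(elementwise(C, D))  # [[5, 12], [21, 32]]
```

Unlike the ordinary matrix product, this operation requires both matrices to have the same shape and never mixes elements from different positions.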
3.2. The Process of the Method
The steps of the method are as follows.
Step 1: The distribution function is obtained from the database according to the attribute values. Each attribute has its own distribution function.
Step 2: Each attribute is generalized into higher-level concepts by the cloud model, that is, it is divided into several ranks. The hyper-entropy He of every cloud model is then calculated.
Every cloud model is checked against γ. If He > γ, the ranks of this attribute are divided again. This step is repeated until every He is less than γ.
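The repetition in Step 2 can be sketched as a loop around the check. The paper does not specify how He is estimated for a rank or how ranks are divided again, so `hyper_entropy` and `repartition` are hypothetical callbacks, and the toy stand-ins below (He proportional to interval width, failing intervals split in half) are assumptions used only to make the sketch runnable.

```python
def refine_until_valid(ranks, hyper_entropy, repartition, gamma, max_rounds=20):
    """Re-partition an attribute until every rank satisfies He < gamma.

    `hyper_entropy(rank)` and `repartition(ranks, failing)` are hypothetical
    callbacks; the paper does not define either procedure.
    """
    for _ in range(max_rounds):
        failing = [r for r in ranks if hyper_entropy(r) >= gamma]
        if not failing:
            return ranks
        ranks = repartition(ranks, failing)
    raise RuntimeError("no valid partition found within max_rounds")


# Toy stand-ins (assumptions): a rank is an interval (lo, hi), its He is
# taken to be proportional to its width, and a failing rank is halved.
def toy_he(rank):
    lo, hi = rank
    return 0.01 * (hi - lo)


def toy_split(ranks, failing):
    out = []
    for lo, hi in ranks:
        if (lo, hi) in failing:
            mid = (lo + hi) / 2
            out += [(lo, mid), (mid, hi)]
        else:
            out.append((lo, hi))
    return out


result = refine_until_valid([(0, 8), (8, 10)], toy_he, toy_split, gamma=0.05)
print(result)  # [(0, 4.0), (4.0, 8), (8, 10)]
```

The `max_rounds` guard is a design choice of the sketch, not of the paper: it prevents an endless loop when no partition can bring every He below γ.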
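Step 3: The α values of all attributes compose the matrix K. $\alpha_{11}$ is the degree of precision of the highest rank of attribute 1, and $\alpha_{1n}$ is the degree of precision of the lowest rank of attribute 1, where, as in formula (1),
\[
\alpha_i = \frac{1}{100 \times He_i}
\]
The β values of all attributes compose the matrix S. $\beta_{1n}$ is the weight of the lowest rank of attribute 1 with respect to the whole system.
The matrix W holds the weight of each rank within an attribute, obtained by statistics.
\[
K =
\begin{pmatrix}
\alpha_{11} & \alpha_{12} & \cdots & \alpha_{1n} \\
\alpha_{21} & \alpha_{22} & \cdots & \alpha_{2n} \\
\vdots & \vdots & \ddots & \vdots \\
\alpha_{m1} & \alpha_{m2} & \cdots & \alpha_{mn}
\end{pmatrix}
\qquad
S =
\begin{pmatrix}
\beta_{11} & \beta_{12} & \cdots & \beta_{1n} \\
\beta_{21} & \beta_{22} & \cdots & \beta_{2n} \\
\vdots & \vdots & \ddots & \vdots \\
\beta_{m1} & \beta_{m2} & \cdots & \beta_{mn}
\end{pmatrix}
\qquad
W =
\begin{pmatrix}
w_{11} & w_{12} & \cdots & w_{1n} \\
w_{21} & w_{22} & \cdots & w_{2n} \\
\vdots & \vdots & \ddots & \vdots \\
w_{m1} & w_{m2} & \cdots & w_{mn}
\end{pmatrix}
\]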
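Step 4: Calculation of the evaluation result. The final evaluation is calculated from the three weights of each attribute. The degree of precision α and the weight β with respect to all attributes are the two parameters introduced in this process; they describe the precision weight of a rank and its weight in the whole system. According to formula (2), the matrix R is the element-wise product of the weights obtained by statistics and the precision weights:
\[
R = W \otimes K = (W_{ij} \times K_{ij})_{m \times n}
=
\begin{pmatrix}
w_{11} \times \alpha_{11} & w_{12} \times \alpha_{12} & \cdots & w_{1n} \times \alpha_{1n} \\
w_{21} \times \alpha_{21} & w_{22} \times \alpha_{22} & \cdots & w_{2n} \times \alpha_{2n} \\
\vdots & \vdots & \ddots & \vdots \\
w_{m1} \times \alpha_{m1} & w_{m2} \times \alpha_{m2} & \cdots & w_{mn} \times \alpha_{mn}
\end{pmatrix}
=
\begin{pmatrix}
r_{11} & r_{12} & \cdots & r_{1n} \\
r_{21} & r_{22} & \cdots & r_{2n} \\
\vdots & \vdots & \ddots & \vdots \\
r_{m1} & r_{m2} & \cdots & r_{mn}
\end{pmatrix}
\qquad (2)
\]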
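S is the matrix of weights with respect to all attributes. H is the element-wise product of S and R. H expresses that every rank of every attribute has a different weight in the final evaluation.
\[
H = S \otimes R = (S_{ij} \times R_{ij})_{m \times n}
=
\begin{pmatrix}
\beta_{11} \times r_{11} & \beta_{12} \times r_{12} & \cdots & \beta_{1n} \times r_{1n} \\
\beta_{21} \times r_{21} & \beta_{22} \times r_{22} & \cdots & \beta_{2n} \times r_{2n} \\
\vdots & \vdots & \ddots & \vdots \\
\beta_{m1} \times r_{m1} & \beta_{m2} \times r_{m2} & \cdots & \beta_{mn} \times r_{mn}
\end{pmatrix}
=
\begin{pmatrix}
h_{11} & h_{12} & \cdots & h_{1n} \\
h_{21} & h_{22} & \cdots & h_{2n} \\
\vdots & \vdots & \ddots & \vdots \\
h_{m1} & h_{m2} & \cdots & h_{mn}
\end{pmatrix}
\]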
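P is the matrix of a site. Each attribute of the site belongs to exactly one rank, so each row of P contains a 1 in the column of that rank and 0 elsewhere. T is the element-wise product of P and H, as in formula (3):
\[
T = P \otimes H \qquad (3)
\]
E is the final evaluation result. Q is a row vector whose elements are all 1, Q = (1 1 ... 1), with m elements when it multiplies T from the left and n elements when it multiplies from the right, so E is the sum of all elements of T:
\[
E = Q\,T\,Q^{T}
= (1 \;\; 1 \;\; \cdots \;\; 1)
\begin{pmatrix}
t_{11} & t_{12} & \cdots & t_{1n} \\
t_{21} & t_{22} & \cdots & t_{2n} \\
\vdots & \vdots & \ddots & \vdots \\
t_{m1} & t_{m2} & \cdots & t_{mn}
\end{pmatrix}
\begin{pmatrix} 1 \\ 1 \\ \vdots \\ 1 \end{pmatrix}
\]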
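The evaluation in Step 4 can be sketched end to end as follows. The function names and the 2 × 2 input matrices are hypothetical and are not taken from the paper; the sketch only illustrates the order of the element-wise products and the final summation.

```python
def elementwise(a, b):
    """Element-wise product of two equally shaped matrices (lists of rows)."""
    return [[x * y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def evaluate_site(w, k, s, p):
    """Score one site: R = W ⊗ K, H = S ⊗ R, T = P ⊗ H, E = sum of all t_ij."""
    r = elementwise(w, k)  # formula (2)
    h = elementwise(s, r)
    t = elementwise(p, h)  # formula (3)
    # Multiplying by all-ones vectors on both sides sums every element of T.
    return sum(sum(row) for row in t)


# Hypothetical example with 2 attributes and 2 ranks.
W = [[0.5, 0.5], [0.2, 0.8]]
K = [[1.0, 2.0], [2.0, 1.0]]
S = [[0.5, 0.5], [0.5, 0.5]]
P = [[1, 0], [0, 1]]  # attribute 1 has rank 1, attribute 2 has rank 2
E = evaluate_site(W, K, S, P)
print(round(E, 4))  # 0.65
```

Calling `evaluate_site` once per candidate site with that site's P, and sorting the sites by the returned score, gives the ordering of the candidate sites.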
Repeat Step 4 for every candidate site, and then rank the sites by E.
4. Case Study
We simulate a government selecting a bus station site for city planning. The data are synthetic and come from QARCT [3], a source data generator. The attribute "distance to the shopping centre" is used as the example in this case. We select some important influential factors in the spatial decision support system for bus station selection. The initial geographical database of the city is shown in Table 1.
Table 1. The Initial Geographical Database
No.    Shopping centre    Length of street    Park    …    Residential area
1      6607               7710                935     …    3658
2      659                9520                1574    …    621
…      …                  …                   …       …    …
600    2136               8762                7554    …    6542
Step 1: All attributes are generalized by the cloud model. The records in the initial database are numerical. After generalization, the values of each attribute are described by the linguistic values A-near, B-mid-near, C-mid-far and D-far. At the same time, the three numerical characteristics of every rank are obtained. The inaccuracy values (He) of the four ranks of the distance to the shopping centre are 0.027, 0.03, 0.013 and 0.021, respectively.
Step 2: The matrix K holds the precision of each attribute's ranks, calculated by formula (1). The matrix S holds the weight of each rank of an attribute with respect to all attributes. W holds the weight of each rank within an attribute, obtained by statistics. Table 2 lists these three kinds of weights, which are obtained from the cloud models and from statistics.
\[
K_{11} = \frac{1}{100 \times He_{A}} = \frac{1}{100 \times 0.027} = 0.37
\]
…
\[
K_{14} = \frac{1}{100 \times He_{D}} = \frac{1}{100 \times 0.021} = 0.48
\]
Table 2 The Three Kinds Of Weights Of Attributes
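\[
K =
\begin{pmatrix}
0.37 & 0.33 & 0.77 & 0.48 \\
0.26 & 0.77 & 0.303 & 0.385 \\
\vdots & \vdots & \vdots & \vdots \\
0.714 & 0.5 & 0.33 & 0.77
\end{pmatrix}
\]
\[
S =
\begin{pmatrix}
0.172 & 0.31 & 0.296 & 0.222 \\
0.145 & 0.306 & 0.355 & 0.194 \\
\vdots & \vdots & \vdots & \vdots \\
0.179 & 0.324 & 0.267 & 0.23
\end{pmatrix}
\]
\[
W =
\begin{pmatrix}
0.051 & 0.092 & 0.088 & 0.066 \\
0.024 & 0.051 & 0.059 & 0.032 \\
\vdots & \vdots & \vdots & \vdots \\
0.0096 & 0.018 & 0.014 & 0.023
\end{pmatrix}
\]
\[
H = W \otimes K \otimes S = (W_{ij} \times K_{ij} \times S_{ij})_{m \times n}
=
\begin{pmatrix}
0.0032 & 0.0094 & 0.02 & 0.007 \\
0.0009 & 0.012 & 0.0063 & 0.0024 \\
\vdots & \vdots & \vdots & \vdots \\
0.0012 & 0.003 & 0.0012 & 0.0041
\end{pmatrix}
\]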
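The reported H can be checked numerically from the three rows of K, S and W that the case study prints. The sketch below multiplies them element by element; the variable and function names are assumptions, and the rows elided with "…" in the paper are simply left out.

```python
def elementwise(a, b):
    """Element-wise product of two equally shaped matrices (lists of rows)."""
    return [[x * y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


# Printed rows only: shopping centre, length of street, residential area.
K = [[0.37, 0.33, 0.77, 0.48],
     [0.26, 0.77, 0.303, 0.385],
     [0.714, 0.5, 0.33, 0.77]]
S = [[0.172, 0.31, 0.296, 0.222],
     [0.145, 0.306, 0.355, 0.194],
     [0.179, 0.324, 0.267, 0.23]]
W = [[0.051, 0.092, 0.088, 0.066],
     [0.024, 0.051, 0.059, 0.032],
     [0.0096, 0.018, 0.014, 0.023]]

# H = W ⊗ K ⊗ S for the printed rows.
H = elementwise(elementwise(W, K), S)
for row in H:
    print([round(v, 4) for v in row])
```

The computed entries agree with the H matrix printed in the paper to within its rounding, for example 0.051 × 0.37 × 0.172 ≈ 0.0032 for rank A of the shopping centre.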
Step 3: The matrix H holds the weight of each rank of all attributes. Q is a row vector whose elements are all 1. Table 3 shows a candidate site, which has exactly one rank for each attribute. E is the final evaluation score of the site. The other candidate sites are evaluated in the same way.
Table 3. The Candidate Place Has The Different Influential Attributes
                    The value of weight
Attributes          A    B    C    D
Shopping centre     √
Length of street                   √
…                   …    …    …    …
Residential area              √
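\[
P =
\begin{pmatrix}
1 & 0 & 0 & 0 \\
0 & 0 & 0 & 1 \\
\vdots & \vdots & \vdots & \vdots \\
0 & 0 & 1 & 0
\end{pmatrix}
\]
\[
T = P \otimes H =
\begin{pmatrix}
0.0032 & 0 & 0 & 0 \\
0 & 0 & 0 & 0.0024 \\
\vdots & \vdots & \vdots & \vdots \\
0 & 0 & 0.0012 & 0
\end{pmatrix}
\]
\[
E = Q\,T\,Q^{T}
= (1 \;\; 1 \;\; \cdots \;\; 1)
\begin{pmatrix}
0.0032 & 0 & 0 & 0 \\
0 & 0 & 0 & 0.0024 \\
\vdots & \vdots & \vdots & \vdots \\
0 & 0 & 0.0012 & 0
\end{pmatrix}
\begin{pmatrix} 1 \\ 1 \\ \vdots \\ 1 \end{pmatrix}
= 0.034
\]
5. Performance Evaluation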
We carried out a series of simulation experiments to evaluate the improved fuzzy approach. All experiments were conducted on a computer with a 1.8 GHz Intel Pentium IV CPU and 768 MB of main memory, running MS Windows XP. Both approaches were implemented in Microsoft Visual C++ 6.0. The experimental data included statistical data and simulation data generated by QARCT [3], a generator that is popular for simulating spatial decision support systems.
Fig 4. Time comparison between using IFA and CA (x-axis: number of data (×1000); y-axis: time (seconds); series: CA and IFA)
                    The precision of each rank     The weight of each rank        The weight of each rank
                    by cloud model                 to the whole attributes        by statistics
Attribute           A      B      C      D         A      B      C      D         A       B      C      D
Shopping centre     0.37   0.33   0.77   0.48      0.172  0.31   0.296  0.222     0.051   0.092  0.088  0.066
Length of street    0.26   0.77   0.303  0.385     0.145  0.306  0.355  0.194     0.024   0.051  0.059  0.032
…                   …      …      …      …         …      …      …      …         …       …      …      …
Residential area    0.714  0.5    0.33   0.77      0.179  0.324  0.267  0.23      0.0096  0.018  0.014  0.023
Fig 5. Accuracy comparison between using IFA and CA (x-axis: number of data (×1000); y-axis: accuracy (%); series: CA and IFA)
We compare the performance of the improved fuzzy approach (IFA) and the cloud approach (CA). Figure 4 compares their runtime as the number of input data grows from 1000 to 25000. The improved fuzzy approach needs less time than the cloud approach. The reason is that the cloud approach generates too many candidates, whereas in our approach the check mechanism prunes the number of candidates and produces high-quality ranks directly, which reduces the amount of computation.
Figure 5 shows the accuracy comparison between the improved fuzzy approach and the cloud approach. The improved fuzzy approach is superior to the cloud approach in accuracy, because its check mechanism controls the quality of the partitions of the factors: it finds improper partitions, which would lead to a mistaken evaluation result, and amends them in time. The cloud approach, in contrast, does not judge the quality of these partitions, which lowers its accuracy.
From the experimental results above, we conclude that the improved fuzzy approach performs well in terms of both runtime and accuracy.
6. Conclusion
In this paper, an improved fuzzy approach and its application are proposed for spatial data mining. In a spatial decision support system, the check mechanism controls the quality of the partitions produced by the cloud approach and yields optimized partitions. The method fully considers that different attributes have different influence on the final result, which makes the final evaluation result more precise and closer to the real situation. We compared it with the original cloud approach in simulation experiments, and the results show that the improved fuzzy approach reduces runtime and provides better accuracy.
7. References
[1] Chen-Tung Chen, "A fuzzy approach to select the location of the distribution center", Fuzzy Sets and Systems 118, pp. 65-73, 2001.
[2] A. Casaca, F. Presutto, I. Rebelo, G. Pestana and A. Grilo, "An Airport Network for Mobiles Surveillance", in Proc. of the 16th International Conference on Computer Communication, Beijing, China, 2004.
[3] Yi Du, "The Research And The Application Of Association Rules In Data Mining", 2000, ISBN 7-121-00308-2, pp. 1703-1708.
[4] G.A. Spohrer, T.R. Kmak, "Qualitative analysis used in evaluating alternative plant location scenarios", Indust. Eng., August, pp. 52-56, 1984.
[5] Gabriel Pestana, Miguel Mira da Silva, "An Airport Decision Support System For Mobiles Surveillance & Alerting", MobiDE'05, June 12, 2005.
[6] Deyi Li, Haijun Meng, Xuemei Shi, Xinzhou Wang, "On spatial data mining and knowledge discovery (SDMKD)", Geomatics and Information Science of Wuhan University, 2001, 26(6): 491-499.
[7] Deyi Li, "The Cloud Control Method and Balancing Patterns of Triple Link Inverted Pendulum Systems", Chinese Engineering Science, 1999, Vol. 1, No. 2, pp. 41-46.
[8] Yan-Li Lu, Jing-Yuan Han, Fa-Chao Li, "Fuzzy Synthetic Evaluation On Customer Loyalty Based On Analytic Hierarchy Process", The Fourth International Conference On Machine Learning and Cybernetics, Guangzhou, 18-21 August 2005.
[9] Xie Cui-hua, Li Yun, "A Constructing Algorithm of Concept Lattice with Attribute Generalization Based on Cloud Models", The Fifth International Conference on Computer and Information Technology, 2005.
[10] Xueming Yang, Jinsha Yuan, "Application of Uncertainty Reasoning Based on Cloud Theory in Spatial Load Forecasting", The 6th World Congress on Intelligent Control and Automation, June 21-23, 2006.
[11] Guangwei Zhang, Jianchu Kang, "Towards a Trust Model with Uncertainty for e-commerce Systems", IEEE International Conference on e-Business Engineering, 18-21 Oct. 2005, pp. 200-207.
[12] Lingbo Zhang, Fuchun Sun, "Cloud Model based Control of Flexible-Link Manipulators", Neural Networks and Brain, ICNN&B '05, International Conference, Volume 2, 2005, pp. 1067-1072.