[ieee 2011 fourth international workshop on advanced computational intelligence (iwaci) - wuhan,...


Abstract—Effective mining technology can extract the spatial distribution pattern of road network traffic flow. In this paper, the similarity between traffic flow objects with spatial-temporal characteristics is measured by combining Dynamic Time Warping (DTW) with shortest path analysis. On this basis we propose a clustering analysis method for road network traffic flow data, in which traffic flow data objects with similar properties and spatial correlation are clustered into one class, thereby revealing the spatial distribution pattern of road traffic flow. The experimental results show that the method is effective: the road network is classified reasonably, and the classification results can provide decision support for traffic zone division.

I. INTRODUCTION

For traffic flow on a road network, there are different spatial distribution patterns, such as a linear pattern for major road traffic flow, a surface pattern for thriving roads, and so on.

According to the characteristics of the spatial distribution of traffic flow, dynamic traffic zone partition is one of the research hotspots of intelligent transportation systems. The traffic zone partition, however, also changes as the traffic flow moves between peak, normal and bottom periods. Effective mining technology can be applied to extract the spatial distribution pattern on the road network, which helps to partition traffic areas, manage and control road network traffic, increase road network capacity and ease traffic pressure.

Clustering analysis is one of the most useful methods in knowledge acquisition and is used to discover underlying clusters and interesting distribution patterns from the data itself. Cluster analysis techniques can in general be divided into two categories, namely crisp and fuzzy clustering. Crisp clustering is a non-overlapping partition method: an object either belongs to a class or does not, according to some proximity measure and clustering criterion. Crisp clustering can, however, cut off the links between objects and cause larger deviations in the clustering results. The need to support uncertainty in the clustering task led to the introduction of algorithms that use fuzzy logic concepts in the clustering procedure [1]. A common fuzzy clustering algorithm is Fuzzy C-Means (FCM) [2]. It attempts to find the most characteristic point in each cluster, which can be considered as the “center” of the cluster, and the grade of membership of each object in the clusters [1].

Manuscript received July 15, 2011. This work was supported by the independent research project of Wuhan University under Grant 111022.

Hu Chunchun is with the School of Geodesy and Geomatics, Wuhan University, 129 Luoyu Road, Wuhan 430079, China (e-mail: [email protected]).

Yan Xiaohong is with Shen Zhen Geotechnical Investigation & Surveying Institute Co. Ltd, 1043 Shangbu Road, ShenZhen 518028, China.

Spatial clustering analysis of traffic flow can find the spatial distribution patterns on a road network by clustering traffic flow data objects with similar properties and spatial association into one class. Among related research on clustering analysis of traffic flow, the similarity measure of traffic flow time series was studied in [3], which achieved an effective separation of traffic flow time series. In [4], traffic flow sequences were clustered with a partition clustering technique, which could identify the time-of-day (TOD) intervals of road traffic flow according to the different flows. However, the aforementioned research considered only the time attribute of road traffic flow, not its spatial distribution characteristics on the road network. Traffic flow on a road network is related to both temporal information and road segments, so its spatial clustering differs from common clustering methods. The similarity measurement of traffic flow must consider not only the differences between traffic flow time series with dynamic characteristics, but also the topological relations between road segments on the road network. Based on the idea of the fuzzy clustering algorithm and considering the distribution of road network traffic flow, this paper proposes a new clustering method that measures the similarity of traffic flow objects well and further finds potential traffic patterns in a large number of road network traffic flow sequences.

II. SIMILARITY MEASUREMENT OF THE TRAFFIC FLOW OBJECT

A good spatial clustering algorithm can group road segments according to the data characteristics of the road network. The traffic flow sequence on a road network is multidimensional, so a suitable distance function is essential for expressing the similarity between two road sections. At present, the Lp norm [5] and Dynamic Time Warping (DTW) [6] are the distance functions commonly used for time series similarity analysis. A similarity measure based on the Lp norm is simple and easy to implement, but it can only handle time sequences of equal length. DTW, in contrast, is a dynamic programming method for measuring time series similarity and is not restricted by sequence length. Given two time sequences Q and C of lengths n and m respectively, we calculate the distance between them to compare their similarity; a smaller distance expresses a greater similarity. In the distance matrix of the two sequences, a set of contiguous matrix elements that defines the dissimilarity relation between the sequences is called a curved (warping) path. The aim of the DTW method is to find the curved path with the minimum total length.

Mining Traffic Flow Data Based on Fuzzy Clustering Method

ChunChun Hu and XiaoHong Yan


Fourth International Workshop on Advanced Computational Intelligence Wuhan, Hubei, China; October 19-21, 2011

978-1-61284-375-9/11/$26.00 @2011 IEEE

The minimum total length of the curved path can be calculated by dynamic programming using formula (1), where S(i, j) is the minimum cumulative distance from point (1, 1) to point (i, j) and d(qi, cj) is the distance between the elements qi and cj. If the point (i, j) lies on the optimal path, then the sub-path from (1, 1) to (i, j) is itself locally optimal. The best path can therefore be obtained by recursively searching for the locally optimal solution between the starting point (1, 1) and the end point (n, m).

$$S(1,1) = d(q_1, c_1), \qquad S(i,j) = d(q_i, c_j) + \min\{S(i-1,j);\ S(i-1,j-1);\ S(i,j-1)\} \tag{1}$$
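As an illustration, recurrence (1) can be sketched in a few lines of Python. This is a minimal sketch rather than the authors' implementation: the function name `dtw_distance` is hypothetical, and the local distance d is assumed to be the absolute difference, since the paper does not specify it.

```python
from math import inf


def dtw_distance(q, c, dist=lambda a, b: abs(a - b)):
    """Cumulative DTW distance S(n, m) between sequences q and c, per recurrence (1).

    `dist` is the local distance d(q_i, c_j); the absolute difference used
    as default here is an assumption, not taken from the paper.
    """
    n, m = len(q), len(c)
    if n == 0 or m == 0:
        raise ValueError("both sequences must be non-empty")
    # S[i][j] is the minimum cumulative distance of a path from (1, 1) to (i, j).
    # Row 0 and column 0 are infinite padding so the recurrence needs no
    # special cases; S[0][0] = 0 makes S[1][1] equal to d(q_1, c_1).
    S = [[inf] * (m + 1) for _ in range(n + 1)]
    S[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = dist(q[i - 1], c[j - 1])
            S[i][j] = cost + min(S[i - 1][j], S[i - 1][j - 1], S[i][j - 1])
    return S[n][m]
```

Because the path may repeat elements of either sequence, the two inputs need not have the same length, which is the property that motivates DTW over the Lp norm here.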

Road traffic flow information obtained in real time has spatial-temporal properties and is related to each segment of the road network. The similarity measurement of traffic flow objects therefore also considers the topology of the road network: the spatial similarity is higher between road sections that are connected and reachable. The connectivity and accessibility of roads can be measured by shortest path analysis on the road network, and the dynamic shortest path length between two road sections is defined as the spatial similarity measurement function.

For the graph G = {V, E} corresponding to the road network, V represents the road nodes and E the road edges. Each edge Ei = <vi0, vi1> generates a traffic flow sequence Ti = {Ti1, Ti2, ... , Tin}, where Tin is the traffic flow in the nth time period. The spatial-temporal similarity measurement function of the traffic flow is defined as follows:

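$$\mathrm{TFSM}(E_i, E_j) = \mathrm{DTW}(T_i, T_j) \circ \text{Shortest-Path}(v_{i0}, v_{j1}) \tag{2}$$

where DTW(Ti, Tj) is the similarity distance between the traffic flow sequences of the two roads Ei and Ej, Shortest-Path(vi0, vj1) is the spatial similarity distance between the beginning node of road Ei and the ending node of road Ej, and ∘ stands for the operator that combines the two distances (its symbol is not legible in the source). The smaller the value of TFSM(Ei, Ej), the more similar Ei and Ej are.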
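The spatial term of (2) can be illustrated with a short Python sketch of Dijkstra's algorithm on an adjacency dictionary. The names `shortest_path_length` and `tfsm`, the graph representation, and the `combine` argument are all assumptions made for illustration; in particular, how the temporal and spatial distances are combined is left to the caller rather than fixed here.

```python
import heapq
from math import inf


def shortest_path_length(graph, source, target):
    """Dijkstra shortest path length from source to target.

    `graph` maps each node to a dict {neighbour: non-negative edge length};
    this adjacency-dict layout is an assumption of the sketch.
    """
    best = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == target:
            return d
        if d > best.get(u, inf):
            continue  # stale heap entry, a shorter route to u was already found
        for v, w in graph.get(u, {}).items():
            nd = d + w
            if nd < best.get(v, inf):
                best[v] = nd
                heapq.heappush(heap, (nd, v))
    return inf  # target is not reachable from source


def tfsm(edge_i, edge_j, flows, graph, temporal_distance, combine):
    """Spatial-temporal dissimilarity of two road edges in the spirit of (2).

    Edges are (start_node, end_node) tuples and `flows` maps each edge to its
    traffic flow sequence. `temporal_distance` would be a DTW function and
    `combine` the operator joining both terms; both are caller-supplied
    assumptions.
    """
    temporal = temporal_distance(flows[edge_i], flows[edge_j])
    spatial = shortest_path_length(graph, edge_i[0], edge_j[1])
    return combine(temporal, spatial)
```

Passing `combine` explicitly keeps the sketch neutral about the combination rule; if the two terms have very different scales, a caller may also want to normalize them before combining.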

III. THE FUZZY CLUSTERING METHOD BASED ON SPATIAL TEMPORAL SIMILARITY MEASURE FUNCTION

A. The FCM Algorithm

A common fuzzy clustering algorithm is Fuzzy C-Means (FCM), an extension of the classical c-means algorithm for fuzzy applications [7]. It uses fuzzy techniques to cluster data, and an object can be assigned to more than one cluster, which is compatible with the nature of real data. The FCM clustering algorithm has been widely used to obtain a fuzzy c-partition. It is a fuzzy clustering algorithm based on an objective function. Given a dataset X = {X1, X2, …, Xn} of dimension s, the objective of FCM is to partition X into c homogeneous fuzzy clusters by minimizing the function Jm.

$$J_m = \sum_{i=1}^{c}\sum_{j=1}^{n} (u_{ij})^m \, d^2(X_j, V_i) \tag{3}$$

where c is the number of clusters, n is the number of data points and uij is the membership degree of data point Xj in the fuzzy cluster Ci. Vi is the ith cluster centroid, and m is the weighting exponent that controls the fuzziness of the membership of each datum [8]. d2(Xj, Vi) is the squared Euclidean distance between Xj and Vi.

The FCM solution is a mathematical programming problem: the data set X is divided into c categories by minimizing the objective function. The constraint on Jm is that the membership degrees uij of each Xj over all clusters Ci sum to 1, as described below.

$$\sum_{i=1}^{c} u_{ij} = 1, \qquad 1 \le j \le n, \qquad 0 \le u_{ij} \le 1 \tag{4}$$

The FCM algorithm is carried out in the following steps:

1) First: Initialize the threshold ε and the cluster centroids V(0), and set k = 0.

2) Second: Fix a predefined number of clusters c and a chosen value of m.

3) Third: Compute the membership degree matrix U(k) = [uij] for i = 1, 2, …, c using (5).

$$u_{ij} = \frac{\left(d^2(X_j, V_i)\right)^{\frac{1}{1-m}}}{\sum_{r=1}^{c}\left(d^2(X_j, V_r)\right)^{\frac{1}{1-m}}} \tag{5}$$

4) Fourth: Update the fuzzy cluster centroid Vi(k+1) for i = 1, 2, …, c using (6).

$$V_i(k+1) = \frac{\sum_{j=1}^{n}\left(u_{ij}(k)\right)^m X_j}{\sum_{j=1}^{n}\left(u_{ij}(k)\right)^m} \tag{6}$$

5) Fifth: If (7) is met, the iteration halts; otherwise set k = k + 1 and return to the third step.

$$\|v(k) - v(k+1)\| \le \varepsilon \tag{7}$$

Through the above iteration, the FCM algorithm always converges to a local minimum (or a saddle point) of Jm [2].
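The five steps above can be sketched in plain Python as follows. This is an illustrative sketch under stated assumptions, not a reference implementation: the function names are hypothetical, the initial centroids are drawn at random from the data, and the norm in (7) is taken to be the largest Euclidean shift of any centroid, none of which is prescribed by the text.

```python
import random


def sq_dist(x, v):
    """Squared Euclidean distance d^2(X_j, V_i)."""
    return sum((a - b) ** 2 for a, b in zip(x, v))


def fcm(X, c, m=2.0, eps=1e-4, max_iter=100, seed=0):
    """Fuzzy C-Means following (5), (6) and (7); requires m > 1.

    Returns (U, V): U[i][j] is the membership of X[j] in cluster i as computed
    from the centroids of the last iteration, V the final centroids.
    """
    rng = random.Random(seed)
    V = [list(x) for x in rng.sample(X, c)]  # V(0): assumed random data points
    U = []
    for _ in range(max_iter):
        # Third step: membership update (5).
        U = [[0.0] * len(X) for _ in range(c)]
        for j, x in enumerate(X):
            d2 = [sq_dist(x, v) for v in V]
            zeros = [i for i, d in enumerate(d2) if d == 0.0]
            if zeros:
                # x coincides with a centroid; (5) is undefined there, so the
                # membership is shared among the coinciding centroids.
                for i in zeros:
                    U[i][j] = 1.0 / len(zeros)
            else:
                w = [d ** (1.0 / (1.0 - m)) for d in d2]
                total = sum(w)
                for i in range(c):
                    U[i][j] = w[i] / total
        # Fourth step: centroid update (6).
        V_new = []
        for i in range(c):
            weights = [u ** m for u in U[i]]
            total = sum(weights)
            V_new.append([sum(wt * x[k] for wt, x in zip(weights, X)) / total
                          for k in range(len(X[0]))])
        # Fifth step: stopping rule (7), using the largest centroid shift.
        shift = max(sq_dist(a, b) ** 0.5 for a, b in zip(V, V_new))
        V = V_new
        if shift <= eps:
            break
    return U, V
```

For two well separated groups such as `[[0.0], [0.1], [10.0], [10.1]]` with c = 2, the centroids settle near the two group means and each column of U sums to 1, as required by (4).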

B. The New Fuzzy Clustering Algorithm

In order to extract a meaningful road traffic distribution pattern, formula (2) is used as the fuzzy similarity measure function, and the new objective function of FCM is defined as follows:

$$J_m = \sum_{i=1}^{c}\sum_{j=1}^{n} (u_{ij})^m \, \mathrm{TFSM}(E_i, E_j) \tag{8}$$

where TFSM(Ei, Ej) is the similarity measure function of the traffic flow on the road network. The new fuzzy clustering algorithm is described as follows:

1) Step 1: Build the topology structure of the road network.

2) Step 2: Randomly select nc road traffic flow sequences from the road network as the initial nc cluster centers.

3) Step 3: Calculate the membership degree matrix U(k) according to (5). The specific steps are as follows:

--First, calculate the minimum dynamic curved path between the traffic flow sequence of each road segment and that of each clustering center by the DTW algorithm according to (1).

--Second, calculate the shortest path between the start node of each road segment and the end node of each clustering center by the Dijkstra algorithm [9] on the road network.


--Third, compute d2(Xj, Vi) in (5) by (2).

4) Step 4: Adjust the fuzzy clustering centers according to (6). For each cluster center, we find the road segments whose traffic flow sequences have the minimal dynamic curved path to the clustering center by the DTW algorithm, and these road segments with their traffic flow sequences become the new fuzzy clustering centers.

5) Step 5: Repeat Step 3 and Step 4 until the maximum number of iterations tmax is reached or (7) is met.
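One possible reading of Steps 2 to 5 is sketched below in Python. Several points are assumptions of this sketch rather than statements of the paper: Step 4 is interpreted as computing the weighted mean sequence of (6) and then snapping it to the real road segment nearest to it under the temporal distance; iteration stops when the set of centers no longer changes, which stands in for (7); all sequences have equal length; and the names `memberships`, `update_centers`, `cluster_segments` and the optional `init` argument are hypothetical.

```python
import random


def memberships(segments, centers, dissim, m):
    """Step 3: membership matrix from (5), with d^2 replaced by dissim(segment, center)."""
    U = [[0.0] * len(segments) for _ in centers]
    for j, e in enumerate(segments):
        d = [dissim(e, v) for v in centers]
        zeros = [i for i, x in enumerate(d) if x == 0.0]
        if zeros:
            for i in zeros:  # segment coincides with one or more centers
                U[i][j] = 1.0 / len(zeros)
        else:
            w = [x ** (1.0 / (1.0 - m)) for x in d]
            total = sum(w)
            for i in range(len(centers)):
                U[i][j] = w[i] / total
    return U


def update_centers(segments, flows, U, centers, temporal, m):
    """Step 4 (assumed reading): weighted mean via (6), then nearest real segment."""
    new_centers = []
    length = len(flows[segments[0]])
    for i, row in enumerate(U):
        weights = [u ** m for u in row]
        total = sum(weights)
        if total == 0.0:
            new_centers.append(centers[i])  # degenerate cluster: keep old center
            continue
        mean_seq = [sum(w * flows[e][t] for w, e in zip(weights, segments)) / total
                    for t in range(length)]
        new_centers.append(min(segments, key=lambda e: temporal(flows[e], mean_seq)))
    return new_centers


def cluster_segments(segments, flows, dissim, temporal, c, m=2.0,
                     max_iter=40, init=None, seed=0):
    """Steps 2 to 5; `dissim` plays the role of TFSM, `temporal` that of DTW."""
    if init is not None:
        centers = list(init)
    else:
        centers = random.Random(seed).sample(segments, c)  # Step 2
    U = memberships(segments, centers, dissim, m)          # Step 3
    for _ in range(max_iter):                              # Step 5
        new_centers = update_centers(segments, flows, U, centers, temporal, m)
        if new_centers == centers:
            break
        centers = new_centers
        U = memberships(segments, centers, dissim, m)
    return U, centers
```

In this sketch the result depends on the initial centers: if two initial centers fall in the same group of similar segments, they can snap to the same segment and the clusters collapse, so several random restarts may be advisable.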

IV. EXPERIMENTAL RESULTS

The experimental data set was the road network of a certain city, containing 3648 road sections, each with a road traffic flow sequence. Traffic flow data were collected at 5-minute intervals. The experiment used the traffic flow sequences of the morning peak (from 7:00 to 9:30) of one day, so the traffic flow sequence of each road segment contained thirty traffic flow values. Table I describes the traffic flow sequence data.
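To make the data layout concrete, the following Python sketch enumerates the 5-minute intervals of the morning peak and groups records shaped like the rows of Table I into one sequence per road segment. The record layout `(segment_id, flow, interval_label)` and the function names are assumptions chosen for illustration.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def peak_intervals(start="07:00", end="09:30", step_minutes=5):
    """Labels such as '07:00-07:05' for every interval in [start, end)."""
    t0 = datetime.strptime(start, "%H:%M")
    t1 = datetime.strptime(end, "%H:%M")
    labels = []
    while t0 < t1:
        nxt = t0 + timedelta(minutes=step_minutes)
        labels.append(f"{t0:%H:%M}-{nxt:%H:%M}")
        t0 = nxt
    return labels


def build_sequences(records):
    """Group (segment_id, flow, interval_label) records into per-segment sequences.

    Zero-padded labels of a single day sort chronologically as plain strings,
    which is what the ordering below relies on.
    """
    by_segment = defaultdict(list)
    for segment_id, flow, label in records:
        by_segment[segment_id].append((label, flow))
    return {seg: [flow for _, flow in sorted(rows)]
            for seg, rows in by_segment.items()}
```

With the default arguments `peak_intervals()` yields thirty labels, matching the thirty values per road segment used in the experiment.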

To obtain reliable experimental results, the weighting exponent of FCM is set to m = 2, a common choice for the FCM algorithm within the range [1.5, 2.5] [10]. The maximum number of iterations tmax is 40 and the iteration termination threshold ε is 0.0001. Because fuzzy clustering is an unsupervised classification method, we took the cluster number c = 3, 4, 5 to observe the results of the experiment.

The clustering result of the traffic flow sequences on the road network for c = 3 is shown in Fig. 1, where the three kinds of road traffic flow are distinguished by three colors. The three colors represent three different traffic flow sequences and the corresponding division of the road network according to its spatial distribution characteristics. Red road segments show the traffic flow peak between seven o'clock and nine thirty in the morning. Fig. 2 and Fig. 3 show the clustering results of the road network for cluster numbers 4 and 5. Compared with the classification result for c = 3, they reflect the traffic distribution pattern on the road network better.

Fig. 1. Road network partition when c = 3

Fig. 2. Road network partition when c = 4

Fig. 3. Road network partition when c = 5

TABLE I
TRAFFIC FLOW SEQUENCE DATA

Road segment ID   Road segment starting node   Road segment ending node   Traffic capacity   Time
1000-1001         1000                         1001                       64                 2009-02-24 07:00-07:05
1000-1001         1000                         1001                       70                 2009-02-24 07:05-07:10
1000-1001         1000                         1001                       80                 2009-02-24 07:10-07:15
…                 …                            …                          …                  …
1098-1049         1098                         1049                       46.56              2009-02-24 07:05-07:10
1098-1049         1098                         1049                       46.44              2009-02-24 07:15-07:20
1098-1049         1098                         1049                       48.62              2009-02-24 07:25-07:30
…                 …                            …                          …                  …

V. CONCLUSIONS

Fuzzy cluster analysis is an important data analysis tool and a non-supervised classification method. Based on the idea of the fuzzy clustering algorithm and considering the distribution of road network traffic flow, this paper proposes a new clustering method that measures the similarity between traffic flow objects well and further finds potential traffic patterns in a large number of road network traffic flow sequences. In particular, the experimental results can provide decision support for traffic zone division.

REFERENCES

[1] M. Halkidi, Y. Batistakis, and M. Vazirgiannis, “On clustering validation techniques,” Journal of Intelligent Information Systems, vol. 17, pp. 107-145, 2001.

[2] J. C. Bezdek, Pattern Recognition with Fuzzy Objective Function Algorithms. New York: Plenum Press, 1981.

[3] J. T. Ren, Q. Xie, and J. Yin, “Traffic flow time series separation methods,” Computer Applications, vol. 4, pp. 937-939, 2005.

[4] T. A. Hauser and W. T. Scherer, “Data mining tools for real-time traffic signal decision support & maintenance,” in IEEE International Conference on Systems, Man, and Cybernetics, Tucson, AZ, USA, 2001, vol. 3, pp. 1471-1477.

[5] C. Faloutsos, M. Ranganathan, and Y. Manolopoulos, “Fast subsequence matching in time-series databases,” in Proceedings of the 1994 ACM SIGMOD International Conference on Management of Data, New York, USA, 1994, pp. 419-429.

[6] D. J. Berndt and J. Clifford, “Finding patterns in time series: A dynamic programming approach,” in Advances in Knowledge Discovery and Data Mining, pp. 229-248, 1996.

[7] J. C. Bezdek, R. Ehrlich, and W. Full, “FCM: The fuzzy c-means clustering algorithm,” Computers & Geosciences, vol. 10, pp. 191-203, 1984.

[8] D. W. Kim, K. H. Lee, and D. Lee, “Fuzzy cluster validation index based on inter-cluster proximity,” Pattern Recognition Letters, vol. 24, pp. 2561-2574, 2003.

[9] E. Dijkstra, “A note on two problems in connexion with graphs,” Numerische Mathematik, vol. 1, pp. 269-271, 1959.

[10] N. R. Pal and J. C. Bezdek, “On cluster validity for the fuzzy c-means model,” IEEE Transactions on Fuzzy Systems, vol. 3, pp. 370-379, 1995.

[11] E. Keogh, S. Chu, D. Hart, and M. Pazzani, “An online algorithm for segmenting time series,” in First IEEE International Conference on Data Mining, 2001, pp. 289-296.

[12] J. Han and M. Kamber, Data Mining: Concepts and Techniques. San Francisco: Morgan Kaufmann Publishers, 2001.

[13] B. Zheng, J. Chen, S. Xia, and Y. Jin, “Data analysis of vessel traffic flow using clustering algorithms,” in IEEE Computer Society, Washington, USA, 2008, vol. 2, pp. 243-246.

[14] J. Deshpande, K. Dande, V. Deshpande, and A. Abhyankar, “Parameterization of traffic flow using Sammon-fuzzy clustering,” in 2009 IEEE International Conference on Vehicular Electronics and Safety, 2009, pp. 146-150.
