

Traffic Incident Detection Using Multiple-Kernel Support Vector Machine

Jianli Xiao and Yuncai Liu

This paper presents applications of the multiple-kernel learning support vector machine (MKL-SVM) in traffic incident detection. The standard SVM was applied in traffic incident detection and achieved good results. However, the results depended greatly on the kernel function and parameters, and choosing the appropriate ones for the SVM was a procedure of trial and error. Unlike the SVM, the MKL-SVM used a convex combination of basic kernel functions instead of a single basic kernel function to construct an adaptive SVM model. This adaptive model could improve average performance in traffic incident detection just by randomly selecting the kernel function and parameters. As a result, the MKL-SVM avoided the burden of choosing the appropriate kernel function and parameters. The SVM ensemble algorithm trained many individual SVM classifiers to construct the classifier ensemble and then used this classifier ensemble to detect traffic incidents. Consequently, training occurred many times. Compared with the SVM ensemble algorithm, the training time cost of the MKL-SVM was much lower because training occurred only once. Extensive experiments were performed to evaluate the performance of three algorithms: standard SVM, SVM ensemble, and MKL-SVM. The experimental results showed that the performance of the MKL-SVM was much better than that of the standard SVM and slightly better than that of the SVM ensemble. More important, the performance of the MKL-SVM was stable.

Transportation Research Record: Journal of the Transportation Research Board, No. 2324, Transportation Research Board of the National Academies, Washington, D.C., 2012, pp. 44–52. DOI: 10.3141/2324-06

Department of Automation, Shanghai Jiao Tong University, and Key Laboratory of System Control and Information Processing, Ministry of Education, 800 Dongchuan Road, Minhang District, Shanghai, China. Corresponding author: J. Xiao, [email protected].

Traffic incident detection is an important activity of freeway traffic monitoring and control. Traffic incidents are defined as nonrecurring events such as accidents, disabled vehicles, spilled loads, temporary maintenance and construction activities, signal and detector malfunctions, and other special and unusual events that disrupt the normal flow of traffic and cause motorist delay (1). If an incident cannot be handled in a timely fashion, it will increase traffic delay, reduce road capacity, and often cause secondary traffic accidents. Timely detection of incidents is critical to successful implementation of an incident management system for freeways.

In recent years, more advanced algorithms have been successfully applied to traffic incident detection. Compared with the classical algorithms—such as the California algorithm (2), the Bayesian algorithm (3), and the autoregressive integrated moving average algorithm (4, 5)—these advanced algorithms have a lower misclassification rate, higher correct detection rate, lower false alarm rate, and slightly faster detection time. Some techniques based on video processing have been adapted to detect traffic incidents. Oh et al. developed a vehicle image processing system to detect traffic incidents with a vehicle tracking algorithm and traffic conflict technology (6). The inductive loop detector is the most commonly used sensor in traffic surveillance and management applications. Many advanced algorithms focus on detecting traffic incidents by using inductive loop detector data and have achieved good results. Jin and Ran presented a new automatic incident detection (AID) algorithm based on fundamental diagrams to detect traffic incidents (7). Srinivasan et al. (8) and Cheu et al. (9) evaluated the incident detection performance of three promising neural network models—multilayer feed-forward neural network, basic probabilistic neural network, and constructive probabilistic neural network—and concluded that the last model had the highest potential in the freeway AID system.

Although artificial neural networks have achieved better performance than the classical AID algorithms, there is a defect limiting their wide application: artificial neural networks cannot provide a clear explanation of how their parameters are adjusted, and it is difficult to obtain the optimal parameters of a neural network. Yuan and Cheu applied the standard support vector machine (SVM) to traffic incident detection for the first time (1). Their method is a single-kernel learning algorithm (10, 11) and improves the detection performance greatly. But the generalization performance of the SVM classifier depends strongly on the kernel function and parameters, and it is a challenging task to select an appropriate kernel function and parameters for the SVM algorithm. Until now, there has not been a structured way to choose them.

Chen et al. used the SVM ensemble method to detect traffic incidents (12). Their method not only avoids the burden of choosing appropriate kernel functions and parameters but also improves the average performance of different SVM classifiers. The defects of the SVM ensemble algorithm include two aspects: the training time for the SVM ensemble is very long, and the SVM ensemble is susceptible to drawing unstable individual SVM classifiers into the ensemble. In order to solve these two problems, a new type of SVM based on a multiple-kernel learning algorithm (13–16), called the multiple-kernel learning SVM (MKL-SVM), is adopted to detect traffic incidents. In this paper, the multiple-kernel function is obtained by using a convex combination of basic kernel functions chosen randomly from the classical kernels with different parameters, such as Gaussian kernels, polynomial kernels, and sigmoid kernels. This multiple-kernel function is used to construct the MKL-SVM classifier and detect traffic incidents. In this way, the burden of choosing the appropriate kernel function and parameters for the SVM can be avoided. Some experiments have been performed to evaluate the performance of the standard SVM (1), the SVM ensemble (12), and the MKL-SVM (13). The experimental results show that the performance of the MKL-SVM is much better than that of the standard SVM and slightly better than that of the SVM ensemble. More important, the performance of the MKL-SVM is very stable.


MKL-SVM Framework

Because the MKL-SVM has a close relationship with the standard SVM, the standard SVM is introduced first according to the work by Vapnik (10) and Zhang et al. (11).

Foundation of Standard SVM

The SVM proposed by Vapnik can solve not only the linear classification problem but also the nonlinear classification problem (10). First, the SVM extracts the training vectors that lie closest to the class boundary; these are called support vectors. Then it uses these vectors to construct a decision boundary that separates the data optimally. In this subsection, a brief introduction to the SVM for linear and nonlinear classification is given (11).

Linear Classification Problem

For the linear classification problem, a set of linear separable training samples is given:

S = \{(x_1, y_1), (x_2, y_2), \ldots, (x_i, y_i), \ldots, (x_l, y_l)\} \qquad (1)

where x_i ∈ R^n; R^n is the n-dimensional sample space; i = 1, 2, . . . , l; y_i ∈ {−1, 1} is the class label of x_i; and l is the number of training samples.

The general form of the linear classification function is g(x) = w · x + b, which corresponds to a separating hyperplane w · x + b = 0, where w is the normal vector of the hyperplane and b is a bias term. Since g(x) can be normalized to satisfy |g(x)| ≥ 1 for all x_i, the distance from the closest point to the hyperplane is 1/‖w‖. Among the separating hyperplanes, the optimal separating hyperplane is defined as the one for which the distance to the closest point is maximal. As mentioned earlier, the distance from the closest point to the hyperplane is 1/‖w‖, so finding the optimal separating hyperplane amounts to maximizing 1/‖w‖, which is equivalent to minimizing ‖w‖. This optimization problem is

\min_{w,b} \; \phi(w) = \frac{1}{2}\|w\|^2 \qquad (2)

subject to

y_i (w \cdot x_i + b) \ge 1, \quad i = 1, 2, \ldots, l \qquad (3)

where ϕ(w) is the objective function of w. A Lagrange function is constructed to deduce the dual problem of Equations 2 and 3:

L(w, b, \alpha) = \frac{1}{2}\|w\|^2 - \sum_{i=1}^{l} \alpha_i \left( y_i (w \cdot x_i + b) - 1 \right) \qquad (4)

and the dual problem is

\min_{\alpha} \; \frac{1}{2} \sum_{i=1}^{l} \sum_{j=1}^{l} y_i y_j \alpha_i \alpha_j (x_i \cdot x_j) - \sum_{i=1}^{l} \alpha_i \qquad (5)

subject to

\sum_{i=1}^{l} y_i \alpha_i = 0 \qquad (6)

\alpha_i \ge 0, \quad i = 1, 2, \ldots, l \qquad (7)

where α = (α_1, . . . , α_i, . . . , α_l)^T is the Lagrange multiplier vector and the superscript T denotes transposition. The sequential minimal optimization algorithm (17) can be used to solve the constrained quadratic programming problem of Equations 5 through 7 and get the optimal solution as follows:

\alpha^* = (\alpha_1^*, \ldots, \alpha_l^*)^T \qquad (8)

w^* = \sum_{i=1}^{l} \alpha_i^* y_i x_i \qquad (9)

b^* = y_j - \sum_{i=1}^{l} y_i \alpha_i^* (x_i \cdot x_j) \qquad (10)

With α* and b*, the decision function is obtained:

f(x) = \operatorname{sign}\left( \sum_{i=1}^{l} \alpha_i^* y_i (x_i \cdot x) + b^* \right) \qquad (11)
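As an illustrative check (not part of the paper), the decision function of Equation 11 can be evaluated by hand with scikit-learn, whose `SVC` exposes the products α*_i y_i for the support vectors through `dual_coef_`; the toy data below are hypothetical:

```python
import numpy as np
from sklearn.svm import SVC

# Hypothetical linearly separable toy data.
X = np.array([[0.0, 0.0], [0.0, 1.0], [3.0, 3.0], [3.0, 4.0]])
y = np.array([-1, -1, 1, 1])

clf = SVC(kernel="linear", C=1e6).fit(X, y)   # large C approximates a hard margin

# dual_coef_ stores alpha_i* * y_i for the support vectors, so Equation 11
# reduces to a dot product with the kernel evaluations x_i . x:
x_new = np.array([2.5, 3.0])
f = np.sign((clf.dual_coef_ @ (clf.support_vectors_ @ x_new) + clf.intercept_).item())

assert f == clf.predict([x_new])[0]           # hand-built f(x) matches predict

# The margin of the separating hyperplane is 1 / ||w*|| (cf. Equation 9).
margin = 1.0 / np.linalg.norm(clf.coef_)
```

Because `dual_coef_` already contains α*_i y_i, no explicit loop over training samples is needed.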

Nonlinear Classification Problem

When the data are not linearly separable, the SVM modifies its object function as follows by introducing slack variables and a penalty factor:

\min_{w,b,\xi} \; \phi(w, \xi) = \frac{1}{2}\|w\|^2 + C \sum_{i=1}^{l} \xi_i \qquad (12)

where C is a penalty factor set manually, ξ = (ξ_1, . . . , ξ_l)^T is a slack vector, and the superscript T denotes transposition. In addition, the input data can be mapped into a high-dimensional feature space by using a kernel function. The optimal separating hyperplane is constructed in the high-dimensional feature space, and the dot products x_i · x_j and x_i · x are replaced by K(x_i, x_j) = (ϕ(x_i) · ϕ(x_j)) and K(x_i, x) = (ϕ(x_i) · ϕ(x)), where K is a kernel function. Following this rule and using Equation 5, the dual problem and its optimal solution in the nonlinear situation can be obtained. The dual problem is

\min_{\alpha} \; \frac{1}{2} \sum_{i=1}^{l} \sum_{j=1}^{l} y_i y_j \alpha_i \alpha_j K(x_i, x_j) - \sum_{i=1}^{l} \alpha_i \qquad (13)

subject to

\sum_{i=1}^{l} y_i \alpha_i = 0 \qquad (14)

0 \le \alpha_i \le C, \quad i = 1, 2, \ldots, l \qquad (15)

The methods for solving the standard SVM are various; the most commonly used method is the sequential minimal optimization algorithm (17), and the optimal solution is

\alpha^* = (\alpha_1^*, \ldots, \alpha_l^*)^T \qquad (16)

w^* = \sum_{i=1}^{l} \alpha_i^* y_i \Phi(x_i) \qquad (17)


b^* = y_j - \sum_{i=1}^{l} y_i \alpha_i^* K(x_i, x_j) \qquad (18)

where Φ is a projection function, which projects a sample from a low-dimensional space into a high-dimensional space.

With α* and b*, the decision function in this situation can be obtained:

f(x) = \operatorname{sign}\left( \sum_{i=1}^{l} \alpha_i^* y_i K(x_i, x) + b^* \right) \qquad (19)
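Equation 19 can likewise be verified numerically. The sketch below (an illustration with scikit-learn and a hypothetical RBF toy problem, not the paper's data) evaluates the kernel decision function by hand:

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.metrics.pairwise import rbf_kernel

# Hypothetical XOR-like data, not linearly separable in the input space.
X = np.array([[0, 0], [1, 1], [0, 1], [1, 0]], dtype=float)
y = np.array([-1, -1, 1, 1])

gamma = 1.0
clf = SVC(kernel="rbf", gamma=gamma, C=10.0).fit(X, y)

# Equation 19 by hand: f(x) = sign(sum_i alpha_i* y_i K(x_i, x) + b*).
x_new = np.array([[0.9, 0.1]])
K = rbf_kernel(clf.support_vectors_, x_new, gamma=gamma)   # column of K(x_i, x)
f = np.sign((clf.dual_coef_ @ K + clf.intercept_).item())

assert f == clf.predict(x_new)[0]   # matches the library's prediction
```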

Extension of Standard SVM to MKL-SVM

For an MKL-SVM framework based on the simple MKL algorithm (13), a convenient approach is to consider that the kernel K(x, x′) is actually a convex combination of basic kernels:

K(x, x') = \sum_{m=1}^{M} d_m K_m(x, x'), \quad \text{with } d_m \ge 0 \text{ and } \sum_{m=1}^{M} d_m = 1 \qquad (20)

where M is the total number of kernels and d_m is the weight for kernel K_m. Each basic kernel K_m may use either the full set of variables describing x or subsets of variables stemming from different data sources (18). In addition, the kernels K_m can be chosen from the classical kernels with different parameters, such as Gaussian kernels, polynomial kernels, and sigmoid kernels. Within this framework, the problem of data representation through the kernel is then transferred to the choice of the weights d_m (13).
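For illustration, a multiple kernel of the form in Equation 20 can be assembled from the Gram matrices of a few classical kernels; the data and parameter values below are hypothetical:

```python
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel, polynomial_kernel, sigmoid_kernel

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3))          # five hypothetical 3-D samples

# Basic kernels K_m with illustrative, randomly fixed parameters.
basic_kernels = [
    lambda A, B: rbf_kernel(A, B, gamma=0.5),
    lambda A, B: polynomial_kernel(A, B, degree=2),
    lambda A, B: sigmoid_kernel(A, B, gamma=0.1, coef0=1.0),
]

# Weights d_m >= 0 summing to 1 (uniform here; MKL would learn them).
d = np.full(len(basic_kernels), 1.0 / len(basic_kernels))

# Equation 20: K = sum_m d_m K_m, evaluated as a Gram matrix on X.
K = sum(dm * km(X, X) for dm, km in zip(d, basic_kernels))
print(K.shape)   # (5, 5)
```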

Plugging Equation 20 into Equation 13 and transforming the minimization problem into a maximization problem, the associated dual problem of the MKL-SVM is derived as follows:

\max_{\alpha} \; -\frac{1}{2} \sum_{i=1}^{l} \sum_{j=1}^{l} y_i y_j \alpha_i \alpha_j \sum_{m} d_m K_m(x_i, x_j) + \sum_{i=1}^{l} \alpha_i \qquad (21)

subject to

\sum_{i=1}^{l} y_i \alpha_i = 0 \qquad (22)

0 \le \alpha_i \le C, \quad i = 1, 2, \ldots, l \qquad (23)

where d = (d_1, d_2, . . . , d_m, . . . , d_M)^T is a weight vector.

After the dual problem is obtained, the next key problem is how to solve the optimization problem (Equations 21 through 23). If the optimal d, denoted d*, has been obtained, then d* is plugged into Equation 21, and the MKL-SVM problem is changed into a standard SVM problem. In this situation, the algorithm for solving the standard SVM problem can be used to solve the current problem. The procedure for solving the MKL-SVM optimization problem according to this idea is described next.

Let function J(d) be the optimal objective value of the primal problem of MKL-SVM. Because of strong duality, J(d) is also the objective value of the dual problem:

J(d) = -\frac{1}{2} \sum_{i,j} y_i y_j \alpha_i^* \alpha_j^* \sum_{m} d_m K_m(x_i, x_j) + \sum_{i} \alpha_i^* \qquad (24)

Now the reduced gradient algorithm (13) is used to compute the weight vector d. Differentiating the dual function J(d) with respect to d_m gives

\frac{\partial J}{\partial d_m} = -\frac{1}{2} \sum_{i,j} \alpha_i^* \alpha_j^* y_i y_j K_m(x_i, x_j), \quad \forall m \qquad (25)

For u, the index of the largest component of vector d, the differentiation of J(d) with respect to d_u is

\frac{\partial J}{\partial d_u} = -\frac{1}{2} \sum_{i,j} \alpha_i^* \alpha_j^* y_i y_j K_u(x_i, x_j) \qquad (26)

Let D = (D_1, D_2, . . . , D_m, . . . , D_M)^T be the reduced gradient of J(d). D_m is computed with

D_m =
\begin{cases}
0 & \text{if } d_m = 0 \text{ and } \dfrac{\partial J}{\partial d_m} - \dfrac{\partial J}{\partial d_u} > 0 \\[4pt]
-\dfrac{\partial J}{\partial d_m} + \dfrac{\partial J}{\partial d_u} & \text{if } d_m > 0 \text{ and } m \ne u \\[4pt]
\displaystyle \sum_{v \ne u,\; d_v > 0} \left( \dfrac{\partial J}{\partial d_v} - \dfrac{\partial J}{\partial d_u} \right) & \text{for } m = u
\end{cases}
\qquad (27)

where v is the index of a component of d for which d_v > 0 and v is not the index of the largest component of d.

After the reduced gradient D has been obtained, the optimal d can be obtained according to the simple MKL algorithm (13). Then, to transform the MKL problem into a standard SVM problem, let

K = \sum_{m} d_m K_m

In this situation, the optimal solution can be obtained with an SVM solver like the sequential minimal optimization algorithm (17).
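The alternation just described can be sketched in a few lines of Python. This is a simplified illustration (a projected gradient step on d in place of the reduced-gradient line search, with hypothetical data, kernels, and step size), not the authors' implementation:

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.metrics.pairwise import rbf_kernel

# Hypothetical two-class problem and three candidate Gaussian kernels.
rng = np.random.default_rng(1)
X = rng.normal(size=(40, 2))
y = np.where(X[:, 0] * X[:, 1] > 0, 1, -1)

Ks = np.stack([rbf_kernel(X, X, gamma=g) for g in (0.1, 1.0, 10.0)])
d = np.full(len(Ks), 1.0 / len(Ks))        # start from uniform weights

for _ in range(10):
    K = np.tensordot(d, Ks, axes=1)        # combined kernel sum_m d_m K_m
    svm = SVC(kernel="precomputed", C=10.0).fit(K, y)   # SVM with d fixed
    beta = np.zeros(len(y))                # beta_i = alpha_i* y_i
    beta[svm.support_] = svm.dual_coef_.ravel()
    # Gradient of J(d), Equation 25: -0.5 * beta' K_m beta for each m.
    grad = np.array([-0.5 * beta @ Km @ beta for Km in Ks])
    d = np.clip(d - 0.1 * grad, 0.0, None) # simple descent step, keep d_m >= 0
    d /= d.sum()                           # renormalize so sum_m d_m = 1
```

Each pass solves a standard SVM on the current combined kernel and then shifts weight toward the kernels that contribute most to the dual objective.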

Performance Criteria for AID

The performance of incident detection algorithms is often evaluated by the following statistical metrics, namely, detection rate (DR), false alarm rate (FAR), mean time to detection (MTTD), and classification rate (CR). The definitions are from other studies (5, 12, 19).

DR is defined as the ratio of the number of detected incidents to the total number of incidents, which denotes the accuracy for detecting incident cases:

\text{DR} = \frac{\text{number of incident cases detected}}{\text{total number of incident cases}} \times 100\% \qquad (28)

FAR is defined as the ratio of the number of false alarm cases to the total number of nonincident instances. The smaller the FAR, the better the performance:

\text{FAR} = \frac{\text{number of false alarm cases}}{\text{total number of nonincident cases}} \times 100\% \qquad (29)

Time to detection is the difference between the time the incident was detected and the actual time the incident occurred; MTTD is the average time to detection of an incident over m incidents successfully detected (12):

\text{MTTD} = \frac{t_1 + t_2 + \cdots + t_i + \cdots + t_m}{m} \qquad (30)

where ti is the length of time to detect the ith incident case and m is the number of incident cases detected. The smaller the MTTD, the better the performance.
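For example (with hypothetical detection times), Equation 30 is just the arithmetic mean of the individual detection times:

```python
# Hypothetical times to detection (in minutes) for m = 4 detected incidents.
times = [1.5, 3.0, 2.0, 4.5]
mttd = sum(times) / len(times)   # Equation 30: arithmetic mean
print(mttd)   # 2.75
```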

CR is defined as the proportion of instances that were correctly classified out of the total instances in the data set (12):

\text{CR} = \frac{\text{number of instances correctly classified}}{\text{total number of instances}} \times 100\% \qquad (31)
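The three count-based criteria of Equations 28, 29, and 31 can be computed directly from labeled predictions; the labels below are hypothetical (1 = incident, 0 = nonincident):

```python
import numpy as np

# Hypothetical ground truth and classifier output for ten instances.
y_true = np.array([1, 1, 1, 1, 0, 0, 0, 0, 0, 0])
y_pred = np.array([1, 1, 1, 0, 0, 0, 1, 0, 0, 0])

inc, noninc = y_true == 1, y_true == 0
dr  = 100.0 * (y_pred[inc] == 1).mean()      # Equation 28: 3 of 4 detected
far = 100.0 * (y_pred[noninc] == 1).mean()   # Equation 29: 1 of 6 false alarms
cr  = 100.0 * (y_pred == y_true).mean()      # Equation 31: 8 of 10 correct
print(dr, cr)   # 75.0 80.0
```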

The next criterion used in this study is the receiver operating characteristic (ROC) curve (20), which is useful for organizing classifiers and visualizing their performance. The area under the ROC curve, called the AUC, has been employed as a measurement of AID algorithms (19). The AUC can be obtained by the trapezoidal rule, that is, by summing the areas of trapezoids under the ROC curve. The AUC value represents the quality of the classifier's performance. The larger the AUC, the better the performance; the maximal value of the AUC is 1. The ROC criterion has the advantage that it is not necessary to make any assumption on the prior probability of the class distribution or to specify the misclassification costs.
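A minimal sketch of the trapezoidal-rule AUC, using hypothetical ROC points sorted by increasing false alarm rate:

```python
import numpy as np

# Hypothetical ROC curve points (FAR on the x-axis, DR on the y-axis).
fpr = np.array([0.0, 0.1, 0.3, 1.0])
tpr = np.array([0.0, 0.6, 0.9, 1.0])

# Sum the areas of the trapezoids under the curve.
auc = float(np.sum((fpr[1:] - fpr[:-1]) * (tpr[1:] + tpr[:-1]) / 2.0))
print(round(auc, 3))   # 0.845
```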

In addition, another criterion, the performance index (PI), is taken from the work by Chen et al. (12). It combines the criteria DR, FAR, MTTD, and CR and can evaluate a classifier's performance more comprehensively. PI is computed as follows:

\text{PI} = w_{\text{DR}} (1 - \text{DR}) + w_{\text{FAR}} \cdot \text{FAR} + w_{\text{MTTD}} \cdot \frac{\text{MTTD}}{\text{THD}_{\text{MTTD}}} + w_{\text{CR}} (1 - \text{CR}) \qquad (32)

where w_DR, w_FAR, w_MTTD, and w_CR are the weights of DR, FAR, MTTD, and CR, respectively, and THD_MTTD is the threshold of MTTD. In the experiments described here, w_DR = w_FAR = w_MTTD = 1/3, w_CR = 0, and THD_MTTD = 10, as used by Chen et al. (12); the method for selecting these weights and the threshold is also from that work.
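With those settings, Equation 32 reduces to a short weighted sum. The helper below is an illustration (rates expressed as fractions in [0, 1], MTTD in minutes); fed the MKL-SVM averages from Table 3, it yields a PI of about 0.123:

```python
# Weights and threshold as used in the experiments (from Chen et al.).
w_dr = w_far = w_mttd = 1.0 / 3.0
w_cr, thd_mttd = 0.0, 10.0

def perf_index(dr, far, mttd, cr):
    """Equation 32; lower PI is better (a perfect detector scores 0)."""
    return (w_dr * (1.0 - dr) + w_far * far
            + w_mttd * mttd / thd_mttd + w_cr * (1.0 - cr))

print(round(perf_index(dr=1.0, far=0.0, mttd=0.0, cr=1.0), 3))        # 0.0
print(round(perf_index(dr=0.8612, far=0.0395, mttd=1.91, cr=0.9523), 3))  # 0.123
```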

Experiments on Traffic Incident Detection

In this section, two groups of experiments are described: the first group is based on the I-880 data set (21), and the second group is based on raw data from the California Department of Transportation Performance Measurement System (PeMS), which is available from the website (http://pems.dot.ca.gov). The PeMS data set includes some noisy data, and it is used to test the performance stability of the algorithms. Both groups of experiments are used to evaluate the performance of three algorithms: standard SVM (1), SVM ensemble (12), and MKL-SVM (13). Evaluation indicators include DR, FAR, MTTD, CR, AUC, and PI. Compared with the other four indicators, AUC and PI can evaluate the performance more comprehensively (12).

In order to demonstrate that the MKL-SVM algorithm has very stable performance, the kernel functions and parameters for the algorithm are randomly selected. Meanwhile, in order to show that the general performance of the MKL-SVM algorithm is better than the optimized performance of the standard SVM algorithm and the SVM ensemble algorithm, appropriate kernel functions and parameters are selected for the standard SVM and SVM ensemble algorithms.

In this study, each sample is a six-dimensional row vector that includes the parameters of the upstream and downstream volume, speed, and occupancy. AID models are constructed to describe the pattern of the variation of the upstream and downstream parameters and then these models are used to detect the traffic incidents.

Experiment Parameters and Procedures

Parameters of Experiments

Some parameters are adopted to make the procedures of the experiments more automatic and optimized. In addition, some symbols are used to denote specific concepts. For clarity, the definition of each parameter and symbol is given in Table 1.

Procedure for Experiments

The experiments are performed according to the following procedure:

Step 1. Divide the whole data set into a training set S and a test set T. The whole data set is taken as the test set T; the parameter nS controls the size of the training set S, and S is obtained by taking the first nS samples of T.

Step 2. Sample with replacement from training set S n times to get the training subsets S1, S2, . . . , Sn. The parameter r controls the ratio of the number of samples in a training subset to the number of samples in the training set; the number of samples in a training subset is nsub = nS · r. The range of r is (0, 1).

Step 3. While j = 1, 2, . . . , n,
– Use the training subset Sj to train the jth individual SVM classifier fsj,
– Use the training subset Sj to train the jth MKL-SVM classifier fj, and
– Use all the j existing individual SVM classifiers to construct the jth SVM ensemble classifier esj.

TABLE 1 Definitions of Symbols and Parameters Used in Experiments

Symbol
  T        Test set
  S        Training set
  Sj       jth training subset (namely, jth subset of training set S)
  fsj      jth individual SVM classifier
  esj      jth ensemble SVM classifier
  fj       jth individual MKL-SVM classifier

Parameter
  n        Total number of training subsets
  nT       Number of samples in test set T
  nS       Number of samples in training set S
  nsub     Number of samples in a training subset (each training subset has the same number of samples)
  r        Ratio of nsub to nS
  nall     Total number of samples in whole data set
  ninc     Number of incident samples in whole data set
  nnoninc  Number of nonincident samples in whole data set


Step 4. While j = 1, 2, . . . , n,
– Test the performance of the jth individual SVM classifier fsj on test set T,
– Test the performance of the jth MKL-SVM classifier fj on test set T, and
– Test the performance of the jth SVM ensemble classifier esj on test set T.
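Steps 1 and 2 can be sketched as follows; the array sizes are illustrative (the actual experiments use the values in Table 2):

```python
import numpy as np

rng = np.random.default_rng(42)
data = rng.normal(size=(1000, 6))     # hypothetical six-dimensional samples

# Step 1: the whole data set is the test set T; the training set S is the
# first n_S samples of T.
n, r, n_s = 20, 0.1, 500
T = data
S = T[:n_s]

# Step 2: draw n training subsets by sampling with replacement from S,
# each of size n_sub = n_S * r.
n_sub = int(n_s * r)
subsets = [S[rng.integers(0, n_s, size=n_sub)] for _ in range(n)]
print(len(subsets), subsets[0].shape)   # 20 (50, 6)
```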

Experiments on I-880 Data Set Without Noisy Data

Data Description

For traffic incident detection, the most well-known database is I-880, which was collected by Petty et al. (21) at the I-880 freeway in the San Francisco Bay Area, California. The loop detector data (some with incidents and some without) were collected from a 14.8-km segment of I-880 between the Marina and Wipple exits in both directions. There were 18 loop detector stations in the northbound direction and 17 stations in the southbound direction (1). The loop detector data were collected at 30-s intervals in the form of lane-specific volume, speed, and occupancy. Generally, most researchers use the average values of volume, speed, and occupancy computed from all lanes at a station to construct AID models (1, 8, 9, 12). There were 45 incident cases in the data collected between February 16 and March 19, 1993, and between July 27 and October 29, 1993.

Parameter Setting of Experiments

To divide the whole I-880 data set properly into test set, training set, and training subsets, some parameters need to be set, as shown in Table 2.

Experimental Results

In these experiments, 20 individual SVM classifiers, 20 SVM ensemble classifiers, and 20 MKL-SVM classifiers were constructed. The performance of each classifier on the I-880 data set was tested. Then the averages and variances of the performance of the 20 individual SVM classifiers, 20 SVM ensemble classifiers, and 20 MKL-SVM classifiers were calculated. These results are summarized in Table 3, where they are presented in the form average ± variance and the best results are highlighted in bold. To make a visual comparison of the performance of all classifiers, they are plotted in Figure 1. In each part of Figure 1, for the standard SVM algorithm and the MKL-SVM algorithm, the X-axis denotes the jth SVM or MKL-SVM classifier (where j is the current value of the X-axis); for the SVM ensemble algorithm, the X-axis denotes the jth SVM ensemble classifier (it also denotes the number of members in the ensemble; namely, the jth SVM ensemble is constructed from j individual SVM classifiers).

Experiments on PeMS Data Set with Noisy Data

Data Description

PeMS is a real-time archived data management system for transportation data. It collects raw detector data from California freeways in real time. Six days of raw data collected in October 1993 from I-880 were downloaded. It was found that in a whole day the raw data always miss some samples, and there are noisy data in the raw data. So data interpolation was done to complete the raw data and improve its quality. The loop detector data were collected at 30-s intervals in the form of lane-specific volume, speed, and occupancy. The average values of volume, speed, and occupancy were calculated from all the lanes at a station. Finally, a PeMS data set was produced that includes the latest loop detector data. There are 8,840 samples in the PeMS data set, including 1,640 incident samples and 7,200 nonincident samples. The noisy data in the data set can help to test the performance stability of the algorithms.
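The data-completion step can be sketched with pandas; the paper does not say which interpolation method was used, so simple linear interpolation over a hypothetical 30-s occupancy series is assumed here:

```python
import numpy as np
import pandas as pd

# Hypothetical 30-s occupancy readings with missing samples (NaN).
occ = pd.Series([10.0, np.nan, 14.0, np.nan, np.nan, 20.0])

# Fill the gaps by linear interpolation between the valid samples.
filled = occ.interpolate()
print(filled.tolist())   # [10.0, 12.0, 14.0, 16.0, 18.0, 20.0]
```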

Parameter Setting of Experiments

Since a new data set was used to perform the experiments, the parameter values need to be updated. The updated parameter values can be seen in Table 2.

TABLE 2 Parameter Setting of Experiments

Parameter Setting                n    nT      nS      nsub  r     nall    ninc   nnoninc
Experiments for I-880 data set   20   50,000  28,850  577   0.02  50,000  4,136  45,864
Experiments for PeMS data set    20   8,840   5,000   500   0.1   8,840   1,640  7,200

TABLE 3 Experimental Results of All Three Algorithms with I-880 Data Set

Algorithm      DR (%)         FAR (%)      MTTD (min)   CR (%)        AUC (%)       PI
Standard SVM   77.06 ± 12.28  4.05 ± 2.25  3.22 ± 1.55  94.39 ± 1.76  95.07 ± 2.43  0.197 ± 0.087
SVM ensemble   84.63 ± 2.36   3.68 ± 0.20  2.24 ± 0.40  95.35 ± 0.18  95.61 ± 0.94  0.138 ± 0.021
MKL-SVM        86.12 ± 1.63   3.95 ± 0.39  1.91 ± 0.24  95.23 ± 0.30  95.89 ± 1.11  0.123 ± 0.010

Note: Performance is presented as average ± variance. Best results are in bold.

FIGURE 1 Experimental results of standard SVM, SVM ensemble, and MKL-SVM on I-880 data set: performance on (a) DR, (b) FAR, (c) MTTD, (d) CR, (e) AUC, and (f) PI.


Experimental Results

As mentioned earlier, the raw data of the PeMS data set always miss some samples and include noisy data, which seriously reduces the quality of the PeMS data set. The experimental results from the PeMS data set are therefore much worse, on the whole, than those from the I-880 data set. The experimental results from the PeMS data set are summarized in Figure 2 and Table 4. In each part of Figure 2, for the standard SVM and MKL-SVM algorithms, the X-axis denotes the jth SVM or MKL-SVM classifier (where j is the current value on the X-axis). For the SVM ensemble algorithm, the X-axis denotes the jth SVM ensemble classifier (it also denotes the number of members in the ensemble; namely, the jth SVM ensemble is constructed from j individual SVM classifiers). In Table 4, for each of the standard SVM, SVM ensemble, and MKL-SVM algorithms, the averages and variances are calculated over the performance of all 20 individual classifiers or ensemble classifiers. The results are presented in the form average ± variance, and the best results are highlighted in bold.

Performance Evaluation

The performance of all three algorithms (standard SVM, SVM ensemble, and MKL-SVM) was evaluated on the I-880 data set without noisy data and on the PeMS data set with noisy data. In Figures 1 and 2, parts a through f evaluate the performance of the three algorithms on the indicators DR, FAR, MTTD, CR, AUC, and PI, respectively.
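Four of these indicators can be computed directly from per-sample classifier outputs; a minimal sketch under the usual AID definitions (DR, detected incident samples over all incident samples; FAR, flagged incident-free samples over all incident-free samples; CR, correctly classified samples over all samples; AUC via the rank-sum identity) follows. MTTD and PI require per-incident timing information and are omitted here.

```python
import numpy as np

def detection_indicators(y_true, y_pred, scores):
    """Compute DR, FAR, CR, and AUC from per-sample labels (1 = incident),
    hard predictions, and real-valued classifier scores."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    scores = np.asarray(scores, dtype=float)

    inc = y_true == 1
    dr = np.mean(y_pred[inc] == 1)      # detection rate
    far = np.mean(y_pred[~inc] == 1)    # false alarm rate
    cr = np.mean(y_pred == y_true)      # classification rate

    # AUC = P(score of a random incident > score of a random non-incident),
    # computed from the rank sum of incident scores (no tie handling).
    ranks = scores.argsort().argsort() + 1  # 1-based ranks
    n_pos, n_neg = inc.sum(), (~inc).sum()
    auc = (ranks[inc].sum() - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
    return dr, far, cr, auc
```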

I-880 Data Set Without Noisy Data

From Figure 1, it can be seen that the performance of the MKL-SVM is very stable and significantly better than that of the standard SVM, whose performance fluctuates widely and is therefore very unstable. The reason for this fluctuation is that the results of the standard SVM algorithm depend strongly on an appropriate kernel function and parameters. Figure 1a indicates that when the kernel function and parameters are chosen appropriately, the DR reaches 90% or even higher; when they are chosen poorly, the DR drops to 50% or even lower. An appropriate kernel function and parameters must be selected for the standard SVM classifier to obtain optimal performance, but the procedure for choosing them is filled with trial and error, and until now there has been no structured way to choose them.
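The trial-and-error procedure described above amounts to an exhaustive grid search over kernels and parameters; a minimal sketch follows, assuming scikit-learn is available. The data and the grid values are synthetic illustrations, not the paper's settings.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

# Synthetic stand-in for the traffic feature vectors and incident labels.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = (X[:, 0] + X[:, 1] ** 2 > 0.5).astype(int)

# Try every (kernel, C, gamma) combination and keep the best
# cross-validated score; gamma is ignored by the linear kernel.
best_score, best_params = -np.inf, None
for kernel in ("linear", "rbf", "poly"):
    for C in (0.1, 1.0, 10.0):
        for gamma in (0.01, 0.1, 1.0):
            clf = SVC(kernel=kernel, C=C, gamma=gamma)
            score = cross_val_score(clf, X, y, cv=3).mean()
            if score > best_score:
                best_score, best_params = score, (kernel, C, gamma)
```

Every grid point requires a full round of training and validation, which is exactly the cost that multiple-kernel learning sidesteps by learning the kernel combination jointly with the classifier.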

From Figure 1, it can also be seen that the performance of the MKL-SVM algorithm and that of the SVM ensemble algorithm are very close to each other on each indicator. On the indicators DR, MTTD, AUC, and PI, the MKL-SVM algorithm performs better than the SVM ensemble algorithm. As mentioned earlier, AUC and PI evaluate the performance more comprehensively than the other four indicators. In an overall evaluation, the performance of the MKL-SVM is slightly better than that of the SVM ensemble.

In Table 3, the average DR of the MKL-SVM is 86.12%, and the average DR of the standard SVM is 77.06%. This finding indicates that the MKL-SVM algorithm is more sensitive to traffic incidents and can detect more of them than the standard SVM algorithm. The average MTTD values of the MKL-SVM and standard SVM in Table 3 are 1.91 min and 3.22 min, respectively, which shows that the MKL-SVM algorithm detects incidents more quickly than the standard SVM does.

PeMS Data Set With Noisy Data

From Figure 2, it can be seen that when the data set includes noisy data, the performance of the standard SVM algorithm fluctuates widely, whereas the performance of the MKL-SVM is still stable and significantly better than that of the standard SVM. The MKL-SVM algorithm also performs much better than the SVM ensemble algorithm on the indicators DR, MTTD, AUC, and PI. Obviously, the performance of the MKL-SVM algorithm is more stable than that of the SVM ensemble algorithm in Figure 2. The reason is that when the data set contains noisy data, the SVM ensemble is prone to draw unstable individual SVM classifiers into the ensemble, which worsens the final output of the SVM ensemble classifier.

A comparison of Figures 1 and 2 shows that the performance of all three algorithms is lower in Figure 2 and that the performance of the MKL-SVM algorithm degrades the least, which indicates that the MKL-SVM algorithm tolerates noisy data best among the three algorithms.

In Table 4, the average DR of the standard SVM algorithm is 42.97% and the average DR of the SVM ensemble algorithm is 36.63%. Both are below 50%, which is unacceptable. This result indicates that if the average accuracy of the individual SVM classifiers is lower than 50%, the average accuracy of the ensemble classifiers constructed from those individual classifiers may fall even below the average accuracy of the individual SVM classifiers. Drawing unstable individual SVM classifiers into the ensemble should therefore be avoided.
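This effect can be illustrated with a small simulation. It uses an independence model only (real ensemble members are correlated): majority voting amplifies members that are better than chance but makes members that are worse than chance even worse.

```python
import numpy as np

rng = np.random.default_rng(0)

def majority_vote_accuracy(p_correct, n_members, n_samples=100_000):
    """Monte Carlo accuracy of a majority vote over independent binary
    classifiers, each correct with probability p_correct."""
    # Each entry is True when a member votes for the true label.
    votes = rng.random((n_samples, n_members)) < p_correct
    return np.mean(votes.sum(axis=1) > n_members / 2)

# Voting with 13 members, as in the experiments: above-chance members
# are amplified, below-chance members (as observed on the noisy PeMS
# data) are made worse.
acc_weak = majority_vote_accuracy(0.43, 13)    # ends up below 0.43
acc_strong = majority_vote_accuracy(0.90, 13)  # ends up above 0.90
```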

The standard SVM and the MKL-SVM are both individual classifiers; each needs to be trained only once. In contrast, the SVM ensemble must train many individual SVM classifiers to construct the ensemble, so its training time is very long. Figure 1, a through f, shows that to obtain relatively good performance, the SVM ensemble needs about 13 individual SVM classifiers; that is, the SVM ensemble algorithm needs to train 13 times. Compared with the SVM ensemble, the MKL-SVM algorithm saves much time.

Conclusions

The MKL-SVM is a new type of SVM based on multiple-kernel learning. Differing from the standard SVM, the MKL-SVM constructs the SVM model with a convex combination of basic kernel functions instead of a single basic kernel function, which avoids the burden of choosing an appropriate kernel function and parameters. In this research, the traffic incident detection problem is treated as a binary classification problem based on inductive loop detector data, and the MKL-SVM algorithm is used to divide the traffic patterns into two groups: the incident traffic pattern and the incident-free traffic pattern.
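The convex combination of basic kernels can be sketched as follows. Gaussian kernels with several bandwidths are one common choice of basic kernels; the weights are supplied by hand here for illustration, whereas in MKL they are learned jointly with the SVM, for example by SimpleMKL (13).

```python
import numpy as np

def gaussian_kernel(X, Y, sigma):
    """Gram matrix of the Gaussian (RBF) kernel with bandwidth sigma."""
    sq = (np.sum(X**2, 1)[:, None] + np.sum(Y**2, 1)[None, :]
          - 2 * X @ Y.T)
    return np.exp(-sq / (2 * sigma**2))

def combined_kernel(X, Y, sigmas, weights):
    """Convex combination K = sum_m d_m K_m of basic Gaussian kernels,
    with d_m >= 0 and sum_m d_m = 1.

    In MKL the weights d_m are optimized together with the SVM; here
    they are passed in directly for illustration.
    """
    weights = np.asarray(weights, dtype=float)
    assert np.all(weights >= 0) and abs(weights.sum() - 1.0) < 1e-9
    K = np.zeros((X.shape[0], Y.shape[0]))
    for d, s in zip(weights, sigmas):
        K += d * gaussian_kernel(X, Y, s)
    return K
```

Because each basic kernel is positive semidefinite and the weights are nonnegative, the combined Gram matrix is itself a valid SVM kernel.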

In this research, two groups of experiments were performed to evaluate the performance of three algorithms: the standard SVM (1), the SVM ensemble (12), and the MKL-SVM (13). In the first group of experiments, all three algorithms were applied to the I-880 data set without noisy data. The results show that the performance of the MKL-SVM is significantly better than that of the standard SVM and slightly better than that of the SVM ensemble. More important, the performance of the MKL-SVM is very stable.

To further test the stability of the three algorithms, they were applied to the PeMS data set with noisy data in the second group of experiments.


FIGURE 2 Experimental results of standard SVM, SVM ensemble, and MKL-SVM on PeMS data set: performance on (a) DR, (b) FAR, (c) MTTD, (d) CR, (e) AUC, and (f) PI. [Six-panel figure; each panel plots the indicator against the jth individual SVM (or MKL-SVM) classifier.]


The experimental results indicate that the MKL-SVM tolerates noisy data best among the three algorithms. Analysis of the experimental results shows that if the average accuracy of the individual SVM classifiers is lower than 50%, the average accuracy of the ensemble classifiers constructed from those individual classifiers will fall even lower. To obtain good results with the SVM ensemble classifier, drawing unstable individual SVM classifiers into the ensemble should be avoided.

The MKL-SVM is an individual classifier that needs to be trained only once, whereas the SVM ensemble must train many individual SVM classifiers to construct the ensemble. As a result, compared with the SVM ensemble, the MKL-SVM algorithm reduces the time cost.

The contribution of this study is the development of a freeway incident detection model based on the MKL-SVM. The algorithm not only improves the performance of traffic incident detection but also enhances the stability of that performance. The MKL-SVM algorithm can be successfully applied to traffic incident detection and to other classification problems. In future work, the authors will concentrate on constructing an MKL-SVM ensemble to detect traffic incidents.

Acknowledgments

This work was supported by the National Science Foundation of China, China National 973 Program, and the Science and Technology Commission of Shanghai Municipality Program. The authors also thank Shuyan Chen, Ruey Long Cheu, and Alain Rakotomamonjy for help in the study and their colleagues Xiong Li and Chenhao Wang for their useful suggestions.

References

1. Yuan, F., and R. Cheu. Incident Detection Using Support Vector Machines. Transportation Research Part C, Vol. 11, 2003, pp. 309–328.

2. Payne, H. J., and S. C. Tignor. Freeway Incident-Detection Algorithms Based on Decision Trees with States. In Transportation Research Record 682, TRB, National Research Council, Washington, D.C., 1978, pp. 30–37.

3. Levin, M., and G. M. Krause. Incident Detection: A Bayesian Approach. In Transportation Research Record 682, TRB, National Research Council, Washington, D.C., 1978, pp. 52–58.

4. Ahmed, M. S., and A. R. Cook. Time Series Models for Freeway Incident Detection. Journal of Transportation Engineering, ASCE, Vol. 106, No. 6, 1982, pp. 731–745.

5. Parkany, E., and C. Xie. A Complete Review of Incident Detection Algorithms and Their Deployment: What Works and What Doesn't. Technical Report. New England Transportation Consortium, Storrs, Conn., Feb. 2005.

6. Oh, J., J.-Y. Min, M. Kim, and H. Cho. Development of an Automatic Traffic Conflict Detection System Based on Image Tracking Technology. In Transportation Research Record: Journal of the Transportation Research Board, No. 2129, Transportation Research Board of the National Academies, Washington, D.C., 2009, pp. 45–54.

7. Jin, J., and B. Ran. Automatic Freeway Incident Detection Based on Fundamental Diagrams of Traffic Flow. In Transportation Research Record: Journal of the Transportation Research Board, No. 2099, Transportation Research Board of the National Academies, Washington, D.C., 2009, pp. 65–75.

8. Srinivasan, D., X. Jin, and R. Cheu. Adaptive Neural Network Models for Automatic Incident Detection on Freeways. Neurocomputing, Vol. 64, 2005, pp. 473–496.

9. Cheu, R. L., D. Srinivasan, and W. H. Loo. Training Neural Networks to Detect Freeway Incidents by Using Particle Swarm Optimization. In Transportation Research Record: Journal of the Transportation Research Board, No. 1867, Transportation Research Board of the National Academies, Washington, D.C., 2004, pp. 11–18.

10. Vapnik, V. The Nature of Statistical Learning Theory. Springer, Berlin, 1995.

11. Zhang, L., F. Lin, and B. Zhang. Support Vector Machine Learning for Image Retrieval. Proc., International Conference on Image Processing, Vol. 2, 2001, pp. 721–724.

12. Chen, S., W. Wang, and H. Van Zuylen. Construct Support Vector Machine Ensemble to Detect Traffic Incident. Expert Systems with Applications, Vol. 36, 2009, pp. 10976–10986.

13. Rakotomamonjy, A., F. Bach, S. Canu, and Y. Grandvalet. SimpleMKL. Journal of Machine Learning Research, Vol. 9, 2008, pp. 2491–2521.

14. Bach, F. Exploring Large Feature Spaces with Hierarchical Multiple Kernel Learning. arXiv preprint, arXiv:0809.1493, 2008.

15. Varma, M., and B. Babu. More Generality in Efficient Multiple Kernel Learning. Proc., 26th Annual International Conference on Machine Learning, Association for Computing Machinery, New York, 2009, pp. 1065–1072.

16. Nickisch, H., and M. Seeger. Multiple Kernel Learning: A Unifying Probabilistic Viewpoint. arXiv preprint, arXiv:1103.0897, 2011.

17. Platt, J. Sequential Minimal Optimization: A Fast Algorithm for Training Support Vector Machines. In Advances in Kernel Methods: Support Vector Learning, MIT Press, 1998, pp. 98–112.

18. Lanckriet, G., T. De Bie, N. Cristianini, M. Jordan, and W. Noble. A Statistical Framework for Genomic Data Fusion. Bioinformatics, Vol. 20, 2004, p. 2626.

19. Wang, W., S. Chen, and G. Qu. Incident Detection Algorithm Based on Partial Least Squares Regression. Transportation Research Part C, Vol. 16, 2008, pp. 54–70.

20. Fawcett, T. An Introduction to ROC Analysis. Pattern Recognition Letters, Vol. 27, 2006, pp. 861–874.

21. Petty, K., H. Noeimi, K. Sanwal, D. Rydzewski, A. Skabardonis, P. Varaiya, and H. Al-Deek. The Freeway Service Patrol Evaluation Project Database Support Programs and Accessibility. Transportation Research Part C, Vol. 4, 1996, pp. 71–85.

The Intelligent Transportation Systems Committee peer-reviewed this paper.

TABLE 4 Experimental Results of All Three Algorithms with PeMS Data Set

Algorithm      DR (%)          FAR (%)        MTTD (min)     CR (%)          AUC (%)         PI
Standard SVM   42.97 ± 25.89   16.16 ± 9.25   1.08 ± 1.28    76.26 ± 6.49    73.33 ± 11.94   0.280 ± 0.099
SVM ensemble   36.63 ± 7.29    7.83 ± 0.96    1.55 ± 1.44    81.87 ± 0.84    61.97 ± 9.12    0.289 ± 0.061
MKL-SVM        63.56 ± 8.89    14.33 ± 3.33   0.61 ± 0.22    81.57 ± 1.48    79.85 ± 1.34    0.190 ± 0.025

Note: Performance is presented as average ± variance.