research article traffic speed data imputation method based...

10
Research Article Traffic Speed Data Imputation Method Based on Tensor Completion Bin Ran, 1 Huachun Tan, 2 Jianshuai Feng, 2 Ying Liu, 3 and Wuhong Wang 2 1 Department of Civil and Environmental Engineering, University of Wisconsin-Madison, Madison, WI 53706, USA 2 Department of Transportation Engineering, Beijing Institute of Technology, Beijing 100081, China 3 College of Computer and Information Engineering, Beijing Technology and Business University, Beijng 100048, China Correspondence should be addressed to Huachun Tan; [email protected] Received 28 October 2014; Accepted 4 January 2015 Academic Editor: J. Alfredo Hernandez Copyright © 2015 Bin Ran et al. is is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. Traffic speed data plays a key role in Intelligent Transportation Systems (ITS); however, missing traffic data would affect the performance of ITS as well as Advanced Traveler Information Systems (ATIS). In this paper, we handle this issue by a novel tensor- based imputation approach. Specifically, tensor pattern is adopted for modeling traffic speed data and then High accurate Low Rank Tensor Completion (HaLRTC), an efficient tensor completion method, is employed to estimate the missing traffic speed data. is proposed method is able to recover missing entries from given entries, which may be noisy, considering severe fluctuation of traffic speed data compared with traffic volume. e proposed method is evaluated on Performance Measurement System (PeMS) database, and the experimental results show the superiority of the proposed approach over state-of-the-art baseline approaches. 1. Introduction e large amounts of traffic data collected from the traffic sensors are extremely valuable for route guidance, planning, and management of Intelligent Transportation Systems (ITS) [1]. e data, which include traffic speed, volume, and occu- pancy, are collected via various traffic collecting devices and technologies. Traffic speed data is one of the most important information sources for ITS, Advanced Traveler Information Systems (ATIS), and Advanced Traffic Management Systems (ATMS) [1]. As one of important parameters of traffic data, traffic speed data play a prominent role in the traffic domain and convey more information on traffic state, such as traffic congestion, than traffic volume data. For example, traffic speed data are used for computing the traffic congestion index in Beijing. Moreover, the traffic speed data are used for the purpose of traffic guidance [2]. Despite the importance of traffic speed information, unfortunately, substantial missing data are usually induced due to various malfunctions in data collection and/or record systems, such as failed loop detectors, failed loop amplifiers, and failed signal communication and processing devices. However, most of traffic data analysis methods used in intelligent transportation systems require the completeness of data and thus missing traffic speed data will severely degrade the performance of ITS, such as the accuracy of travel time estimation. Besides improving the reliability of traffic data collection and record systems, the research on the problem of missing data in intelligent transportation systems has been aroused due to extensive concern by traffic engineers and researchers. In the past decade, various imputation methods have been proposed for solving the missing traffic data problem. Among these methods, most of them focus on traffic volume data imputation, and only a few researches try to deal with missing traffic speed data estimation. Since traffic speed data have some similar characteristics with traffic volume data, such as temporal and spatial correlations, some imputing methods for missing traffic volume data can be applied directly to traffic speed data. However, compared with traffic volume data imputation, there are many special characteris- tics of speed data. One major observation for traffic speed data imputation is that traffic speed data fluctuate more severely compared with traffic volume data, which poses significant challenge to traditional methods. Hindawi Publishing Corporation Computational Intelligence and Neuroscience Volume 2015, Article ID 364089, 9 pages http://dx.doi.org/10.1155/2015/364089

Upload: others

Post on 15-Oct-2020

3 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Research Article Traffic Speed Data Imputation Method Based ...downloads.hindawi.com/journals/cin/2015/364089.pdfResearch Article Traffic Speed Data Imputation Method Based on Tensor

Research ArticleTraffic Speed Data Imputation Method Based onTensor Completion

Bin Ran1 Huachun Tan2 Jianshuai Feng2 Ying Liu3 and Wuhong Wang2

1Department of Civil and Environmental Engineering University of Wisconsin-Madison Madison WI 53706 USA2Department of Transportation Engineering Beijing Institute of Technology Beijing 100081 China3College of Computer and Information Engineering Beijing Technology and Business University Beijng 100048 China

Correspondence should be addressed to Huachun Tan tanhc00gmailcom

Received 28 October 2014 Accepted 4 January 2015

Academic Editor J Alfredo Hernandez

Copyright copy 2015 Bin Ran et al This is an open access article distributed under the Creative Commons Attribution License whichpermits unrestricted use distribution and reproduction in any medium provided the original work is properly cited

Traffic speed data plays a key role in Intelligent Transportation Systems (ITS) however missing traffic data would affect theperformance of ITS as well as Advanced Traveler Information Systems (ATIS) In this paper we handle this issue by a novel tensor-based imputation approach Specifically tensor pattern is adopted for modeling traffic speed data and then High accurate LowRank Tensor Completion (HaLRTC) an efficient tensor completion method is employed to estimate the missing traffic speed dataThis proposed method is able to recover missing entries from given entries which may be noisy considering severe fluctuation oftraffic speed data compared with traffic volume The proposed method is evaluated on Performance Measurement System (PeMS)database and the experimental results show the superiority of the proposed approach over state-of-the-art baseline approaches

1 Introduction

The large amounts of traffic data collected from the trafficsensors are extremely valuable for route guidance planningand management of Intelligent Transportation Systems (ITS)[1] The data which include traffic speed volume and occu-pancy are collected via various traffic collecting devices andtechnologies Traffic speed data is one of the most importantinformation sources for ITS Advanced Traveler InformationSystems (ATIS) and Advanced Traffic Management Systems(ATMS) [1] As one of important parameters of traffic datatraffic speed data play a prominent role in the traffic domainand convey more information on traffic state such as trafficcongestion than traffic volume data For example trafficspeed data are used for computing the traffic congestion indexin Beijing Moreover the traffic speed data are used for thepurpose of traffic guidance [2]

Despite the importance of traffic speed informationunfortunately substantial missing data are usually induceddue to various malfunctions in data collection andor recordsystems such as failed loop detectors failed loop amplifiersand failed signal communication and processing devicesHowever most of traffic data analysis methods used in

intelligent transportation systems require the completeness ofdata and thus missing traffic speed data will severely degradethe performance of ITS such as the accuracy of travel timeestimation Besides improving the reliability of traffic datacollection and record systems the research on the problemof missing data in intelligent transportation systems has beenaroused due to extensive concern by traffic engineers andresearchers

In the past decade various imputation methods havebeen proposed for solving the missing traffic data problemAmong these methods most of them focus on traffic volumedata imputation and only a few researches try to deal withmissing traffic speed data estimation Since traffic speed datahave some similar characteristics with traffic volume datasuch as temporal and spatial correlations some imputingmethods for missing traffic volume data can be applieddirectly to traffic speed data However compared with trafficvolume data imputation there are many special characteris-tics of speed data One major observation for traffic speeddata imputation is that traffic speed data fluctuate moreseverely compared with traffic volume data which posessignificant challenge to traditional methods

Hindawi Publishing CorporationComputational Intelligence and NeuroscienceVolume 2015 Article ID 364089 9 pageshttpdxdoiorg1011552015364089

2 Computational Intelligence and Neuroscience

In general the frequently usedmethods include historical(neighboring) imputation methods [3] and spline (includinglinear)regression imputation methods [4] The historicalimputation method fills a missing data point with a knowndata point collected on the same site at the same daily timebut from a neighboring day [5] A variation of this algorithmfills the missing data with the average values taken overthe most recent days [6] The splineregression imputationmethod recovers the missing values by applying mathemat-ical interpolation algorithms according to the surroundingknown data points collected during the same day [7ndash9] Theimputing performance of above methods greatly dependson the surrounding data of missing points and thus thesemethods fail to estimate the missing value at high missingratioMeanwhile the abovemethods fail to capture the globalinformation of data sets As a result this kind of methods willproduce relative poor results for imputingmissing speed datadue to its high fluctuation characteristics

Because of the above reason researchers proposedBayesian Principal Component Analysis (BPCA) algorithm[10] and Probabilistic Principal Component Analysis (PPCA)[7] for addressing missing data problem BPCA is a mod-ification of PPCA Indeed both PPCA and BPCA arebased on EM imputation methods [11] and make use ofthe relationship between the observed data and the latentvariables for imputing the missing data Generally the rela-tionship between the observed data and the latent variablesis described as probabilistic model In order to obtain themaximum probability of above parameters the Bayesianmodel is introduced to estimate the missing values withrespect to the estimated posterior distribution The missingvalues are gradually recovered along with the building of thelatent model Intuitively the two methods make a reasonabletrade-off among the periodicity local predictability andother statistical property of the traffic flow This abilityhelps them to outperform the traditional methods since theyexploit more temporal correlations than traditional methodsWidhalm et al [12] proposed a method based on Gaussian-mixture model to estimate road link speed from sparse ormissing probe vehicle dataThe traffic speed is estimated onlyfrom sparse (only a few available observations) historical dataof all links in the road network However these methodsmay perform poorly when the missing ratio is high due tothe intrinsic characteristic of EM method and the intrinsiccharacteristic of matrix model

To improve the performance of traffic data imputationtensor pattern is introduced to represent the traffic data whileencoding multicorrelations in traffic data and tensor-basedimputation algorithms mine the multicorrelations of theconstructed tensor while estimating the missing entries Tanet al [13] construct traffic volume data as a tensor model andpropose an efficient algorithm based on tensor completion toimpute missing traffic volume It exploits global informationof traffic volume data specifically tensor based model canexploit multiway global information simultaneously suchas temporal and spatial information Though this methodshows its superiority in traffic volume data imputation theperformances on estimating missing traffic speed data arenot reported Asif et al [14] proposed a low-dimensional

model to impute the missing speed data in road networkThey model the traffic data as matrix andor tensor and useFixed Point Continuation with Approximate SVD (FPCA)[15] and Canonical Polyadic (CP) decomposition [16] to solvethe problem of missing traffic speed data The missing speeddata can be imputed more accurately than tradition methodsbased on the multiway global information However most ofthem focused on traffic volume data imputation and they didnot discuss how to exploit themultiway global information oftraffic speed data

In this paper we focus on the missing speed dataimputation on freeway Motivated by the work in [13] thispaper adopts tensor pattern to model the traffic speed dataand then an efficient tensor completion method which candeal with noisy entries is used to estimate the missing trafficspeed data due to the severe fluctuation of traffic speed dataThe correlations of traffic speed data are analyzed firstlyand then tensor pattern is used for modeling traffic speedwhich could benefit for mining the underlying multimodecorrelations while keeping its natural structure To estimatethe missing entries in the traffic speed tensor a high accuracylow rank tensor completion algorithm called HaLRTC [17]is adopted which can deal with noisy observed entries Theproposed method is evaluated on the Performance Measure-ment System (PeMS) database (httppemsdotcagov) andexperimental results show the proposed method achieveshigher recovery accuracy than the state-of-the-art missingtraffic speed data imputation methods

To give a detailed explanation the rest of this paper isorganized as follows Section 2 presents the notations used inthis paper and introduces tensor completion algorithms formissing data estimation The intrinsic correlations of trafficspeed data such as week-to-week relations day-to-day rela-tions and hour-to-hour relations are analyzed in Section 3In Section 4 we propose a general model to describe trafficdata tensor completion problem and give a high accuracylow rank tensor completion algorithm (HaLRTC) to solve itNumerical experiments are reported in Section 5 followed byconclusion in Section 6

2 Notations and Tensor Completion

In this section the notations used in this paper are presentedand then the matrix and tensor completion algorithms formissing data estimation are introduced

21 Notations We denote the scalars in R with lowercaseletters (119886 119887 ) and the vectors with bold lowercase letters(a b ) The matrices are written as uppercase italic lettersfor example X and the symbols for tensors are handwritingletters for exampleXThe subscripts represent the followingscalars (X)

119894119895119896= 119909119894119895119896 (X)119894119895= 119909119894119895 The superscripts indicate

the size of the matrices or tensors For example there is a setof traffic volume data which are recorded every 5 minutes for16 days Then the data of one day preserves 288 data points(12 hours per day and 24 data points per hour)Therefore thetraffic data of 16 days can be constructed as a matrix modelof size 16 times 288 or a tensor model of size 16 times 12 times 24

Computational Intelligence and Neuroscience 3

The Frobenius norm of matrix X is defined as X119865

=

(sum119894119895|119909119894119895|2)

12 Let Ω be an index set then XΩdenotes the

vector consisting of elements in the setΩ only Define XΩ=

(sum(119894119895)isinΩ

1199092

119894119895)

12An 119899-way tensor can be rearranged as a matrix this is

called matricization also known as unfolding or flattening atensorThe ldquounfoldrdquo operation along the 119899thmode on a tensorX of size 119868

1times1198682timessdot sdot sdottimes119868

119873is defined as unfold (X 119899) = X

(119899)The

opposite operation ldquofoldrdquo is defined as fold (X(119899)) = X For

example the above tensor model X for traffic volume datawhich of size 16times12times24 can be unfolded along the 1thmodeand get a matrix X

(1)of size 16 times 288 In addition the mode-

119899 rank of X is denoted as rank119899(X) which is equal to the

column rank of X(119899)

22 Tensor Completion Methods The matrix and tensorcompletion methods were recently proposed for addressingthe missing data problem Those methods can perform welleven when the missing ratio is very high

During the past years there were lots of works on matrixcompletions Recently most theoretical work focuses onproving bounds for the exact matrix completion problemand a lot of work focuses on low rank or approximate lowrank matrix completion problems Candes and Recht [18]introduced a convex optimization to solve thematrix comple-tion problem by modeling it as a Semi-Definite Programing(SDP) FPCA [15] and SVT [19] are the other two algo-rithms for solving the low-rank matrix completion problemADMiRA [20] is an iterative method for solving a leastsquares problem with the restriction of rank OptSpace [21]is an efficient procedure to solve the exact and approximatematrix completion problems

Tensor completion methods can be seen as a high-orderextension of matrix completion methods They can capturemore global information than matrix completion methodsdue to the intrinsic multiway characteristics of tensor modelLiu et al [17] first proposed a tensor completion methodbased on trace norm minimization and applied it on imagecompletion Also a first-order method has been recentlydeveloped called CP-WOPT [16] base on CP decompositionof tensor model and applied on imputing missing networktraffic data Signoretto et al [22] established a mathematicalframework for learning with higher order tensors respect tomissing data

Traffic speed data have intrinsic multiway spatial-temporal correlations For fully exploiting the spatial-temporal correlations and improving the performance ofimputation methods a multiway tensor model is utilized toconstruct the traffic speed data and a high accuracy low ranktensor completion algorithm is used to address the missingspeed data in this paper

3 Correlation Analysis of Traffic Speed Data

As mentioned above the core idea of the approaches foraddressing missing data problems was to make use ofthe established intrinsic relations among those data [23]

Figure 1 Loop Detectors in District 7 Los Angeles County

0 50 100 150 200 250 300

80

70

60

50

40

30

20

10

0

Time interval (5min)

Spee

d (m

ph)

Link1Link2Link3Link4

Link5Link6Link7

Figure 2 The daily profile of the traffic speed data on Monday forseven detectors

Exploiting the useful intrinsic information of traffic speeddata attracts continuous interest due to its wide applicationsespecially in missing traffic speed data imputation

Below we will illustrate the traffic speed data down-loaded from PeMS (httppemsdotcagov) and analyze thecorrelations between each mode of traffic speed data Wedownloaded a set of traffic speed data in District 7 LosAngeles County (see Figure 1) The area covers 13 loopdetectors for a direction The traffic speed data are recordedby every 5 minutes And the whole period of the data lasts for28 days that is from February 4 to March 3 2013

For illustrating the spatial correlations of traffic speeddata seven adjacent detectors are chosen randomly from theabove area for simplicity And the data on Monday for eachdetector are plotted in Figure 2 The figure shows that thetraffic speed data between neighbor detectors are stronglycorrelated Also the Pearson Correlation Coefficient (PCC)

4 Computational Intelligence and Neuroscience

Table 1 Pearson Correlation Coefficient between five weekdays for VDS 716331

PCC Link1 Link2 Link3 Link4 Link5 Link6 Link7Link1 10000 09617 09454 08255 08698 07567 07187Link2 09617 10000 09660 08735 08584 08012 07564Link3 09454 09660 10000 09233 09247 08521 07972Link4 08255 08735 09233 10000 08507 09526 08982Link5 08698 08584 09247 08507 10000 08325 07898Link6 07567 08012 08521 09526 08325 10000 09694Link7 07187 07564 07972 08982 07898 09694 10000

Table 2 Pearson Correlation Coefficient between five weekdays for VDS 716331

PCC Monday Tuesday Wednesday Thursday FridayMonday 10000 06082 06184 06286 07261Tuesday 06082 10000 07757 07124 06882Wednesday 06184 07757 10000 08264 07382Thursday 06286 07124 08264 10000 06609Friday 07261 06882 07382 06609 10000

MondayTuesdayWednesday

ThursdayFriday

0 50 100 150 200 250 300

80

70

60

50

40

30

20

10

Time interval (5min)

Spee

d (m

ph)

Figure 3 The daily profile of the traffic speed data of five weekdaysfor VDS 716331

between each Mondayrsquos speed data is computed in Table 1Here the PCC for vector 119909 and 119910 is defined as

120588119909119910

=

cov (119909 119910)120590119909120590119910

=

119864 ((119909 minus 119864 (119909)) (119910 minus 119864 (119910)))

120590119909120590119910

(1)

where cov(sdot) stands for the covariance and 119864(sdot) stands for themathematical expectation

Besides the strong spatial correlations of traffic speeddata the temporal correlations are also prominent In orderto illustrate the temporal correlations of the traffic speed datathe daily data for five weekdays during a week is plottedin Figure 3 The correlations of the speed data between

0206201302132013

0220201302272013

0 50 100 150 200 250 300

80

70

60

50

40

30

20

Time interval (5min)

Spee

d (m

ph)

Figure 4 The daily profile of the traffic speed data on Wednesdayin February 2013

each weekday are obvious However the fluctuations of eachdata profile are notable this is the inherent property oftraffic speed data especially when the traffic condition is incongestion Also the PCCs of the data sets are shown inTable 2

It is worth mentioning that the correlations along week-to-week mode should be more prominent For verifying theidea the speed data on Wednesday for a month (February2013) are plotted in Figure 4 and the PCCs are shown inTable 3 The correlations are stronger than those betweenweekdays

Simultaneously the correlations of traffic speed data areof multiple patterns The interval-to-interval correlations are

Computational Intelligence and Neuroscience 5

Table 3 PearsonCorrelationCoefficient onWednesday in February2013

PCC Monday Tuesday Wednesday ThursdayMonday 10000 08733 07656 08738Tuesday 08733 10000 08422 08890Wednesday 07656 08422 10000 08139Thursday 08738 08890 08139 10000

usually ignored because it may be less apparent than day-to-day correlations After making statistics and analysis thePCCs of interval-to-interval pattern are averaged as about70The temporal correlations are not so stronger than trafficvolume data Thus more accurate methods are needed todevelop missing traffic speed data

According to the above analysis it is sufficient to saythat traffic speed data on a freeway corridor exhibit a strongcorrelation in multimode In day mode as well as weekmode space as well as interval mode PCCs are about 07It should be noted that PCC value can be underestimatedandor misleading if outliers that are ubiquitous in trafficspeed data are present [24] Some robust statistical methodssuch as the methods proposed by Verma et al [25 26] havebeen proposed in recent yearsThese methods would providemore powerful tools for traffic speed data analysis This willbe considered in our future work to help us to understand theintrinsic features underlying the traffic speed data

4 HaLRTC for Traffic Speed Completion

Based on the above correlations analysis of traffic speed datathe tensor model is firstly constructed along different modesThe correlations of traffic speed data are critical for recoveringthe missing traffic speed data Traditional methods mostlyexploit part of correlations such as historical or temporalneighboring correlations The classic methods usually utilizethe temporal correlations of traffic speed data fromday to daySuch as the single detector data multiple correlations containthe relations of traffic speed data from day to day hour tohour and so forth In addition the spatial correlations existin multiple detectors speed data

Conventional methods usually use day to day matrixpattern to model the traffic speed data Although each modeof traffic speed data has a very high similarity these methodsdo not utilize the multimode correlations which are ldquoDay timesHourrdquo ldquoWeektimesHourrdquo and ldquoLinktimesHourrdquo simultaneously andthus may result in poor recovery performance

To make full use of the multimode correlations andspatial-temporal information traffic speed data need to beconstructed into multiway data model Fortunately tensorpattern based traffic speed data can be well used to modelthe multiway traffic speed data This helps keep the originalstructure and employ enough spatial-temporal informationFor example the speed data set which is used for correlationsanalysis in Section 3 can be constructed as a 13 times 28 times 24 times12 tensor model according the PCCs computed along eachmode Here the speed tensor of size 13 times 28 times 24 times 12which stands for 13 detectors 28 days 24 hours in a day

and 12 sampling intervals in a hour (ie sampling interval is5min) For imputing the missing speed data the built tensormodel can keep up the integrity of speed data structure andexploit multimode correlations simultaneously Meanwhilean efficient algorithm is equally important

Considering the higher fluctuation characteristics of traf-fic speed data a high accuracy low rank tensor completion(HaLRTC) algorithm [17] is used for imputing missing trafficspeed data in this paper In the following a single detectorspeed profile is created as a three-order tensor X isin R119897times119898times119899

for expressing simply also it is the same formultiple detectorsspeed data The speed tensor X isin R119897times119898times119899 contains averagespeed values for 119897 days 119898 hours and 119899 intervals Howevernot all values in X are known Let Ω be the set of valuesfor which speed data is available Just as the correlationsanalysis in Section 3 strong temporal and spatial correlationsare exhibited Hence the speed data can be represented asa low-dimensional structure Thus the problem of imputingmissing traffic speed data can be solved by the optimizationproblem for low rank tensor completion

minX

rank (X)

st XΩ= TΩ

(2)

where X T are 119899-mode tensors with identical size ineach mode However the rank of tensor is not unique andnonconvex [27] One common approach is to use the tracenorm sdot

lowastwhich is the tightest convex envelop for the

rank of tensors to approximate the rank of tensors Using thedefinition of trace norm of tensor in [17] the optimizationproblem can be converted into

minX

119899

sum

119894=1

120572119894

1003817100381710038171003817X(119894)

1003817100381710038171003817lowast

st XΩ= TΩ

(3)

where 120572119894rsquos are constants satisfying 120572

119894ge 0 and sum

119899

119894=1120572119894=

1 To solve the optimization problem additional tensorsM1 M

119899are introduced and derive the following equiv-

alent formulation

minXM1M119899

119899

sum

119894=1

120572119894

1003817100381710038171003817M119894(119894)

1003817100381710038171003817lowast

st X =M119894

for 119894 = 1 119899

XΩ= TΩ

(4)

In this paper a high accuracy algorithm called ADMM [28] isintroduced to tackle the large scale problem The augmentedLagrangian function of (4) is as follows

119871120588(XM

1 M

119899Y1 Y

119899)

=

119899

sum

119894=1

120572119894

1003817100381710038171003817M119894(119894)

1003817100381710038171003817lowast+ ⟨X minusM

119894Y119894⟩

+

120588

2

1003817100381710038171003817M119894minusX

1003817100381710038171003817

2

119865

(5)

6 Computational Intelligence and Neuroscience

InputX withXΩ= TΩ 120588 and 119870

OutputX(1) SetX

Ω= TΩandX

Ω= 0

(2) for 119896 = 0 to 119870 do(3) for 119894 = 1 to 119899 do

(4) M119894= fold

119894[119863(120572119894120588)

(X(119894)+

1

120588

Y119894(119894))]

(5) end for

(6) XΩ=

1

119899

(

119899

sum

119894=1

M119894minus

1

120588

Y119894)

Ω

(7) Y119894= Y119894minus 120588(M

119894minusX)

(8) end for

Algorithm 1 HaLRTC high accuracy low rank tensor completion

According to the framework of ADMM and the algorithmdescription in [17] the HaLRTC algorithm is summarized inAlgorithm 1

5 Experiments

The performances of HaLRTC algorithm are evaluated onthe real world traffic speed data The experimental settingsare listed in Section 51 and evaluation indices are shown inSection 52 In Section 53 temporal correlations are exploitedfor imputing missing speed data for a single detector Addi-tionally Section 54 tests the algorithms for data sets ofmultiple detectors

51 Experimental Settings HaLRTC is compared with twoclassical imputation methods (1) Mean-Historical imputa-tion [29] and (2) BPCA-based imputation method [10]

For the historical imputation method we calculate themean value of all the available data points belonging to thesame detector at the same time interval in the last few daysand use the mean as the imputed value see [29] for moredetails For BPCA we set the maximum number of iterationsteps is 200 and the threshold of the approximate complexityis set to 10minus4 which is the same as [10] HaLRTCmethod is aniterative algorithm themaximum iterative numbers are set as500 The value of 120572

119894is set to 1119899 (119899 is the mode number) and

the parameter 120588 is set as 119890minus3

52 Evaluation Indices To evaluate the performances of theproposed method HaLRTC the following two indices wereused in this paper

(1) Mean Absolute Percentage Error (MAPE) The index givesthe evaluation of the average estimation error in terms ofpercentage

MAPE = 1

119872

119872

sum

119898=1

1003816100381610038161003816100381610038161003816100381610038161003816

119905(119898)

119903minus 119905(119898)

119890

119905(119898)

119903

1003816100381610038161003816100381610038161003816100381610038161003816

times 100 (6)

01 02 03 04 05 06 07 08

14

13

12

11

10

9

8

7

6

5

4

Missing ratio

RMSE

(mpe

)

BPCAMean

HaLRTC

Figure 5 RMSE for weekdaysrsquo data of single detector

(2) Root Mean Square Error (RMSE) This index gives theevaluation of the variance in the estimation errors

RMSE = radic 1

119872

119872

sum

119898=1

(119905(119898)

119903 minus 119905(119898)

119890 )

2

(7)

where 119905(119898)119903

and 119905(119898)

119890are the 119898th elements which stand for

the known real value and estimated value respectively 119872denotes the number of estimated traffic volumes

53 Results forMissing Speed Data of Single Detector To illus-trate the performances of the proposed method a completetraffic speed data set is used as ground truth for the test Wechoose the data of a fixed detector VDS 716331 in District7 Los Angeles County (see Figure 1) which are downloadedfrom (PeMS httppemsdotcagov) The traffic speed dataare recorded every 5 minutes Therefore a daily traffic speedseries for a loop detector contains 288 records and the wholeperiod of the data is chosen for three weeks that is fromFebruary 4 to February 24 2013

Based on the correlations analysis of traffic speed data inSection 3 the correlations are stronger for weekdays withoutregard to the weekends And it will be helpful for imputingthe missing traffic speed data Therefore the weekdaysrsquo datafor three weeks (ie 15 days) are chosen for evaluating theHaLRTC algorithm

Based on multiple correlations of the traffic speed datathe data set is modeled as a tensor of size 24 times 12 times 15 whichstands for 24 hours in a day 12 sample intervals (ie recordedby 5 minutes) per hour and 15 days For the methods Meanand BPCA the speed data is arranged as amatrix of size 288times15 The ratios of missing data are set from 5 to 80 andthe missing data are produced randomly All the results areaverage by 10 instances

The RMSE curves and MAPE curves of those methodswith randomly missing weekdaysrsquo traffic speed data for asingle loop detector are shown in Figures 5 and 6 Obviously

Computational Intelligence and Neuroscience 7

20

15

10

5

MA

PE (

)

01 02 03 04 05 06 07 08

Missing ratio

BPCAMean

HaLRTC

Figure 6 MAPE for weekdaysrsquo data of single detector

12

11

10

9

8

7

6

5

4

RMSE

(mpe

)

01 02 03 04 05 06 07 08

Missing ratio

BPCAMean

HaLRTC

Figure 7 RMSE for data in Wednesday of single detector

the RMSE and MAPE of HaLRTC approach are smallerthan other approaches It is worth noting that the RMSEof HaLRTC will be bigger than RMSE of BPCA when themissing ratio is 80 That situation can be considered as adiploma when the HaLRTC will reach the limit for keepingthe high accurate

For exploiting stronger correlations of traffic speed dataWednesdayrsquos data of a specific week are chosen and 10 weeksdata make up a data set with 10 days Identically the data setis modeled as a tensor of size 24times12times10which stands for 24hours in a day 12 sample intervals (ie recorded by 5minutes)per hour and 10 days And a 288times10matrixmodel is arrangedfor Mean and BPCA

18

16

14

12

10

8

6

4

MA

PE (

)

01 02 03 04 05 06 07 08

Missing ratio

BPCAMean

HaLRTC

Figure 8 MAPE for data in Wednesday of single detector

Figures 7 and 8 show that HaLRTC algorithm is withhigher accuracy for imputing missing traffic speed data thanMean and BPCA Comparing Figures 5 and 7 correspondingwith comparison of Figures 6 and 8 the performance forimputing missing traffic speed data will better when exploit-ing more correlations of data sets

54 Results for Missing Speed Data of Multiple Detectors Toillustrate the benefit of HaLRTC algorithm in reconstructingthe missing traffic speed data that is multiple correlationsof traffic speed data are more beneficial than the partialcorrelations used in traditional methods A new traffic speeddata set by considering spatial correlations is used to evaluateHaLRTC algorithm

We choose the data of 13 detectors in District 7 LosAngeles County (see Figure 1) which are downloaded from(PeMS httppemsdotcagov) For each detector the trafficspeed data for a specific day are chosen andderive the data setFor exploiting the spatial and temporal correlations of trafficspeed data simultaneously the data set is modeled as a tensorof size 24 times 12 times 13 which stands for 24 hours in a day 12sampling intervals in an hour and 13 detectors Meanwhilethe data set is arranged as a matrix 288 times 13 for Mean andBPCA

Figures 9 and 10 show that our propose HaLRTC out-performs Mean and BPCA for traffic speed data of multipledetectors

6 Conclusion

In this paper a multiway tensor model is proposed torepresent the traffic speed data considering the multiplecorrelations and a high accuracy low rank tensor completion(HaLRTC) algorithm is employed to estimate the missingtraffic speed data due to its severely fluctuation Experiments

8 Computational Intelligence and Neuroscience

12

14

16

10

8

6

4

RMSE

(mpe

)

01 02 03 04 05 06 07 08

Missing ratio

BPCAMean

HaLRTC

Figure 9 RMSE for data of multiple detectors

30

25

20

15

10

5

MA

PE (

)

01 02 03 04 05 06 07 08

Missing ratio

BPCAMean

HaLRTC

Figure 10 MAPE for data of multiple detectors

on benchmark show that the proposed method performsbetter than other classical state-of-the-art methods Forfuture work it is interesting to extend the proposed methodto large and dynamic road network

Conflict of Interests

The authors declare that there is no conflict of interestsregarding the publication of this paper

Acknowledgments

The research was supported by NSFC (Grant nos 61271376and 51308115) the National Basic Research Program of China(973 Program no 2012CB725405) Beijing Natural Science

Foundation (4122067) and the Basic Research Fund ofBeijing Institute of Technology (20120342026)

References

[1] J Zhang F-Y Wang K Wang W-H Lin X Xu and C ChenldquoData-driven intelligent transportation systems a surveyrdquo IEEETransactions on Intelligent Transportation Systems vol 12 no 4pp 1624ndash1639 2011

[2] V Jain A Sharma and L Subramanian ldquoRoad traffic conges-tion in the developing worldrdquo in Proceedings of the 2nd ACMSymposium on Computing for Development p 11 2012

[3] H S Zhang Y Zhang Z H Li andD C Hu ldquoSpatial-temporaltraffic data analysis based on global data management usingMASrdquo IEEE Transactions on Intelligent Transportation Systemsvol 5 no 4 pp 267ndash275 2004

[4] J Chen and J Shao ldquoNearest neighbor imputation for surveydatardquo Journal ofOfficial Statistics vol 16 no 2 pp 113ndash132 2000

[5] P D Allison and M Data University Papers Series on Quanti-tative Applications in the Social Sciences Sage Thousand OaksCalif USA 2001

[6] H Chen S Grant-Muller L Mussone and F MontgomeryldquoA study of hybrid neural network approaches and the effectsof missing data on traffic forecastingrdquo Neural Computing ampApplications vol 10 no 3 pp 277ndash286 2001

[7] L Qu L Li Y Zhang and J Hu ldquoPPCA-based missing dataimputation for traffic flow volume a systematical approachrdquoIEEE Transactions on Intelligent Transportation Systems vol 10no 3 pp 512ndash522 2009

[8] L Wang Z Li and C Song ldquoNetwork traffic prediction basedon seasonal ARIMA modelrdquo in Proceedings of the 5th IEEEWorld Congress on Intelligent Control and Automation vol 2pp 1425ndash1428 2004

[9] X Jin S-K Hong and Q Ma ldquoAn algorithm to estimatecontinuous-time traffic speed usingmultiple regressionmodelrdquoInformation Technology Journal vol 5 no 2 pp 281ndash284 2006

[10] L Qu Y Zhang J Hu L Jia and L Li ldquoA BPCA basedmissing value imputing method for traffic flow volume datardquo inProceedings of the IEEE Intelligent Vehicles Symposium (IV rsquo08)pp 985ndash990 June 2008

[11] A P Dempster N M Laird and D B Rubin ldquoMaximumlikelihood from incomplete data via the EM algorithmrdquo Journalof the Royal Statistical Society Series B Methodological vol 39no 1 pp 1ndash38 1977

[12] P Widhalm M Piff N Brandle H Koller and M ReinthalerldquoRobust road link speed estimates for sparse or missing probevehicle datardquo in Proceedings of the 15th International IEEEConference on Intelligent Transportation Systems (ITSC rsquo12) pp1693ndash1697 September 2012

[13] H Tan G Feng J Feng W Wang Y-J Zhang and F LildquoA tensor-based method for missing traffic data completionrdquoTransportation Research Part C Emerging Technologies vol 28pp 15ndash27 2013

[14] M T Asif N Mitrovic L Garg J Dauwels and P Jaillet ldquoLow-dimensional models for missing data imputation in road net-worksrdquo in Proceedings of the 38th IEEE International Conferenceon Acoustics Speech and Signal Processing (ICASSP rsquo13) pp3527ndash3531 May 2013

[15] S Ma D Goldfarb and L Chen ldquoFixed point and Bregmaniterative methods for matrix rank minimizationrdquoMathematicalProgramming vol 128 no 1-2 pp 321ndash353 2011

Computational Intelligence and Neuroscience 9

[16] E Acar D M Dunlavy T G Kolda and M Moslashrup ldquoScalabletensor factorizations for incomplete datardquo Chemometrics andIntelligent Laboratory Systems vol 106 no 1 pp 41ndash56 2011

[17] J Liu P Musialski P Wonka and J Ye ldquoTensor completion forestimating missing values in visual datardquo IEEE Transactions onPattern Analysis andMachine Intelligence vol 35 no 1 pp 208ndash220 2013

[18] E J Candes and B Recht ldquoExact matrix completion via convexoptimizationrdquo Foundations of Computational Mathematics vol9 no 6 pp 717ndash772 2009

[19] J F Cai E J Candes andZ Shen ldquoA singular value thresholdingalgorithm for matrix completionrdquo SIAM Journal on Optimiza-tion vol 20 no 4 pp 1956ndash1982 2010

[20] K Lee and Y Bresler ldquoADMiRA atomic decomposition forminimum rank approximationrdquo IEEE Transactions on Informa-tion Theory vol 56 no 9 pp 4402ndash4416 2010

[21] R H Keshavan A Montanari and S Oh ldquoMatrix completionfrom a few entriesrdquo IEEE Transactions on Information Theoryvol 56 no 6 pp 2980ndash2998 2010

[22] M Signoretto Q Tran Dinh L de Lathauwer and J A KSuykens ldquoLearning with tensors a framework based on convexoptimization and spectral regularizationrdquo Machine Learningvol 94 no 3 pp 303ndash351 2014

[23] Y Zhang and Y Liu ldquoMissing traffic flow data predictionusing least squares support vector machines in urban arterialstreetsrdquo inProceedings of the IEEE SymposiumonComputationalIntelligence and DataMining (CIDM rsquo09) pp 76ndash83 April 2009

[24] V Barnett and T LewisOutliers in Statistical Data Wiley Seriesin Probability and Mathematical Statistics Applied Probabilityand Statistics John Wiley amp Sons Chichester UK 3rd edition1994

[25] S P Verma R Cruz-Huicochea and L Dıaz-Gonzalez ldquoUni-variate data analysis system deciphering mean compositionsof island and continental arc magmas and influence of theunderlying crustrdquo International Geology Review vol 55 no 15pp 1922ndash1940 2013

[26] S P Verma L Dıaz-Gonzalez M Rosales-Rivera and AQuiroz-Ruiz ldquoComparative performance of four single extremeoutlier discordancy tests from Monte Carlo simulationsrdquo TheScientific World Journal vol 2014 Article ID 746451 27 pages2014

[27] T G Kolda and B W Bader ldquoTensor decompositions andapplicationsrdquo SIAM Review vol 51 no 3 pp 455ndash500 2009

[28] Z Lin M Chen L Wu and Y Ma ldquoThe augmented lagrangemultiplier method for exact recovery of corrupted low-rankmatricesrdquo UIUC Technical Report UILU-ENG-09-2215 2009

[29] D Kahaner C Moler and S Nash Numerical Methods andSoftware Prentice Hall Englewood Cliffs NJ USA 1989

Submit your manuscripts athttpwwwhindawicom

Computer Games Technology

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Distributed Sensor Networks

International Journal of

Advances in

FuzzySystems

Hindawi Publishing Corporationhttpwwwhindawicom

Volume 2014

International Journal of

ReconfigurableComputing

Hindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Applied Computational Intelligence and Soft Computing

thinspAdvancesthinspinthinsp

Artificial Intelligence

HindawithinspPublishingthinspCorporationhttpwwwhindawicom Volumethinsp2014

Advances inSoftware EngineeringHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Electrical and Computer Engineering

Journal of

Journal of

Computer Networks and Communications

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation

httpwwwhindawicom Volume 2014

Advances in

Multimedia

International Journal of

Biomedical Imaging

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

ArtificialNeural Systems

Advances in

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

RoboticsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Computational Intelligence and Neuroscience

Industrial EngineeringJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Modelling amp Simulation in EngineeringHindawi Publishing Corporation httpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Human-ComputerInteraction

Advances in

Computer EngineeringAdvances in

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Page 2: Research Article Traffic Speed Data Imputation Method Based ...downloads.hindawi.com/journals/cin/2015/364089.pdfResearch Article Traffic Speed Data Imputation Method Based on Tensor

2 Computational Intelligence and Neuroscience

In general the frequently usedmethods include historical(neighboring) imputation methods [3] and spline (includinglinear)regression imputation methods [4] The historicalimputation method fills a missing data point with a knowndata point collected on the same site at the same daily timebut from a neighboring day [5] A variation of this algorithmfills the missing data with the average values taken overthe most recent days [6] The splineregression imputationmethod recovers the missing values by applying mathemat-ical interpolation algorithms according to the surroundingknown data points collected during the same day [7ndash9] Theimputing performance of above methods greatly dependson the surrounding data of missing points and thus thesemethods fail to estimate the missing value at high missingratioMeanwhile the abovemethods fail to capture the globalinformation of data sets As a result this kind of methods willproduce relative poor results for imputingmissing speed datadue to its high fluctuation characteristics

Because of the above reason researchers proposedBayesian Principal Component Analysis (BPCA) algorithm[10] and Probabilistic Principal Component Analysis (PPCA)[7] for addressing missing data problem BPCA is a mod-ification of PPCA Indeed both PPCA and BPCA arebased on EM imputation methods [11] and make use ofthe relationship between the observed data and the latentvariables for imputing the missing data Generally the rela-tionship between the observed data and the latent variablesis described as probabilistic model In order to obtain themaximum probability of above parameters the Bayesianmodel is introduced to estimate the missing values withrespect to the estimated posterior distribution The missingvalues are gradually recovered along with the building of thelatent model Intuitively the two methods make a reasonabletrade-off among the periodicity local predictability andother statistical property of the traffic flow This abilityhelps them to outperform the traditional methods since theyexploit more temporal correlations than traditional methodsWidhalm et al [12] proposed a method based on Gaussian-mixture model to estimate road link speed from sparse ormissing probe vehicle dataThe traffic speed is estimated onlyfrom sparse (only a few available observations) historical dataof all links in the road network However these methodsmay perform poorly when the missing ratio is high due tothe intrinsic characteristic of EM method and the intrinsiccharacteristic of matrix model

To improve the performance of traffic data imputationtensor pattern is introduced to represent the traffic data whileencoding multicorrelations in traffic data and tensor-basedimputation algorithms mine the multicorrelations of theconstructed tensor while estimating the missing entries Tanet al [13] construct traffic volume data as a tensor model andpropose an efficient algorithm based on tensor completion toimpute missing traffic volume It exploits global informationof traffic volume data specifically tensor based model canexploit multiway global information simultaneously suchas temporal and spatial information Though this methodshows its superiority in traffic volume data imputation theperformances on estimating missing traffic speed data arenot reported Asif et al [14] proposed a low-dimensional

model to impute the missing speed data in road networkThey model the traffic data as matrix andor tensor and useFixed Point Continuation with Approximate SVD (FPCA)[15] and Canonical Polyadic (CP) decomposition [16] to solvethe problem of missing traffic speed data The missing speeddata can be imputed more accurately than tradition methodsbased on the multiway global information However most ofthem focused on traffic volume data imputation and they didnot discuss how to exploit themultiway global information oftraffic speed data

In this paper we focus on the missing speed dataimputation on freeway Motivated by the work in [13] thispaper adopts tensor pattern to model the traffic speed dataand then an efficient tensor completion method which candeal with noisy entries is used to estimate the missing trafficspeed data due to the severe fluctuation of traffic speed dataThe correlations of traffic speed data are analyzed firstlyand then tensor pattern is used for modeling traffic speedwhich could benefit for mining the underlying multimodecorrelations while keeping its natural structure To estimatethe missing entries in the traffic speed tensor a high accuracylow rank tensor completion algorithm called HaLRTC [17]is adopted which can deal with noisy observed entries Theproposed method is evaluated on the Performance Measure-ment System (PeMS) database (httppemsdotcagov) andexperimental results show the proposed method achieveshigher recovery accuracy than the state-of-the-art missingtraffic speed data imputation methods

To give a detailed explanation the rest of this paper isorganized as follows Section 2 presents the notations used inthis paper and introduces tensor completion algorithms formissing data estimation The intrinsic correlations of trafficspeed data such as week-to-week relations day-to-day rela-tions and hour-to-hour relations are analyzed in Section 3In Section 4 we propose a general model to describe trafficdata tensor completion problem and give a high accuracylow rank tensor completion algorithm (HaLRTC) to solve itNumerical experiments are reported in Section 5 followed byconclusion in Section 6

2 Notations and Tensor Completion

In this section the notations used in this paper are presentedand then the matrix and tensor completion algorithms formissing data estimation are introduced

21 Notations We denote the scalars in R with lowercaseletters (119886 119887 ) and the vectors with bold lowercase letters(a b ) The matrices are written as uppercase italic lettersfor example X and the symbols for tensors are handwritingletters for exampleXThe subscripts represent the followingscalars (X)

119894119895119896= 119909119894119895119896 (X)119894119895= 119909119894119895 The superscripts indicate

the size of the matrices or tensors For example there is a setof traffic volume data which are recorded every 5 minutes for16 days Then the data of one day preserves 288 data points(12 hours per day and 24 data points per hour)Therefore thetraffic data of 16 days can be constructed as a matrix modelof size 16 times 288 or a tensor model of size 16 times 12 times 24

Computational Intelligence and Neuroscience 3

The Frobenius norm of matrix X is defined as X119865

=

(sum119894119895|119909119894119895|2)

12 Let Ω be an index set then XΩdenotes the

vector consisting of elements in the setΩ only Define XΩ=

(sum(119894119895)isinΩ

1199092

119894119895)

12An 119899-way tensor can be rearranged as a matrix this is

called matricization also known as unfolding or flattening atensorThe ldquounfoldrdquo operation along the 119899thmode on a tensorX of size 119868

1times1198682timessdot sdot sdottimes119868

119873is defined as unfold (X 119899) = X

(119899)The

opposite operation ldquofoldrdquo is defined as fold (X(119899)) = X For

example the above tensor model X for traffic volume datawhich of size 16times12times24 can be unfolded along the 1thmodeand get a matrix X

(1)of size 16 times 288 In addition the mode-

119899 rank of X is denoted as rank119899(X) which is equal to the

column rank of X(119899)

22 Tensor Completion Methods The matrix and tensorcompletion methods were recently proposed for addressingthe missing data problem Those methods can perform welleven when the missing ratio is very high

During the past years there were lots of works on matrixcompletions Recently most theoretical work focuses onproving bounds for the exact matrix completion problemand a lot of work focuses on low rank or approximate lowrank matrix completion problems Candes and Recht [18]introduced a convex optimization to solve thematrix comple-tion problem by modeling it as a Semi-Definite Programing(SDP) FPCA [15] and SVT [19] are the other two algo-rithms for solving the low-rank matrix completion problemADMiRA [20] is an iterative method for solving a leastsquares problem with the restriction of rank OptSpace [21]is an efficient procedure to solve the exact and approximatematrix completion problems

Tensor completion methods can be seen as a high-orderextension of matrix completion methods They can capturemore global information than matrix completion methodsdue to the intrinsic multiway characteristics of tensor modelLiu et al [17] first proposed a tensor completion methodbased on trace norm minimization and applied it on imagecompletion Also a first-order method has been recentlydeveloped called CP-WOPT [16] base on CP decompositionof tensor model and applied on imputing missing networktraffic data Signoretto et al [22] established a mathematicalframework for learning with higher order tensors respect tomissing data

Traffic speed data have intrinsic multiway spatial-temporal correlations For fully exploiting the spatial-temporal correlations and improving the performance ofimputation methods a multiway tensor model is utilized toconstruct the traffic speed data and a high accuracy low ranktensor completion algorithm is used to address the missingspeed data in this paper

3 Correlation Analysis of Traffic Speed Data

As mentioned above the core idea of the approaches foraddressing missing data problems was to make use ofthe established intrinsic relations among those data [23]

Figure 1 Loop Detectors in District 7 Los Angeles County

0 50 100 150 200 250 300

80

70

60

50

40

30

20

10

0

Time interval (5min)

Spee

d (m

ph)

Link1Link2Link3Link4

Link5Link6Link7

Figure 2 The daily profile of the traffic speed data on Monday forseven detectors

Exploiting the useful intrinsic information of traffic speeddata attracts continuous interest due to its wide applicationsespecially in missing traffic speed data imputation

Below we will illustrate the traffic speed data down-loaded from PeMS (httppemsdotcagov) and analyze thecorrelations between each mode of traffic speed data Wedownloaded a set of traffic speed data in District 7 LosAngeles County (see Figure 1) The area covers 13 loopdetectors for a direction The traffic speed data are recordedby every 5 minutes And the whole period of the data lasts for28 days that is from February 4 to March 3 2013

For illustrating the spatial correlations of traffic speeddata seven adjacent detectors are chosen randomly from theabove area for simplicity And the data on Monday for eachdetector are plotted in Figure 2 The figure shows that thetraffic speed data between neighbor detectors are stronglycorrelated Also the Pearson Correlation Coefficient (PCC)

4 Computational Intelligence and Neuroscience

Table 1 Pearson Correlation Coefficient between five weekdays for VDS 716331

PCC Link1 Link2 Link3 Link4 Link5 Link6 Link7Link1 10000 09617 09454 08255 08698 07567 07187Link2 09617 10000 09660 08735 08584 08012 07564Link3 09454 09660 10000 09233 09247 08521 07972Link4 08255 08735 09233 10000 08507 09526 08982Link5 08698 08584 09247 08507 10000 08325 07898Link6 07567 08012 08521 09526 08325 10000 09694Link7 07187 07564 07972 08982 07898 09694 10000

Table 2 Pearson Correlation Coefficient between five weekdays for VDS 716331

PCC Monday Tuesday Wednesday Thursday FridayMonday 10000 06082 06184 06286 07261Tuesday 06082 10000 07757 07124 06882Wednesday 06184 07757 10000 08264 07382Thursday 06286 07124 08264 10000 06609Friday 07261 06882 07382 06609 10000

MondayTuesdayWednesday

ThursdayFriday

0 50 100 150 200 250 300

80

70

60

50

40

30

20

10

Time interval (5min)

Spee

d (m

ph)

Figure 3 The daily profile of the traffic speed data of five weekdaysfor VDS 716331

between each Mondayrsquos speed data is computed in Table 1Here the PCC for vector 119909 and 119910 is defined as

120588119909119910

=

cov (119909 119910)120590119909120590119910

=

119864 ((119909 minus 119864 (119909)) (119910 minus 119864 (119910)))

120590119909120590119910

(1)

where cov(sdot) stands for the covariance and 119864(sdot) stands for themathematical expectation

Besides the strong spatial correlations of traffic speeddata the temporal correlations are also prominent In orderto illustrate the temporal correlations of the traffic speed datathe daily data for five weekdays during a week is plottedin Figure 3 The correlations of the speed data between

0206201302132013

0220201302272013

0 50 100 150 200 250 300

80

70

60

50

40

30

20

Time interval (5min)

Spee

d (m

ph)

Figure 4 The daily profile of the traffic speed data on Wednesdayin February 2013

each weekday are obvious However the fluctuations of eachdata profile are notable this is the inherent property oftraffic speed data especially when the traffic condition is incongestion Also the PCCs of the data sets are shown inTable 2

It is worth mentioning that the correlations along week-to-week mode should be more prominent For verifying theidea the speed data on Wednesday for a month (February2013) are plotted in Figure 4 and the PCCs are shown inTable 3 The correlations are stronger than those betweenweekdays

Simultaneously the correlations of traffic speed data areof multiple patterns The interval-to-interval correlations are

Computational Intelligence and Neuroscience 5

Table 3 PearsonCorrelationCoefficient onWednesday in February2013

PCC Monday Tuesday Wednesday ThursdayMonday 10000 08733 07656 08738Tuesday 08733 10000 08422 08890Wednesday 07656 08422 10000 08139Thursday 08738 08890 08139 10000

usually ignored because it may be less apparent than day-to-day correlations After making statistics and analysis thePCCs of interval-to-interval pattern are averaged as about70The temporal correlations are not so stronger than trafficvolume data Thus more accurate methods are needed todevelop missing traffic speed data

According to the above analysis it is sufficient to saythat traffic speed data on a freeway corridor exhibit a strongcorrelation in multimode In day mode as well as weekmode space as well as interval mode PCCs are about 07It should be noted that PCC value can be underestimatedandor misleading if outliers that are ubiquitous in trafficspeed data are present [24] Some robust statistical methodssuch as the methods proposed by Verma et al [25 26] havebeen proposed in recent yearsThese methods would providemore powerful tools for traffic speed data analysis This willbe considered in our future work to help us to understand theintrinsic features underlying the traffic speed data

4 HaLRTC for Traffic Speed Completion

Based on the above correlations analysis of traffic speed datathe tensor model is firstly constructed along different modesThe correlations of traffic speed data are critical for recoveringthe missing traffic speed data Traditional methods mostlyexploit part of correlations such as historical or temporalneighboring correlations The classic methods usually utilizethe temporal correlations of traffic speed data fromday to daySuch as the single detector data multiple correlations containthe relations of traffic speed data from day to day hour tohour and so forth In addition the spatial correlations existin multiple detectors speed data

Conventional methods usually use day to day matrixpattern to model the traffic speed data Although each modeof traffic speed data has a very high similarity these methodsdo not utilize the multimode correlations which are ldquoDay timesHourrdquo ldquoWeektimesHourrdquo and ldquoLinktimesHourrdquo simultaneously andthus may result in poor recovery performance

To make full use of the multimode correlations andspatial-temporal information traffic speed data need to beconstructed into multiway data model Fortunately tensorpattern based traffic speed data can be well used to modelthe multiway traffic speed data This helps keep the originalstructure and employ enough spatial-temporal informationFor example the speed data set which is used for correlationsanalysis in Section 3 can be constructed as a 13 times 28 times 24 times12 tensor model according the PCCs computed along eachmode Here the speed tensor of size 13 times 28 times 24 times 12which stands for 13 detectors 28 days 24 hours in a day

and 12 sampling intervals in a hour (ie sampling interval is5min) For imputing the missing speed data the built tensormodel can keep up the integrity of speed data structure andexploit multimode correlations simultaneously Meanwhilean efficient algorithm is equally important

Considering the higher fluctuation characteristics of traf-fic speed data a high accuracy low rank tensor completion(HaLRTC) algorithm [17] is used for imputing missing trafficspeed data in this paper In the following a single detectorspeed profile is created as a three-order tensor X isin R119897times119898times119899

for expressing simply also it is the same formultiple detectorsspeed data The speed tensor X isin R119897times119898times119899 contains averagespeed values for 119897 days 119898 hours and 119899 intervals Howevernot all values in X are known Let Ω be the set of valuesfor which speed data is available Just as the correlationsanalysis in Section 3 strong temporal and spatial correlationsare exhibited Hence the speed data can be represented asa low-dimensional structure Thus the problem of imputingmissing traffic speed data can be solved by the optimizationproblem for low rank tensor completion

minX

rank (X)

st XΩ= TΩ

(2)

where X T are 119899-mode tensors with identical size ineach mode However the rank of tensor is not unique andnonconvex [27] One common approach is to use the tracenorm sdot

lowastwhich is the tightest convex envelop for the

rank of tensors to approximate the rank of tensors Using thedefinition of trace norm of tensor in [17] the optimizationproblem can be converted into

minX

119899

sum

119894=1

120572119894

1003817100381710038171003817X(119894)

1003817100381710038171003817lowast

st XΩ= TΩ

(3)

where 120572119894rsquos are constants satisfying 120572

119894ge 0 and sum

119899

119894=1120572119894=

1 To solve the optimization problem additional tensorsM1 M

119899are introduced and derive the following equiv-

alent formulation

minXM1M119899

119899

sum

119894=1

120572119894

1003817100381710038171003817M119894(119894)

1003817100381710038171003817lowast

st X =M119894

for 119894 = 1 119899

XΩ= TΩ

(4)

In this paper a high accuracy algorithm called ADMM [28] isintroduced to tackle the large scale problem The augmentedLagrangian function of (4) is as follows

119871120588(XM

1 M

119899Y1 Y

119899)

=

119899

sum

119894=1

120572119894

1003817100381710038171003817M119894(119894)

1003817100381710038171003817lowast+ ⟨X minusM

119894Y119894⟩

+

120588

2

1003817100381710038171003817M119894minusX

1003817100381710038171003817

2

119865

(5)

6 Computational Intelligence and Neuroscience

InputX withXΩ= TΩ 120588 and 119870

OutputX(1) SetX

Ω= TΩandX

Ω= 0

(2) for 119896 = 0 to 119870 do(3) for 119894 = 1 to 119899 do

(4) M119894= fold

119894[119863(120572119894120588)

(X(119894)+

1

120588

Y119894(119894))]

(5) end for

(6) XΩ=

1

119899

(

119899

sum

119894=1

M119894minus

1

120588

Y119894)

Ω

(7) Y119894= Y119894minus 120588(M

119894minusX)

(8) end for

Algorithm 1 HaLRTC high accuracy low rank tensor completion

According to the framework of ADMM and the algorithmdescription in [17] the HaLRTC algorithm is summarized inAlgorithm 1

5 Experiments

The performances of HaLRTC algorithm are evaluated onthe real world traffic speed data The experimental settingsare listed in Section 51 and evaluation indices are shown inSection 52 In Section 53 temporal correlations are exploitedfor imputing missing speed data for a single detector Addi-tionally Section 54 tests the algorithms for data sets ofmultiple detectors

51 Experimental Settings HaLRTC is compared with twoclassical imputation methods (1) Mean-Historical imputa-tion [29] and (2) BPCA-based imputation method [10]

For the historical imputation method we calculate themean value of all the available data points belonging to thesame detector at the same time interval in the last few daysand use the mean as the imputed value see [29] for moredetails For BPCA we set the maximum number of iterationsteps is 200 and the threshold of the approximate complexityis set to 10minus4 which is the same as [10] HaLRTCmethod is aniterative algorithm themaximum iterative numbers are set as500 The value of 120572

119894is set to 1119899 (119899 is the mode number) and

the parameter 120588 is set as 119890minus3

52 Evaluation Indices To evaluate the performances of theproposed method HaLRTC the following two indices wereused in this paper

(1) Mean Absolute Percentage Error (MAPE) The index givesthe evaluation of the average estimation error in terms ofpercentage

MAPE = 1

119872

119872

sum

119898=1

1003816100381610038161003816100381610038161003816100381610038161003816

119905(119898)

119903minus 119905(119898)

119890

119905(119898)

119903

1003816100381610038161003816100381610038161003816100381610038161003816

times 100 (6)

01 02 03 04 05 06 07 08

14

13

12

11

10

9

8

7

6

5

4

Missing ratio

RMSE

(mpe

)

BPCAMean

HaLRTC

Figure 5 RMSE for weekdaysrsquo data of single detector

(2) Root Mean Square Error (RMSE) This index gives theevaluation of the variance in the estimation errors

RMSE = radic 1

119872

119872

sum

119898=1

(119905(119898)

119903 minus 119905(119898)

119890 )

2

(7)

where 119905(119898)119903

and 119905(119898)

119890are the 119898th elements which stand for

the known real value and estimated value respectively 119872denotes the number of estimated traffic volumes

53 Results forMissing Speed Data of Single Detector To illus-trate the performances of the proposed method a completetraffic speed data set is used as ground truth for the test Wechoose the data of a fixed detector VDS 716331 in District7 Los Angeles County (see Figure 1) which are downloadedfrom (PeMS httppemsdotcagov) The traffic speed dataare recorded every 5 minutes Therefore a daily traffic speedseries for a loop detector contains 288 records and the wholeperiod of the data is chosen for three weeks that is fromFebruary 4 to February 24 2013

Based on the correlations analysis of traffic speed data inSection 3 the correlations are stronger for weekdays withoutregard to the weekends And it will be helpful for imputingthe missing traffic speed data Therefore the weekdaysrsquo datafor three weeks (ie 15 days) are chosen for evaluating theHaLRTC algorithm

Based on multiple correlations of the traffic speed datathe data set is modeled as a tensor of size 24 times 12 times 15 whichstands for 24 hours in a day 12 sample intervals (ie recordedby 5 minutes) per hour and 15 days For the methods Meanand BPCA the speed data is arranged as amatrix of size 288times15 The ratios of missing data are set from 5 to 80 andthe missing data are produced randomly All the results areaverage by 10 instances

The RMSE curves and MAPE curves of those methodswith randomly missing weekdaysrsquo traffic speed data for asingle loop detector are shown in Figures 5 and 6 Obviously

Computational Intelligence and Neuroscience 7

20

15

10

5

MA

PE (

)

01 02 03 04 05 06 07 08

Missing ratio

BPCAMean

HaLRTC

Figure 6 MAPE for weekdaysrsquo data of single detector

12

11

10

9

8

7

6

5

4

RMSE

(mpe

)

01 02 03 04 05 06 07 08

Missing ratio

BPCAMean

HaLRTC

Figure 7 RMSE for data in Wednesday of single detector

the RMSE and MAPE of HaLRTC approach are smallerthan other approaches It is worth noting that the RMSEof HaLRTC will be bigger than RMSE of BPCA when themissing ratio is 80 That situation can be considered as adiploma when the HaLRTC will reach the limit for keepingthe high accurate

For exploiting stronger correlations of traffic speed dataWednesdayrsquos data of a specific week are chosen and 10 weeksdata make up a data set with 10 days Identically the data setis modeled as a tensor of size 24times12times10which stands for 24hours in a day 12 sample intervals (ie recorded by 5minutes)per hour and 10 days And a 288times10matrixmodel is arrangedfor Mean and BPCA

18

16

14

12

10

8

6

4

MA

PE (

)

01 02 03 04 05 06 07 08

Missing ratio

BPCAMean

HaLRTC

Figure 8 MAPE for data in Wednesday of single detector

Figures 7 and 8 show that HaLRTC algorithm is withhigher accuracy for imputing missing traffic speed data thanMean and BPCA Comparing Figures 5 and 7 correspondingwith comparison of Figures 6 and 8 the performance forimputing missing traffic speed data will better when exploit-ing more correlations of data sets

54 Results for Missing Speed Data of Multiple Detectors Toillustrate the benefit of HaLRTC algorithm in reconstructingthe missing traffic speed data that is multiple correlationsof traffic speed data are more beneficial than the partialcorrelations used in traditional methods A new traffic speeddata set by considering spatial correlations is used to evaluateHaLRTC algorithm

We choose the data of 13 detectors in District 7 LosAngeles County (see Figure 1) which are downloaded from(PeMS httppemsdotcagov) For each detector the trafficspeed data for a specific day are chosen andderive the data setFor exploiting the spatial and temporal correlations of trafficspeed data simultaneously the data set is modeled as a tensorof size 24 times 12 times 13 which stands for 24 hours in a day 12sampling intervals in an hour and 13 detectors Meanwhilethe data set is arranged as a matrix 288 times 13 for Mean andBPCA

Figures 9 and 10 show that our propose HaLRTC out-performs Mean and BPCA for traffic speed data of multipledetectors

6 Conclusion

In this paper a multiway tensor model is proposed torepresent the traffic speed data considering the multiplecorrelations and a high accuracy low rank tensor completion(HaLRTC) algorithm is employed to estimate the missingtraffic speed data due to its severely fluctuation Experiments

8 Computational Intelligence and Neuroscience

12

14

16

10

8

6

4

RMSE

(mpe

)

01 02 03 04 05 06 07 08

Missing ratio

BPCAMean

HaLRTC

Figure 9 RMSE for data of multiple detectors

30

25

20

15

10

5

MA

PE (

)

01 02 03 04 05 06 07 08

Missing ratio

BPCAMean

HaLRTC

Figure 10 MAPE for data of multiple detectors

on benchmark show that the proposed method performsbetter than other classical state-of-the-art methods Forfuture work it is interesting to extend the proposed methodto large and dynamic road network

Conflict of Interests

The authors declare that there is no conflict of interestsregarding the publication of this paper

Acknowledgments

The research was supported by NSFC (Grant nos 61271376and 51308115) the National Basic Research Program of China(973 Program no 2012CB725405) Beijing Natural Science

Foundation (4122067) and the Basic Research Fund ofBeijing Institute of Technology (20120342026)

References

[1] J Zhang F-Y Wang K Wang W-H Lin X Xu and C ChenldquoData-driven intelligent transportation systems a surveyrdquo IEEETransactions on Intelligent Transportation Systems vol 12 no 4pp 1624ndash1639 2011

[2] V Jain A Sharma and L Subramanian ldquoRoad traffic conges-tion in the developing worldrdquo in Proceedings of the 2nd ACMSymposium on Computing for Development p 11 2012

[3] H S Zhang Y Zhang Z H Li andD C Hu ldquoSpatial-temporaltraffic data analysis based on global data management usingMASrdquo IEEE Transactions on Intelligent Transportation Systemsvol 5 no 4 pp 267ndash275 2004

[4] J Chen and J Shao ldquoNearest neighbor imputation for surveydatardquo Journal ofOfficial Statistics vol 16 no 2 pp 113ndash132 2000

[5] P D Allison and M Data University Papers Series on Quanti-tative Applications in the Social Sciences Sage Thousand OaksCalif USA 2001

[6] H Chen S Grant-Muller L Mussone and F MontgomeryldquoA study of hybrid neural network approaches and the effectsof missing data on traffic forecastingrdquo Neural Computing ampApplications vol 10 no 3 pp 277ndash286 2001

[7] L Qu L Li Y Zhang and J Hu ldquoPPCA-based missing dataimputation for traffic flow volume a systematical approachrdquoIEEE Transactions on Intelligent Transportation Systems vol 10no 3 pp 512ndash522 2009

[8] L Wang Z Li and C Song ldquoNetwork traffic prediction basedon seasonal ARIMA modelrdquo in Proceedings of the 5th IEEEWorld Congress on Intelligent Control and Automation vol 2pp 1425ndash1428 2004

[9] X Jin S-K Hong and Q Ma ldquoAn algorithm to estimatecontinuous-time traffic speed usingmultiple regressionmodelrdquoInformation Technology Journal vol 5 no 2 pp 281ndash284 2006

[10] L Qu Y Zhang J Hu L Jia and L Li ldquoA BPCA basedmissing value imputing method for traffic flow volume datardquo inProceedings of the IEEE Intelligent Vehicles Symposium (IV rsquo08)pp 985ndash990 June 2008

[11] A P Dempster N M Laird and D B Rubin ldquoMaximumlikelihood from incomplete data via the EM algorithmrdquo Journalof the Royal Statistical Society Series B Methodological vol 39no 1 pp 1ndash38 1977

[12] P Widhalm M Piff N Brandle H Koller and M ReinthalerldquoRobust road link speed estimates for sparse or missing probevehicle datardquo in Proceedings of the 15th International IEEEConference on Intelligent Transportation Systems (ITSC rsquo12) pp1693ndash1697 September 2012

[13] H Tan G Feng J Feng W Wang Y-J Zhang and F LildquoA tensor-based method for missing traffic data completionrdquoTransportation Research Part C Emerging Technologies vol 28pp 15ndash27 2013

[14] M T Asif N Mitrovic L Garg J Dauwels and P Jaillet ldquoLow-dimensional models for missing data imputation in road net-worksrdquo in Proceedings of the 38th IEEE International Conferenceon Acoustics Speech and Signal Processing (ICASSP rsquo13) pp3527ndash3531 May 2013

[15] S Ma D Goldfarb and L Chen ldquoFixed point and Bregmaniterative methods for matrix rank minimizationrdquoMathematicalProgramming vol 128 no 1-2 pp 321ndash353 2011

Computational Intelligence and Neuroscience 9

[16] E Acar D M Dunlavy T G Kolda and M Moslashrup ldquoScalabletensor factorizations for incomplete datardquo Chemometrics andIntelligent Laboratory Systems vol 106 no 1 pp 41ndash56 2011

[17] J Liu P Musialski P Wonka and J Ye ldquoTensor completion forestimating missing values in visual datardquo IEEE Transactions onPattern Analysis andMachine Intelligence vol 35 no 1 pp 208ndash220 2013

[18] E J Candes and B Recht ldquoExact matrix completion via convexoptimizationrdquo Foundations of Computational Mathematics vol9 no 6 pp 717ndash772 2009

[19] J F Cai E J Candes andZ Shen ldquoA singular value thresholdingalgorithm for matrix completionrdquo SIAM Journal on Optimiza-tion vol 20 no 4 pp 1956ndash1982 2010

[20] K Lee and Y Bresler ldquoADMiRA atomic decomposition forminimum rank approximationrdquo IEEE Transactions on Informa-tion Theory vol 56 no 9 pp 4402ndash4416 2010

[21] R H Keshavan A Montanari and S Oh ldquoMatrix completionfrom a few entriesrdquo IEEE Transactions on Information Theoryvol 56 no 6 pp 2980ndash2998 2010

[22] M Signoretto Q Tran Dinh L de Lathauwer and J A KSuykens ldquoLearning with tensors a framework based on convexoptimization and spectral regularizationrdquo Machine Learningvol 94 no 3 pp 303ndash351 2014

[23] Y Zhang and Y Liu ldquoMissing traffic flow data predictionusing least squares support vector machines in urban arterialstreetsrdquo inProceedings of the IEEE SymposiumonComputationalIntelligence and DataMining (CIDM rsquo09) pp 76ndash83 April 2009

[24] V Barnett and T LewisOutliers in Statistical Data Wiley Seriesin Probability and Mathematical Statistics Applied Probabilityand Statistics John Wiley amp Sons Chichester UK 3rd edition1994

[25] S P Verma R Cruz-Huicochea and L Dıaz-Gonzalez ldquoUni-variate data analysis system deciphering mean compositionsof island and continental arc magmas and influence of theunderlying crustrdquo International Geology Review vol 55 no 15pp 1922ndash1940 2013

[26] S P Verma L Dıaz-Gonzalez M Rosales-Rivera and AQuiroz-Ruiz ldquoComparative performance of four single extremeoutlier discordancy tests from Monte Carlo simulationsrdquo TheScientific World Journal vol 2014 Article ID 746451 27 pages2014

[27] T G Kolda and B W Bader ldquoTensor decompositions andapplicationsrdquo SIAM Review vol 51 no 3 pp 455ndash500 2009

[28] Z Lin M Chen L Wu and Y Ma ldquoThe augmented lagrangemultiplier method for exact recovery of corrupted low-rankmatricesrdquo UIUC Technical Report UILU-ENG-09-2215 2009

[29] D Kahaner C Moler and S Nash Numerical Methods andSoftware Prentice Hall Englewood Cliffs NJ USA 1989

Submit your manuscripts athttpwwwhindawicom

Computer Games Technology

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Distributed Sensor Networks

International Journal of

Advances in

FuzzySystems

Hindawi Publishing Corporationhttpwwwhindawicom

Volume 2014

International Journal of

ReconfigurableComputing

Hindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Applied Computational Intelligence and Soft Computing

thinspAdvancesthinspinthinsp

Artificial Intelligence

HindawithinspPublishingthinspCorporationhttpwwwhindawicom Volumethinsp2014

Advances inSoftware EngineeringHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Electrical and Computer Engineering

Journal of

Journal of

Computer Networks and Communications

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation

httpwwwhindawicom Volume 2014

Advances in

Multimedia

International Journal of

Biomedical Imaging

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

ArtificialNeural Systems

Advances in

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

RoboticsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Computational Intelligence and Neuroscience

Industrial EngineeringJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Modelling amp Simulation in EngineeringHindawi Publishing Corporation httpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Human-ComputerInteraction

Advances in

Computer EngineeringAdvances in

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Page 3: Research Article Traffic Speed Data Imputation Method Based ...downloads.hindawi.com/journals/cin/2015/364089.pdfResearch Article Traffic Speed Data Imputation Method Based on Tensor

Computational Intelligence and Neuroscience 3

The Frobenius norm of matrix X is defined as X119865

=

(sum119894119895|119909119894119895|2)

12 Let Ω be an index set then XΩdenotes the

vector consisting of elements in the setΩ only Define XΩ=

(sum(119894119895)isinΩ

1199092

119894119895)

12An 119899-way tensor can be rearranged as a matrix this is

called matricization also known as unfolding or flattening atensorThe ldquounfoldrdquo operation along the 119899thmode on a tensorX of size 119868

1times1198682timessdot sdot sdottimes119868

119873is defined as unfold (X 119899) = X

(119899)The

opposite operation ldquofoldrdquo is defined as fold (X(119899)) = X For

example the above tensor model X for traffic volume datawhich of size 16times12times24 can be unfolded along the 1thmodeand get a matrix X

(1)of size 16 times 288 In addition the mode-

119899 rank of X is denoted as rank119899(X) which is equal to the

column rank of X(119899)

22 Tensor Completion Methods The matrix and tensorcompletion methods were recently proposed for addressingthe missing data problem Those methods can perform welleven when the missing ratio is very high

During the past years there were lots of works on matrixcompletions Recently most theoretical work focuses onproving bounds for the exact matrix completion problemand a lot of work focuses on low rank or approximate lowrank matrix completion problems Candes and Recht [18]introduced a convex optimization to solve thematrix comple-tion problem by modeling it as a Semi-Definite Programing(SDP) FPCA [15] and SVT [19] are the other two algo-rithms for solving the low-rank matrix completion problemADMiRA [20] is an iterative method for solving a leastsquares problem with the restriction of rank OptSpace [21]is an efficient procedure to solve the exact and approximatematrix completion problems

Tensor completion methods can be seen as a high-orderextension of matrix completion methods They can capturemore global information than matrix completion methodsdue to the intrinsic multiway characteristics of tensor modelLiu et al [17] first proposed a tensor completion methodbased on trace norm minimization and applied it on imagecompletion Also a first-order method has been recentlydeveloped called CP-WOPT [16] base on CP decompositionof tensor model and applied on imputing missing networktraffic data Signoretto et al [22] established a mathematicalframework for learning with higher order tensors respect tomissing data

Traffic speed data have intrinsic multiway spatial-temporal correlations For fully exploiting the spatial-temporal correlations and improving the performance ofimputation methods a multiway tensor model is utilized toconstruct the traffic speed data and a high accuracy low ranktensor completion algorithm is used to address the missingspeed data in this paper

3 Correlation Analysis of Traffic Speed Data

As mentioned above the core idea of the approaches foraddressing missing data problems was to make use ofthe established intrinsic relations among those data [23]

Figure 1 Loop Detectors in District 7 Los Angeles County

0 50 100 150 200 250 300

80

70

60

50

40

30

20

10

0

Time interval (5min)

Spee

d (m

ph)

Link1Link2Link3Link4

Link5Link6Link7

Figure 2 The daily profile of the traffic speed data on Monday forseven detectors

Exploiting the useful intrinsic information of traffic speeddata attracts continuous interest due to its wide applicationsespecially in missing traffic speed data imputation

Below we will illustrate the traffic speed data down-loaded from PeMS (httppemsdotcagov) and analyze thecorrelations between each mode of traffic speed data Wedownloaded a set of traffic speed data in District 7 LosAngeles County (see Figure 1) The area covers 13 loopdetectors for a direction The traffic speed data are recordedby every 5 minutes And the whole period of the data lasts for28 days that is from February 4 to March 3 2013

For illustrating the spatial correlations of traffic speeddata seven adjacent detectors are chosen randomly from theabove area for simplicity And the data on Monday for eachdetector are plotted in Figure 2 The figure shows that thetraffic speed data between neighbor detectors are stronglycorrelated Also the Pearson Correlation Coefficient (PCC)

4 Computational Intelligence and Neuroscience

Table 1 Pearson Correlation Coefficient between five weekdays for VDS 716331

PCC Link1 Link2 Link3 Link4 Link5 Link6 Link7Link1 10000 09617 09454 08255 08698 07567 07187Link2 09617 10000 09660 08735 08584 08012 07564Link3 09454 09660 10000 09233 09247 08521 07972Link4 08255 08735 09233 10000 08507 09526 08982Link5 08698 08584 09247 08507 10000 08325 07898Link6 07567 08012 08521 09526 08325 10000 09694Link7 07187 07564 07972 08982 07898 09694 10000

Table 2 Pearson Correlation Coefficient between five weekdays for VDS 716331

PCC Monday Tuesday Wednesday Thursday FridayMonday 10000 06082 06184 06286 07261Tuesday 06082 10000 07757 07124 06882Wednesday 06184 07757 10000 08264 07382Thursday 06286 07124 08264 10000 06609Friday 07261 06882 07382 06609 10000

MondayTuesdayWednesday

ThursdayFriday

0 50 100 150 200 250 300

80

70

60

50

40

30

20

10

Time interval (5min)

Spee

d (m

ph)

Figure 3 The daily profile of the traffic speed data of five weekdaysfor VDS 716331

between each Mondayrsquos speed data is computed in Table 1Here the PCC for vector 119909 and 119910 is defined as

120588119909119910

=

cov (119909 119910)120590119909120590119910

=

119864 ((119909 minus 119864 (119909)) (119910 minus 119864 (119910)))

120590119909120590119910

(1)

where cov(sdot) stands for the covariance and 119864(sdot) stands for themathematical expectation

Besides the strong spatial correlations of traffic speeddata the temporal correlations are also prominent In orderto illustrate the temporal correlations of the traffic speed datathe daily data for five weekdays during a week is plottedin Figure 3 The correlations of the speed data between

0206201302132013

0220201302272013

0 50 100 150 200 250 300

80

70

60

50

40

30

20

Time interval (5min)

Spee

d (m

ph)

Figure 4 The daily profile of the traffic speed data on Wednesdayin February 2013

each weekday are obvious However the fluctuations of eachdata profile are notable this is the inherent property oftraffic speed data especially when the traffic condition is incongestion Also the PCCs of the data sets are shown inTable 2

It is worth mentioning that the correlations along week-to-week mode should be more prominent For verifying theidea the speed data on Wednesday for a month (February2013) are plotted in Figure 4 and the PCCs are shown inTable 3 The correlations are stronger than those betweenweekdays

Simultaneously the correlations of traffic speed data areof multiple patterns The interval-to-interval correlations are

Computational Intelligence and Neuroscience 5

Table 3 PearsonCorrelationCoefficient onWednesday in February2013

PCC Monday Tuesday Wednesday ThursdayMonday 10000 08733 07656 08738Tuesday 08733 10000 08422 08890Wednesday 07656 08422 10000 08139Thursday 08738 08890 08139 10000

usually ignored because it may be less apparent than day-to-day correlations After making statistics and analysis thePCCs of interval-to-interval pattern are averaged as about70The temporal correlations are not so stronger than trafficvolume data Thus more accurate methods are needed todevelop missing traffic speed data

According to the above analysis it is sufficient to saythat traffic speed data on a freeway corridor exhibit a strongcorrelation in multimode In day mode as well as weekmode space as well as interval mode PCCs are about 07It should be noted that PCC value can be underestimatedandor misleading if outliers that are ubiquitous in trafficspeed data are present [24] Some robust statistical methodssuch as the methods proposed by Verma et al [25 26] havebeen proposed in recent yearsThese methods would providemore powerful tools for traffic speed data analysis This willbe considered in our future work to help us to understand theintrinsic features underlying the traffic speed data

4 HaLRTC for Traffic Speed Completion

Based on the above correlations analysis of traffic speed datathe tensor model is firstly constructed along different modesThe correlations of traffic speed data are critical for recoveringthe missing traffic speed data Traditional methods mostlyexploit part of correlations such as historical or temporalneighboring correlations The classic methods usually utilizethe temporal correlations of traffic speed data fromday to daySuch as the single detector data multiple correlations containthe relations of traffic speed data from day to day hour tohour and so forth In addition the spatial correlations existin multiple detectors speed data

Conventional methods usually use day to day matrixpattern to model the traffic speed data Although each modeof traffic speed data has a very high similarity these methodsdo not utilize the multimode correlations which are ldquoDay timesHourrdquo ldquoWeektimesHourrdquo and ldquoLinktimesHourrdquo simultaneously andthus may result in poor recovery performance

To make full use of the multimode correlations andspatial-temporal information traffic speed data need to beconstructed into multiway data model Fortunately tensorpattern based traffic speed data can be well used to modelthe multiway traffic speed data This helps keep the originalstructure and employ enough spatial-temporal informationFor example the speed data set which is used for correlationsanalysis in Section 3 can be constructed as a 13 times 28 times 24 times12 tensor model according the PCCs computed along eachmode Here the speed tensor of size 13 times 28 times 24 times 12which stands for 13 detectors 28 days 24 hours in a day

and 12 sampling intervals in a hour (ie sampling interval is5min) For imputing the missing speed data the built tensormodel can keep up the integrity of speed data structure andexploit multimode correlations simultaneously Meanwhilean efficient algorithm is equally important

Considering the higher fluctuation characteristics of traf-fic speed data a high accuracy low rank tensor completion(HaLRTC) algorithm [17] is used for imputing missing trafficspeed data in this paper In the following a single detectorspeed profile is created as a three-order tensor X isin R119897times119898times119899

for expressing simply also it is the same formultiple detectorsspeed data The speed tensor X isin R119897times119898times119899 contains averagespeed values for 119897 days 119898 hours and 119899 intervals Howevernot all values in X are known Let Ω be the set of valuesfor which speed data is available Just as the correlationsanalysis in Section 3 strong temporal and spatial correlationsare exhibited Hence the speed data can be represented asa low-dimensional structure Thus the problem of imputingmissing traffic speed data can be solved by the optimizationproblem for low rank tensor completion

minX

rank (X)

st XΩ= TΩ

(2)

where X T are 119899-mode tensors with identical size ineach mode However the rank of tensor is not unique andnonconvex [27] One common approach is to use the tracenorm sdot

lowastwhich is the tightest convex envelop for the

rank of tensors to approximate the rank of tensors Using thedefinition of trace norm of tensor in [17] the optimizationproblem can be converted into

minX

119899

sum

119894=1

120572119894

1003817100381710038171003817X(119894)

1003817100381710038171003817lowast

st XΩ= TΩ

(3)

where 120572119894rsquos are constants satisfying 120572

119894ge 0 and sum

119899

119894=1120572119894=

1 To solve the optimization problem additional tensorsM1 M

119899are introduced and derive the following equiv-

alent formulation

minXM1M119899

119899

sum

119894=1

120572119894

1003817100381710038171003817M119894(119894)

1003817100381710038171003817lowast

st X =M119894

for 119894 = 1 119899

XΩ= TΩ

(4)

In this paper a high accuracy algorithm called ADMM [28] isintroduced to tackle the large scale problem The augmentedLagrangian function of (4) is as follows

119871120588(XM

1 M

119899Y1 Y

119899)

=

119899

sum

119894=1

120572119894

1003817100381710038171003817M119894(119894)

1003817100381710038171003817lowast+ ⟨X minusM

119894Y119894⟩

+

120588

2

1003817100381710038171003817M119894minusX

1003817100381710038171003817

2

119865

(5)

6 Computational Intelligence and Neuroscience

InputX withXΩ= TΩ 120588 and 119870

OutputX(1) SetX

Ω= TΩandX

Ω= 0

(2) for 119896 = 0 to 119870 do(3) for 119894 = 1 to 119899 do

(4) M119894= fold

119894[119863(120572119894120588)

(X(119894)+

1

120588

Y119894(119894))]

(5) end for

(6) XΩ=

1

119899

(

119899

sum

119894=1

M119894minus

1

120588

Y119894)

Ω

(7) Y119894= Y119894minus 120588(M

119894minusX)

(8) end for

Algorithm 1 HaLRTC high accuracy low rank tensor completion

According to the framework of ADMM and the algorithmdescription in [17] the HaLRTC algorithm is summarized inAlgorithm 1

5 Experiments

The performances of HaLRTC algorithm are evaluated onthe real world traffic speed data The experimental settingsare listed in Section 51 and evaluation indices are shown inSection 52 In Section 53 temporal correlations are exploitedfor imputing missing speed data for a single detector Addi-tionally Section 54 tests the algorithms for data sets ofmultiple detectors

51 Experimental Settings HaLRTC is compared with twoclassical imputation methods (1) Mean-Historical imputa-tion [29] and (2) BPCA-based imputation method [10]

For the historical imputation method we calculate themean value of all the available data points belonging to thesame detector at the same time interval in the last few daysand use the mean as the imputed value see [29] for moredetails For BPCA we set the maximum number of iterationsteps is 200 and the threshold of the approximate complexityis set to 10minus4 which is the same as [10] HaLRTCmethod is aniterative algorithm themaximum iterative numbers are set as500 The value of 120572

119894is set to 1119899 (119899 is the mode number) and

the parameter 120588 is set as 119890minus3

52 Evaluation Indices To evaluate the performances of theproposed method HaLRTC the following two indices wereused in this paper

(1) Mean Absolute Percentage Error (MAPE) The index givesthe evaluation of the average estimation error in terms ofpercentage

MAPE = 1

119872

119872

sum

119898=1

1003816100381610038161003816100381610038161003816100381610038161003816

119905(119898)

119903minus 119905(119898)

119890

119905(119898)

119903

1003816100381610038161003816100381610038161003816100381610038161003816

times 100 (6)

01 02 03 04 05 06 07 08

14

13

12

11

10

9

8

7

6

5

4

Missing ratio

RMSE

(mpe

)

BPCAMean

HaLRTC

Figure 5 RMSE for weekdaysrsquo data of single detector

(2) Root Mean Square Error (RMSE) This index gives theevaluation of the variance in the estimation errors

RMSE = radic 1

119872

119872

sum

119898=1

(119905(119898)

119903 minus 119905(119898)

119890 )

2

(7)

where 119905(119898)119903

and 119905(119898)

119890are the 119898th elements which stand for

the known real value and estimated value respectively 119872denotes the number of estimated traffic volumes

53 Results forMissing Speed Data of Single Detector To illus-trate the performances of the proposed method a completetraffic speed data set is used as ground truth for the test Wechoose the data of a fixed detector VDS 716331 in District7 Los Angeles County (see Figure 1) which are downloadedfrom (PeMS httppemsdotcagov) The traffic speed dataare recorded every 5 minutes Therefore a daily traffic speedseries for a loop detector contains 288 records and the wholeperiod of the data is chosen for three weeks that is fromFebruary 4 to February 24 2013

Based on the correlations analysis of traffic speed data inSection 3 the correlations are stronger for weekdays withoutregard to the weekends And it will be helpful for imputingthe missing traffic speed data Therefore the weekdaysrsquo datafor three weeks (ie 15 days) are chosen for evaluating theHaLRTC algorithm

Based on multiple correlations of the traffic speed datathe data set is modeled as a tensor of size 24 times 12 times 15 whichstands for 24 hours in a day 12 sample intervals (ie recordedby 5 minutes) per hour and 15 days For the methods Meanand BPCA the speed data is arranged as amatrix of size 288times15 The ratios of missing data are set from 5 to 80 andthe missing data are produced randomly All the results areaverage by 10 instances

The RMSE curves and MAPE curves of those methodswith randomly missing weekdaysrsquo traffic speed data for asingle loop detector are shown in Figures 5 and 6 Obviously

Computational Intelligence and Neuroscience 7

20

15

10

5

MA

PE (

)

01 02 03 04 05 06 07 08

Missing ratio

BPCAMean

HaLRTC

Figure 6 MAPE for weekdaysrsquo data of single detector

12

11

10

9

8

7

6

5

4

RMSE

(mpe

)

01 02 03 04 05 06 07 08

Missing ratio

BPCAMean

HaLRTC

Figure 7 RMSE for data in Wednesday of single detector

the RMSE and MAPE of HaLRTC approach are smallerthan other approaches It is worth noting that the RMSEof HaLRTC will be bigger than RMSE of BPCA when themissing ratio is 80 That situation can be considered as adiploma when the HaLRTC will reach the limit for keepingthe high accurate

For exploiting stronger correlations of traffic speed dataWednesdayrsquos data of a specific week are chosen and 10 weeksdata make up a data set with 10 days Identically the data setis modeled as a tensor of size 24times12times10which stands for 24hours in a day 12 sample intervals (ie recorded by 5minutes)per hour and 10 days And a 288times10matrixmodel is arrangedfor Mean and BPCA

18

16

14

12

10

8

6

4

MA

PE (

)

01 02 03 04 05 06 07 08

Missing ratio

BPCAMean

HaLRTC

Figure 8 MAPE for data in Wednesday of single detector

Figures 7 and 8 show that HaLRTC algorithm is withhigher accuracy for imputing missing traffic speed data thanMean and BPCA Comparing Figures 5 and 7 correspondingwith comparison of Figures 6 and 8 the performance forimputing missing traffic speed data will better when exploit-ing more correlations of data sets

54 Results for Missing Speed Data of Multiple Detectors Toillustrate the benefit of HaLRTC algorithm in reconstructingthe missing traffic speed data that is multiple correlationsof traffic speed data are more beneficial than the partialcorrelations used in traditional methods A new traffic speeddata set by considering spatial correlations is used to evaluateHaLRTC algorithm

We choose the data of 13 detectors in District 7 LosAngeles County (see Figure 1) which are downloaded from(PeMS httppemsdotcagov) For each detector the trafficspeed data for a specific day are chosen andderive the data setFor exploiting the spatial and temporal correlations of trafficspeed data simultaneously the data set is modeled as a tensorof size 24 times 12 times 13 which stands for 24 hours in a day 12sampling intervals in an hour and 13 detectors Meanwhilethe data set is arranged as a matrix 288 times 13 for Mean andBPCA

Figures 9 and 10 show that our propose HaLRTC out-performs Mean and BPCA for traffic speed data of multipledetectors

6 Conclusion

In this paper a multiway tensor model is proposed torepresent the traffic speed data considering the multiplecorrelations and a high accuracy low rank tensor completion(HaLRTC) algorithm is employed to estimate the missingtraffic speed data due to its severely fluctuation Experiments

8 Computational Intelligence and Neuroscience

12

14

16

10

8

6

4

RMSE

(mpe

)

01 02 03 04 05 06 07 08

Missing ratio

BPCAMean

HaLRTC

Figure 9 RMSE for data of multiple detectors

30

25

20

15

10

5

MA

PE (

)

01 02 03 04 05 06 07 08

Missing ratio

BPCAMean

HaLRTC

Figure 10 MAPE for data of multiple detectors

on benchmark show that the proposed method performsbetter than other classical state-of-the-art methods Forfuture work it is interesting to extend the proposed methodto large and dynamic road network

Conflict of Interests

The authors declare that there is no conflict of interestsregarding the publication of this paper

Acknowledgments

The research was supported by NSFC (Grant nos 61271376and 51308115) the National Basic Research Program of China(973 Program no 2012CB725405) Beijing Natural Science

Foundation (4122067) and the Basic Research Fund ofBeijing Institute of Technology (20120342026)

References

[1] J Zhang F-Y Wang K Wang W-H Lin X Xu and C ChenldquoData-driven intelligent transportation systems a surveyrdquo IEEETransactions on Intelligent Transportation Systems vol 12 no 4pp 1624ndash1639 2011

[2] V Jain A Sharma and L Subramanian ldquoRoad traffic conges-tion in the developing worldrdquo in Proceedings of the 2nd ACMSymposium on Computing for Development p 11 2012

[3] H S Zhang Y Zhang Z H Li andD C Hu ldquoSpatial-temporaltraffic data analysis based on global data management usingMASrdquo IEEE Transactions on Intelligent Transportation Systemsvol 5 no 4 pp 267ndash275 2004

[4] J Chen and J Shao ldquoNearest neighbor imputation for surveydatardquo Journal ofOfficial Statistics vol 16 no 2 pp 113ndash132 2000

[5] P D Allison and M Data University Papers Series on Quanti-tative Applications in the Social Sciences Sage Thousand OaksCalif USA 2001

[6] H Chen S Grant-Muller L Mussone and F MontgomeryldquoA study of hybrid neural network approaches and the effectsof missing data on traffic forecastingrdquo Neural Computing ampApplications vol 10 no 3 pp 277ndash286 2001

[7] L Qu L Li Y Zhang and J Hu ldquoPPCA-based missing dataimputation for traffic flow volume a systematical approachrdquoIEEE Transactions on Intelligent Transportation Systems vol 10no 3 pp 512ndash522 2009

[8] L Wang Z Li and C Song ldquoNetwork traffic prediction basedon seasonal ARIMA modelrdquo in Proceedings of the 5th IEEEWorld Congress on Intelligent Control and Automation vol 2pp 1425ndash1428 2004

[9] X Jin S-K Hong and Q Ma ldquoAn algorithm to estimatecontinuous-time traffic speed usingmultiple regressionmodelrdquoInformation Technology Journal vol 5 no 2 pp 281ndash284 2006

[10] L Qu Y Zhang J Hu L Jia and L Li ldquoA BPCA basedmissing value imputing method for traffic flow volume datardquo inProceedings of the IEEE Intelligent Vehicles Symposium (IV rsquo08)pp 985ndash990 June 2008

[11] A P Dempster N M Laird and D B Rubin ldquoMaximumlikelihood from incomplete data via the EM algorithmrdquo Journalof the Royal Statistical Society Series B Methodological vol 39no 1 pp 1ndash38 1977

[12] P Widhalm M Piff N Brandle H Koller and M ReinthalerldquoRobust road link speed estimates for sparse or missing probevehicle datardquo in Proceedings of the 15th International IEEEConference on Intelligent Transportation Systems (ITSC rsquo12) pp1693ndash1697 September 2012

[13] H Tan G Feng J Feng W Wang Y-J Zhang and F LildquoA tensor-based method for missing traffic data completionrdquoTransportation Research Part C Emerging Technologies vol 28pp 15ndash27 2013

[14] M T Asif N Mitrovic L Garg J Dauwels and P Jaillet ldquoLow-dimensional models for missing data imputation in road net-worksrdquo in Proceedings of the 38th IEEE International Conferenceon Acoustics Speech and Signal Processing (ICASSP rsquo13) pp3527ndash3531 May 2013

[15] S Ma D Goldfarb and L Chen ldquoFixed point and Bregmaniterative methods for matrix rank minimizationrdquoMathematicalProgramming vol 128 no 1-2 pp 321ndash353 2011

Computational Intelligence and Neuroscience 9

[16] E Acar D M Dunlavy T G Kolda and M Moslashrup ldquoScalabletensor factorizations for incomplete datardquo Chemometrics andIntelligent Laboratory Systems vol 106 no 1 pp 41ndash56 2011

[17] J Liu P Musialski P Wonka and J Ye ldquoTensor completion forestimating missing values in visual datardquo IEEE Transactions onPattern Analysis andMachine Intelligence vol 35 no 1 pp 208ndash220 2013

[18] E J Candes and B Recht ldquoExact matrix completion via convexoptimizationrdquo Foundations of Computational Mathematics vol9 no 6 pp 717ndash772 2009

[19] J F Cai E J Candes andZ Shen ldquoA singular value thresholdingalgorithm for matrix completionrdquo SIAM Journal on Optimiza-tion vol 20 no 4 pp 1956ndash1982 2010

[20] K Lee and Y Bresler ldquoADMiRA atomic decomposition forminimum rank approximationrdquo IEEE Transactions on Informa-tion Theory vol 56 no 9 pp 4402ndash4416 2010

[21] R H Keshavan A Montanari and S Oh ldquoMatrix completionfrom a few entriesrdquo IEEE Transactions on Information Theoryvol 56 no 6 pp 2980ndash2998 2010

[22] M Signoretto Q Tran Dinh L de Lathauwer and J A KSuykens ldquoLearning with tensors a framework based on convexoptimization and spectral regularizationrdquo Machine Learningvol 94 no 3 pp 303ndash351 2014

[23] Y Zhang and Y Liu ldquoMissing traffic flow data predictionusing least squares support vector machines in urban arterialstreetsrdquo inProceedings of the IEEE SymposiumonComputationalIntelligence and DataMining (CIDM rsquo09) pp 76ndash83 April 2009

[24] V Barnett and T LewisOutliers in Statistical Data Wiley Seriesin Probability and Mathematical Statistics Applied Probabilityand Statistics John Wiley amp Sons Chichester UK 3rd edition1994

[25] S P Verma R Cruz-Huicochea and L Dıaz-Gonzalez ldquoUni-variate data analysis system deciphering mean compositionsof island and continental arc magmas and influence of theunderlying crustrdquo International Geology Review vol 55 no 15pp 1922ndash1940 2013

[26] S P Verma L Dıaz-Gonzalez M Rosales-Rivera and AQuiroz-Ruiz ldquoComparative performance of four single extremeoutlier discordancy tests from Monte Carlo simulationsrdquo TheScientific World Journal vol 2014 Article ID 746451 27 pages2014

[27] T G Kolda and B W Bader ldquoTensor decompositions andapplicationsrdquo SIAM Review vol 51 no 3 pp 455ndash500 2009

[28] Z Lin M Chen L Wu and Y Ma ldquoThe augmented lagrangemultiplier method for exact recovery of corrupted low-rankmatricesrdquo UIUC Technical Report UILU-ENG-09-2215 2009

[29] D Kahaner C Moler and S Nash Numerical Methods andSoftware Prentice Hall Englewood Cliffs NJ USA 1989

Submit your manuscripts athttpwwwhindawicom

Computer Games Technology

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Distributed Sensor Networks

International Journal of

Advances in

FuzzySystems

Hindawi Publishing Corporationhttpwwwhindawicom

Volume 2014

International Journal of

ReconfigurableComputing

Hindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Applied Computational Intelligence and Soft Computing

thinspAdvancesthinspinthinsp

Artificial Intelligence

HindawithinspPublishingthinspCorporationhttpwwwhindawicom Volumethinsp2014

Advances inSoftware EngineeringHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Electrical and Computer Engineering

Journal of

Journal of

Computer Networks and Communications

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation

httpwwwhindawicom Volume 2014

Advances in

Multimedia

International Journal of

Biomedical Imaging

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

ArtificialNeural Systems

Advances in

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

RoboticsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Computational Intelligence and Neuroscience

Industrial EngineeringJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Modelling amp Simulation in EngineeringHindawi Publishing Corporation httpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Human-ComputerInteraction

Advances in

Computer EngineeringAdvances in

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Page 4: Research Article Traffic Speed Data Imputation Method Based ...downloads.hindawi.com/journals/cin/2015/364089.pdfResearch Article Traffic Speed Data Imputation Method Based on Tensor

4 Computational Intelligence and Neuroscience

Table 1 Pearson Correlation Coefficient between five weekdays for VDS 716331

PCC Link1 Link2 Link3 Link4 Link5 Link6 Link7Link1 10000 09617 09454 08255 08698 07567 07187Link2 09617 10000 09660 08735 08584 08012 07564Link3 09454 09660 10000 09233 09247 08521 07972Link4 08255 08735 09233 10000 08507 09526 08982Link5 08698 08584 09247 08507 10000 08325 07898Link6 07567 08012 08521 09526 08325 10000 09694Link7 07187 07564 07972 08982 07898 09694 10000

Table 2 Pearson Correlation Coefficient between five weekdays for VDS 716331

PCC Monday Tuesday Wednesday Thursday FridayMonday 10000 06082 06184 06286 07261Tuesday 06082 10000 07757 07124 06882Wednesday 06184 07757 10000 08264 07382Thursday 06286 07124 08264 10000 06609Friday 07261 06882 07382 06609 10000

MondayTuesdayWednesday

ThursdayFriday

0 50 100 150 200 250 300

80

70

60

50

40

30

20

10

Time interval (5min)

Spee

d (m

ph)

Figure 3 The daily profile of the traffic speed data of five weekdaysfor VDS 716331

between each Mondayrsquos speed data is computed in Table 1Here the PCC for vector 119909 and 119910 is defined as

120588119909119910

=

cov (119909 119910)120590119909120590119910

=

119864 ((119909 minus 119864 (119909)) (119910 minus 119864 (119910)))

120590119909120590119910

(1)

where cov(sdot) stands for the covariance and 119864(sdot) stands for themathematical expectation

Besides the strong spatial correlations of traffic speeddata the temporal correlations are also prominent In orderto illustrate the temporal correlations of the traffic speed datathe daily data for five weekdays during a week is plottedin Figure 3 The correlations of the speed data between

0206201302132013

0220201302272013

0 50 100 150 200 250 300

80

70

60

50

40

30

20

Time interval (5min)

Spee

d (m

ph)

Figure 4 The daily profile of the traffic speed data on Wednesdayin February 2013

each weekday are obvious However the fluctuations of eachdata profile are notable this is the inherent property oftraffic speed data especially when the traffic condition is incongestion Also the PCCs of the data sets are shown inTable 2

It is worth mentioning that the correlations along week-to-week mode should be more prominent For verifying theidea the speed data on Wednesday for a month (February2013) are plotted in Figure 4 and the PCCs are shown inTable 3 The correlations are stronger than those betweenweekdays

Simultaneously the correlations of traffic speed data areof multiple patterns The interval-to-interval correlations are

Computational Intelligence and Neuroscience 5

Table 3 PearsonCorrelationCoefficient onWednesday in February2013

PCC Monday Tuesday Wednesday ThursdayMonday 10000 08733 07656 08738Tuesday 08733 10000 08422 08890Wednesday 07656 08422 10000 08139Thursday 08738 08890 08139 10000

usually ignored because it may be less apparent than day-to-day correlations After making statistics and analysis thePCCs of interval-to-interval pattern are averaged as about70The temporal correlations are not so stronger than trafficvolume data Thus more accurate methods are needed todevelop missing traffic speed data

According to the above analysis it is sufficient to saythat traffic speed data on a freeway corridor exhibit a strongcorrelation in multimode In day mode as well as weekmode space as well as interval mode PCCs are about 07It should be noted that PCC value can be underestimatedandor misleading if outliers that are ubiquitous in trafficspeed data are present [24] Some robust statistical methodssuch as the methods proposed by Verma et al [25 26] havebeen proposed in recent yearsThese methods would providemore powerful tools for traffic speed data analysis This willbe considered in our future work to help us to understand theintrinsic features underlying the traffic speed data

4 HaLRTC for Traffic Speed Completion

Based on the above correlations analysis of traffic speed datathe tensor model is firstly constructed along different modesThe correlations of traffic speed data are critical for recoveringthe missing traffic speed data Traditional methods mostlyexploit part of correlations such as historical or temporalneighboring correlations The classic methods usually utilizethe temporal correlations of traffic speed data fromday to daySuch as the single detector data multiple correlations containthe relations of traffic speed data from day to day hour tohour and so forth In addition the spatial correlations existin multiple detectors speed data

Conventional methods usually use day to day matrixpattern to model the traffic speed data Although each modeof traffic speed data has a very high similarity these methodsdo not utilize the multimode correlations which are ldquoDay timesHourrdquo ldquoWeektimesHourrdquo and ldquoLinktimesHourrdquo simultaneously andthus may result in poor recovery performance

To make full use of the multimode correlations andspatial-temporal information traffic speed data need to beconstructed into multiway data model Fortunately tensorpattern based traffic speed data can be well used to modelthe multiway traffic speed data This helps keep the originalstructure and employ enough spatial-temporal informationFor example the speed data set which is used for correlationsanalysis in Section 3 can be constructed as a 13 times 28 times 24 times12 tensor model according the PCCs computed along eachmode Here the speed tensor of size 13 times 28 times 24 times 12which stands for 13 detectors 28 days 24 hours in a day

and 12 sampling intervals in a hour (ie sampling interval is5min) For imputing the missing speed data the built tensormodel can keep up the integrity of speed data structure andexploit multimode correlations simultaneously Meanwhilean efficient algorithm is equally important

Considering the higher fluctuation characteristics of traf-fic speed data a high accuracy low rank tensor completion(HaLRTC) algorithm [17] is used for imputing missing trafficspeed data in this paper In the following a single detectorspeed profile is created as a three-order tensor X isin R119897times119898times119899

for expressing simply also it is the same formultiple detectorsspeed data The speed tensor X isin R119897times119898times119899 contains averagespeed values for 119897 days 119898 hours and 119899 intervals Howevernot all values in X are known Let Ω be the set of valuesfor which speed data is available Just as the correlationsanalysis in Section 3 strong temporal and spatial correlationsare exhibited Hence the speed data can be represented asa low-dimensional structure Thus the problem of imputingmissing traffic speed data can be solved by the optimizationproblem for low rank tensor completion

minX

rank (X)

st XΩ= TΩ

(2)

where X T are 119899-mode tensors with identical size ineach mode However the rank of tensor is not unique andnonconvex [27] One common approach is to use the tracenorm sdot

lowastwhich is the tightest convex envelop for the

rank of tensors to approximate the rank of tensors Using thedefinition of trace norm of tensor in [17] the optimizationproblem can be converted into

minX

119899

sum

119894=1

120572119894

1003817100381710038171003817X(119894)

1003817100381710038171003817lowast

st XΩ= TΩ

(3)

where 120572119894rsquos are constants satisfying 120572

119894ge 0 and sum

119899

119894=1120572119894=

1 To solve the optimization problem additional tensorsM1 M

119899are introduced and derive the following equiv-

alent formulation

minXM1M119899

119899

sum

119894=1

120572119894

1003817100381710038171003817M119894(119894)

1003817100381710038171003817lowast

st X =M119894

for 119894 = 1 119899

XΩ= TΩ

(4)

In this paper a high accuracy algorithm called ADMM [28] isintroduced to tackle the large scale problem The augmentedLagrangian function of (4) is as follows

119871120588(XM

1 M

119899Y1 Y

119899)

=

119899

sum

119894=1

120572119894

1003817100381710038171003817M119894(119894)

1003817100381710038171003817lowast+ ⟨X minusM

119894Y119894⟩

+

120588

2

1003817100381710038171003817M119894minusX

1003817100381710038171003817

2

119865

(5)

6 Computational Intelligence and Neuroscience

InputX withXΩ= TΩ 120588 and 119870

OutputX(1) SetX

Ω= TΩandX

Ω= 0

(2) for 119896 = 0 to 119870 do(3) for 119894 = 1 to 119899 do

(4) M119894= fold

119894[119863(120572119894120588)

(X(119894)+

1

120588

Y119894(119894))]

(5) end for

(6) XΩ=

1

119899

(

119899

sum

119894=1

M119894minus

1

120588

Y119894)

Ω

(7) Y119894= Y119894minus 120588(M

119894minusX)

(8) end for

Algorithm 1 HaLRTC high accuracy low rank tensor completion

According to the framework of ADMM and the algorithmdescription in [17] the HaLRTC algorithm is summarized inAlgorithm 1

5 Experiments

The performances of HaLRTC algorithm are evaluated onthe real world traffic speed data The experimental settingsare listed in Section 51 and evaluation indices are shown inSection 52 In Section 53 temporal correlations are exploitedfor imputing missing speed data for a single detector Addi-tionally Section 54 tests the algorithms for data sets ofmultiple detectors

51 Experimental Settings HaLRTC is compared with twoclassical imputation methods (1) Mean-Historical imputa-tion [29] and (2) BPCA-based imputation method [10]

For the historical imputation method we calculate themean value of all the available data points belonging to thesame detector at the same time interval in the last few daysand use the mean as the imputed value see [29] for moredetails For BPCA we set the maximum number of iterationsteps is 200 and the threshold of the approximate complexityis set to 10minus4 which is the same as [10] HaLRTCmethod is aniterative algorithm themaximum iterative numbers are set as500 The value of 120572

119894is set to 1119899 (119899 is the mode number) and

the parameter 120588 is set as 119890minus3

52 Evaluation Indices To evaluate the performances of theproposed method HaLRTC the following two indices wereused in this paper

(1) Mean Absolute Percentage Error (MAPE) The index givesthe evaluation of the average estimation error in terms ofpercentage

MAPE = 1

119872

119872

sum

119898=1

1003816100381610038161003816100381610038161003816100381610038161003816

119905(119898)

119903minus 119905(119898)

119890

119905(119898)

119903

1003816100381610038161003816100381610038161003816100381610038161003816

times 100 (6)

01 02 03 04 05 06 07 08

14

13

12

11

10

9

8

7

6

5

4

Missing ratio

RMSE

(mpe

)

BPCAMean

HaLRTC

Figure 5 RMSE for weekdaysrsquo data of single detector

(2) Root Mean Square Error (RMSE) This index gives theevaluation of the variance in the estimation errors

RMSE = radic 1

119872

119872

sum

119898=1

(119905(119898)

119903 minus 119905(119898)

119890 )

2

(7)

where 119905(119898)119903

and 119905(119898)

119890are the 119898th elements which stand for

the known real value and estimated value respectively 119872denotes the number of estimated traffic volumes

53 Results forMissing Speed Data of Single Detector To illus-trate the performances of the proposed method a completetraffic speed data set is used as ground truth for the test Wechoose the data of a fixed detector VDS 716331 in District7 Los Angeles County (see Figure 1) which are downloadedfrom (PeMS httppemsdotcagov) The traffic speed dataare recorded every 5 minutes Therefore a daily traffic speedseries for a loop detector contains 288 records and the wholeperiod of the data is chosen for three weeks that is fromFebruary 4 to February 24 2013

Based on the correlations analysis of traffic speed data inSection 3 the correlations are stronger for weekdays withoutregard to the weekends And it will be helpful for imputingthe missing traffic speed data Therefore the weekdaysrsquo datafor three weeks (ie 15 days) are chosen for evaluating theHaLRTC algorithm

Based on multiple correlations of the traffic speed datathe data set is modeled as a tensor of size 24 times 12 times 15 whichstands for 24 hours in a day 12 sample intervals (ie recordedby 5 minutes) per hour and 15 days For the methods Meanand BPCA the speed data is arranged as amatrix of size 288times15 The ratios of missing data are set from 5 to 80 andthe missing data are produced randomly All the results areaverage by 10 instances

The RMSE curves and MAPE curves of those methodswith randomly missing weekdaysrsquo traffic speed data for asingle loop detector are shown in Figures 5 and 6 Obviously

Computational Intelligence and Neuroscience 7

20

15

10

5

MA

PE (

)

01 02 03 04 05 06 07 08

Missing ratio

BPCAMean

HaLRTC

Figure 6 MAPE for weekdaysrsquo data of single detector

12

11

10

9

8

7

6

5

4

RMSE

(mpe

)

01 02 03 04 05 06 07 08

Missing ratio

BPCAMean

HaLRTC

Figure 7 RMSE for data in Wednesday of single detector

the RMSE and MAPE of HaLRTC approach are smallerthan other approaches It is worth noting that the RMSEof HaLRTC will be bigger than RMSE of BPCA when themissing ratio is 80 That situation can be considered as adiploma when the HaLRTC will reach the limit for keepingthe high accurate

For exploiting stronger correlations of traffic speed dataWednesdayrsquos data of a specific week are chosen and 10 weeksdata make up a data set with 10 days Identically the data setis modeled as a tensor of size 24times12times10which stands for 24hours in a day 12 sample intervals (ie recorded by 5minutes)per hour and 10 days And a 288times10matrixmodel is arrangedfor Mean and BPCA

18

16

14

12

10

8

6

4

MA

PE (

)

01 02 03 04 05 06 07 08

Missing ratio

BPCAMean

HaLRTC

Figure 8 MAPE for data in Wednesday of single detector

Figures 7 and 8 show that HaLRTC algorithm is withhigher accuracy for imputing missing traffic speed data thanMean and BPCA Comparing Figures 5 and 7 correspondingwith comparison of Figures 6 and 8 the performance forimputing missing traffic speed data will better when exploit-ing more correlations of data sets

54 Results for Missing Speed Data of Multiple Detectors Toillustrate the benefit of HaLRTC algorithm in reconstructingthe missing traffic speed data that is multiple correlationsof traffic speed data are more beneficial than the partialcorrelations used in traditional methods A new traffic speeddata set by considering spatial correlations is used to evaluateHaLRTC algorithm

We choose the data of 13 detectors in District 7 LosAngeles County (see Figure 1) which are downloaded from(PeMS httppemsdotcagov) For each detector the trafficspeed data for a specific day are chosen andderive the data setFor exploiting the spatial and temporal correlations of trafficspeed data simultaneously the data set is modeled as a tensorof size 24 times 12 times 13 which stands for 24 hours in a day 12sampling intervals in an hour and 13 detectors Meanwhilethe data set is arranged as a matrix 288 times 13 for Mean andBPCA

Figures 9 and 10 show that our propose HaLRTC out-performs Mean and BPCA for traffic speed data of multipledetectors

6 Conclusion

In this paper a multiway tensor model is proposed torepresent the traffic speed data considering the multiplecorrelations and a high accuracy low rank tensor completion(HaLRTC) algorithm is employed to estimate the missingtraffic speed data due to its severely fluctuation Experiments

8 Computational Intelligence and Neuroscience

12

14

16

10

8

6

4

RMSE

(mpe

)

01 02 03 04 05 06 07 08

Missing ratio

BPCAMean

HaLRTC

Figure 9 RMSE for data of multiple detectors

30

25

20

15

10

5

MA

PE (

)

01 02 03 04 05 06 07 08

Missing ratio

BPCAMean

HaLRTC

Figure 10 MAPE for data of multiple detectors

on benchmark show that the proposed method performsbetter than other classical state-of-the-art methods Forfuture work it is interesting to extend the proposed methodto large and dynamic road network

Conflict of Interests

The authors declare that there is no conflict of interestsregarding the publication of this paper

Acknowledgments

The research was supported by NSFC (Grant nos 61271376and 51308115) the National Basic Research Program of China(973 Program no 2012CB725405) Beijing Natural Science

Foundation (4122067) and the Basic Research Fund ofBeijing Institute of Technology (20120342026)

References

[1] J Zhang F-Y Wang K Wang W-H Lin X Xu and C ChenldquoData-driven intelligent transportation systems a surveyrdquo IEEETransactions on Intelligent Transportation Systems vol 12 no 4pp 1624ndash1639 2011

[2] V Jain A Sharma and L Subramanian ldquoRoad traffic conges-tion in the developing worldrdquo in Proceedings of the 2nd ACMSymposium on Computing for Development p 11 2012

[3] H S Zhang Y Zhang Z H Li andD C Hu ldquoSpatial-temporaltraffic data analysis based on global data management usingMASrdquo IEEE Transactions on Intelligent Transportation Systemsvol 5 no 4 pp 267ndash275 2004

[4] J Chen and J Shao ldquoNearest neighbor imputation for surveydatardquo Journal ofOfficial Statistics vol 16 no 2 pp 113ndash132 2000

[5] P D Allison and M Data University Papers Series on Quanti-tative Applications in the Social Sciences Sage Thousand OaksCalif USA 2001

[6] H Chen S Grant-Muller L Mussone and F MontgomeryldquoA study of hybrid neural network approaches and the effectsof missing data on traffic forecastingrdquo Neural Computing ampApplications vol 10 no 3 pp 277ndash286 2001

[7] L Qu L Li Y Zhang and J Hu ldquoPPCA-based missing dataimputation for traffic flow volume a systematical approachrdquoIEEE Transactions on Intelligent Transportation Systems vol 10no 3 pp 512ndash522 2009

[8] L Wang Z Li and C Song ldquoNetwork traffic prediction basedon seasonal ARIMA modelrdquo in Proceedings of the 5th IEEEWorld Congress on Intelligent Control and Automation vol 2pp 1425ndash1428 2004

[9] X Jin S-K Hong and Q Ma ldquoAn algorithm to estimatecontinuous-time traffic speed usingmultiple regressionmodelrdquoInformation Technology Journal vol 5 no 2 pp 281ndash284 2006

[10] L Qu Y Zhang J Hu L Jia and L Li ldquoA BPCA basedmissing value imputing method for traffic flow volume datardquo inProceedings of the IEEE Intelligent Vehicles Symposium (IV rsquo08)pp 985ndash990 June 2008

[11] A P Dempster N M Laird and D B Rubin ldquoMaximumlikelihood from incomplete data via the EM algorithmrdquo Journalof the Royal Statistical Society Series B Methodological vol 39no 1 pp 1ndash38 1977

[12] P Widhalm M Piff N Brandle H Koller and M ReinthalerldquoRobust road link speed estimates for sparse or missing probevehicle datardquo in Proceedings of the 15th International IEEEConference on Intelligent Transportation Systems (ITSC rsquo12) pp1693ndash1697 September 2012

[13] H Tan G Feng J Feng W Wang Y-J Zhang and F LildquoA tensor-based method for missing traffic data completionrdquoTransportation Research Part C Emerging Technologies vol 28pp 15ndash27 2013

[14] M T Asif N Mitrovic L Garg J Dauwels and P Jaillet ldquoLow-dimensional models for missing data imputation in road net-worksrdquo in Proceedings of the 38th IEEE International Conferenceon Acoustics Speech and Signal Processing (ICASSP rsquo13) pp3527ndash3531 May 2013

[15] S Ma D Goldfarb and L Chen ldquoFixed point and Bregmaniterative methods for matrix rank minimizationrdquoMathematicalProgramming vol 128 no 1-2 pp 321ndash353 2011

Computational Intelligence and Neuroscience 9

[16] E Acar D M Dunlavy T G Kolda and M Moslashrup ldquoScalabletensor factorizations for incomplete datardquo Chemometrics andIntelligent Laboratory Systems vol 106 no 1 pp 41ndash56 2011

[17] J Liu P Musialski P Wonka and J Ye ldquoTensor completion forestimating missing values in visual datardquo IEEE Transactions onPattern Analysis andMachine Intelligence vol 35 no 1 pp 208ndash220 2013

[18] E J Candes and B Recht ldquoExact matrix completion via convexoptimizationrdquo Foundations of Computational Mathematics vol9 no 6 pp 717ndash772 2009

[19] J F Cai E J Candes andZ Shen ldquoA singular value thresholdingalgorithm for matrix completionrdquo SIAM Journal on Optimiza-tion vol 20 no 4 pp 1956ndash1982 2010

[20] K Lee and Y Bresler ldquoADMiRA atomic decomposition forminimum rank approximationrdquo IEEE Transactions on Informa-tion Theory vol 56 no 9 pp 4402ndash4416 2010

[21] R H Keshavan A Montanari and S Oh ldquoMatrix completionfrom a few entriesrdquo IEEE Transactions on Information Theoryvol 56 no 6 pp 2980ndash2998 2010

[22] M Signoretto Q Tran Dinh L de Lathauwer and J A KSuykens ldquoLearning with tensors a framework based on convexoptimization and spectral regularizationrdquo Machine Learningvol 94 no 3 pp 303ndash351 2014

[23] Y Zhang and Y Liu ldquoMissing traffic flow data predictionusing least squares support vector machines in urban arterialstreetsrdquo inProceedings of the IEEE SymposiumonComputationalIntelligence and DataMining (CIDM rsquo09) pp 76ndash83 April 2009

[24] V Barnett and T LewisOutliers in Statistical Data Wiley Seriesin Probability and Mathematical Statistics Applied Probabilityand Statistics John Wiley amp Sons Chichester UK 3rd edition1994

[25] S P Verma R Cruz-Huicochea and L Dıaz-Gonzalez ldquoUni-variate data analysis system deciphering mean compositionsof island and continental arc magmas and influence of theunderlying crustrdquo International Geology Review vol 55 no 15pp 1922ndash1940 2013

[26] S P Verma L Dıaz-Gonzalez M Rosales-Rivera and AQuiroz-Ruiz ldquoComparative performance of four single extremeoutlier discordancy tests from Monte Carlo simulationsrdquo TheScientific World Journal vol 2014 Article ID 746451 27 pages2014

[27] T G Kolda and B W Bader ldquoTensor decompositions andapplicationsrdquo SIAM Review vol 51 no 3 pp 455ndash500 2009

[28] Z Lin M Chen L Wu and Y Ma ldquoThe augmented lagrangemultiplier method for exact recovery of corrupted low-rankmatricesrdquo UIUC Technical Report UILU-ENG-09-2215 2009

[29] D Kahaner C Moler and S Nash Numerical Methods andSoftware Prentice Hall Englewood Cliffs NJ USA 1989

Submit your manuscripts athttpwwwhindawicom

Computer Games Technology

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Distributed Sensor Networks

International Journal of

Advances in

FuzzySystems

Hindawi Publishing Corporationhttpwwwhindawicom

Volume 2014

International Journal of

ReconfigurableComputing

Hindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Applied Computational Intelligence and Soft Computing

thinspAdvancesthinspinthinsp

Artificial Intelligence

HindawithinspPublishingthinspCorporationhttpwwwhindawicom Volumethinsp2014

Advances inSoftware EngineeringHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Electrical and Computer Engineering

Journal of

Journal of

Computer Networks and Communications

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation

httpwwwhindawicom Volume 2014

Advances in

Multimedia

International Journal of

Biomedical Imaging

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

ArtificialNeural Systems

Advances in

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

RoboticsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Computational Intelligence and Neuroscience

Industrial EngineeringJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Modelling amp Simulation in EngineeringHindawi Publishing Corporation httpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Human-ComputerInteraction

Advances in

Computer EngineeringAdvances in

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Page 5: Research Article Traffic Speed Data Imputation Method Based ...downloads.hindawi.com/journals/cin/2015/364089.pdfResearch Article Traffic Speed Data Imputation Method Based on Tensor

Computational Intelligence and Neuroscience 5

Table 3 PearsonCorrelationCoefficient onWednesday in February2013

PCC Monday Tuesday Wednesday ThursdayMonday 10000 08733 07656 08738Tuesday 08733 10000 08422 08890Wednesday 07656 08422 10000 08139Thursday 08738 08890 08139 10000

usually ignored because it may be less apparent than day-to-day correlations After making statistics and analysis thePCCs of interval-to-interval pattern are averaged as about70The temporal correlations are not so stronger than trafficvolume data Thus more accurate methods are needed todevelop missing traffic speed data

According to the above analysis it is sufficient to saythat traffic speed data on a freeway corridor exhibit a strongcorrelation in multimode In day mode as well as weekmode space as well as interval mode PCCs are about 07It should be noted that PCC value can be underestimatedandor misleading if outliers that are ubiquitous in trafficspeed data are present [24] Some robust statistical methodssuch as the methods proposed by Verma et al [25 26] havebeen proposed in recent yearsThese methods would providemore powerful tools for traffic speed data analysis This willbe considered in our future work to help us to understand theintrinsic features underlying the traffic speed data

4 HaLRTC for Traffic Speed Completion

Based on the above correlations analysis of traffic speed datathe tensor model is firstly constructed along different modesThe correlations of traffic speed data are critical for recoveringthe missing traffic speed data Traditional methods mostlyexploit part of correlations such as historical or temporalneighboring correlations The classic methods usually utilizethe temporal correlations of traffic speed data fromday to daySuch as the single detector data multiple correlations containthe relations of traffic speed data from day to day hour tohour and so forth In addition the spatial correlations existin multiple detectors speed data

Conventional methods usually use day to day matrixpattern to model the traffic speed data Although each modeof traffic speed data has a very high similarity these methodsdo not utilize the multimode correlations which are ldquoDay timesHourrdquo ldquoWeektimesHourrdquo and ldquoLinktimesHourrdquo simultaneously andthus may result in poor recovery performance

To make full use of the multimode correlations andspatial-temporal information traffic speed data need to beconstructed into multiway data model Fortunately tensorpattern based traffic speed data can be well used to modelthe multiway traffic speed data This helps keep the originalstructure and employ enough spatial-temporal informationFor example the speed data set which is used for correlationsanalysis in Section 3 can be constructed as a 13 times 28 times 24 times12 tensor model according the PCCs computed along eachmode Here the speed tensor of size 13 times 28 times 24 times 12which stands for 13 detectors 28 days 24 hours in a day

and 12 sampling intervals in a hour (ie sampling interval is5min) For imputing the missing speed data the built tensormodel can keep up the integrity of speed data structure andexploit multimode correlations simultaneously Meanwhilean efficient algorithm is equally important

Considering the higher fluctuation characteristics of traf-fic speed data a high accuracy low rank tensor completion(HaLRTC) algorithm [17] is used for imputing missing trafficspeed data in this paper In the following a single detectorspeed profile is created as a three-order tensor X isin R119897times119898times119899

for expressing simply also it is the same formultiple detectorsspeed data The speed tensor X isin R119897times119898times119899 contains averagespeed values for 119897 days 119898 hours and 119899 intervals Howevernot all values in X are known Let Ω be the set of valuesfor which speed data is available Just as the correlationsanalysis in Section 3 strong temporal and spatial correlationsare exhibited Hence the speed data can be represented asa low-dimensional structure Thus the problem of imputingmissing traffic speed data can be solved by the optimizationproblem for low rank tensor completion

minX

rank (X)

st XΩ= TΩ

(2)

where X T are 119899-mode tensors with identical size ineach mode However the rank of tensor is not unique andnonconvex [27] One common approach is to use the tracenorm sdot

lowastwhich is the tightest convex envelop for the

rank of tensors to approximate the rank of tensors Using thedefinition of trace norm of tensor in [17] the optimizationproblem can be converted into

minX

119899

sum

119894=1

120572119894

1003817100381710038171003817X(119894)

1003817100381710038171003817lowast

st XΩ= TΩ

(3)

where 120572119894rsquos are constants satisfying 120572

119894ge 0 and sum

119899

119894=1120572119894=

1 To solve the optimization problem additional tensorsM1 M

119899are introduced and derive the following equiv-

alent formulation

minXM1M119899

119899

sum

119894=1

120572119894

1003817100381710038171003817M119894(119894)

1003817100381710038171003817lowast

st X =M119894

for 119894 = 1 119899

XΩ= TΩ

(4)

In this paper a high accuracy algorithm called ADMM [28] isintroduced to tackle the large scale problem The augmentedLagrangian function of (4) is as follows

119871120588(XM

1 M

119899Y1 Y

119899)

=

119899

sum

119894=1

120572119894

1003817100381710038171003817M119894(119894)

1003817100381710038171003817lowast+ ⟨X minusM

119894Y119894⟩

+

120588

2

1003817100381710038171003817M119894minusX

1003817100381710038171003817

2

119865

(5)

6 Computational Intelligence and Neuroscience

InputX withXΩ= TΩ 120588 and 119870

OutputX(1) SetX

Ω= TΩandX

Ω= 0

(2) for 119896 = 0 to 119870 do(3) for 119894 = 1 to 119899 do

(4) M119894= fold

119894[119863(120572119894120588)

(X(119894)+

1

120588

Y119894(119894))]

(5) end for

(6) XΩ=

1

119899

(

119899

sum

119894=1

M119894minus

1

120588

Y119894)

Ω

(7) Y119894= Y119894minus 120588(M

119894minusX)

(8) end for

Algorithm 1 HaLRTC high accuracy low rank tensor completion

According to the framework of ADMM and the algorithmdescription in [17] the HaLRTC algorithm is summarized inAlgorithm 1

5 Experiments

The performances of HaLRTC algorithm are evaluated onthe real world traffic speed data The experimental settingsare listed in Section 51 and evaluation indices are shown inSection 52 In Section 53 temporal correlations are exploitedfor imputing missing speed data for a single detector Addi-tionally Section 54 tests the algorithms for data sets ofmultiple detectors

51 Experimental Settings HaLRTC is compared with twoclassical imputation methods (1) Mean-Historical imputa-tion [29] and (2) BPCA-based imputation method [10]

For the historical imputation method we calculate themean value of all the available data points belonging to thesame detector at the same time interval in the last few daysand use the mean as the imputed value see [29] for moredetails For BPCA we set the maximum number of iterationsteps is 200 and the threshold of the approximate complexityis set to 10minus4 which is the same as [10] HaLRTCmethod is aniterative algorithm themaximum iterative numbers are set as500 The value of 120572

119894is set to 1119899 (119899 is the mode number) and

the parameter 120588 is set as 119890minus3

52 Evaluation Indices To evaluate the performances of theproposed method HaLRTC the following two indices wereused in this paper

(1) Mean Absolute Percentage Error (MAPE) The index givesthe evaluation of the average estimation error in terms ofpercentage

MAPE = 1

119872

119872

sum

119898=1

1003816100381610038161003816100381610038161003816100381610038161003816

119905(119898)

119903minus 119905(119898)

119890

119905(119898)

119903

1003816100381610038161003816100381610038161003816100381610038161003816

times 100 (6)

01 02 03 04 05 06 07 08

14

13

12

11

10

9

8

7

6

5

4

Missing ratio

RMSE

(mpe

)

BPCAMean

HaLRTC

Figure 5 RMSE for weekdaysrsquo data of single detector

(2) Root Mean Square Error (RMSE) This index gives theevaluation of the variance in the estimation errors

RMSE = radic 1

119872

119872

sum

119898=1

(119905(119898)

119903 minus 119905(119898)

119890 )

2

(7)

where 119905(119898)119903

and 119905(119898)

119890are the 119898th elements which stand for

the known real value and estimated value respectively 119872denotes the number of estimated traffic volumes

53 Results forMissing Speed Data of Single Detector To illus-trate the performances of the proposed method a completetraffic speed data set is used as ground truth for the test Wechoose the data of a fixed detector VDS 716331 in District7 Los Angeles County (see Figure 1) which are downloadedfrom (PeMS httppemsdotcagov) The traffic speed dataare recorded every 5 minutes Therefore a daily traffic speedseries for a loop detector contains 288 records and the wholeperiod of the data is chosen for three weeks that is fromFebruary 4 to February 24 2013

Based on the correlations analysis of traffic speed data inSection 3 the correlations are stronger for weekdays withoutregard to the weekends And it will be helpful for imputingthe missing traffic speed data Therefore the weekdaysrsquo datafor three weeks (ie 15 days) are chosen for evaluating theHaLRTC algorithm

Based on multiple correlations of the traffic speed datathe data set is modeled as a tensor of size 24 times 12 times 15 whichstands for 24 hours in a day 12 sample intervals (ie recordedby 5 minutes) per hour and 15 days For the methods Meanand BPCA the speed data is arranged as amatrix of size 288times15 The ratios of missing data are set from 5 to 80 andthe missing data are produced randomly All the results areaverage by 10 instances

The RMSE curves and MAPE curves of those methodswith randomly missing weekdaysrsquo traffic speed data for asingle loop detector are shown in Figures 5 and 6 Obviously

Computational Intelligence and Neuroscience 7

20

15

10

5

MA

PE (

)

01 02 03 04 05 06 07 08

Missing ratio

BPCAMean

HaLRTC

Figure 6 MAPE for weekdaysrsquo data of single detector

12

11

10

9

8

7

6

5

4

RMSE

(mpe

)

01 02 03 04 05 06 07 08

Missing ratio

BPCAMean

HaLRTC

Figure 7 RMSE for data in Wednesday of single detector

the RMSE and MAPE of HaLRTC approach are smallerthan other approaches It is worth noting that the RMSEof HaLRTC will be bigger than RMSE of BPCA when themissing ratio is 80 That situation can be considered as adiploma when the HaLRTC will reach the limit for keepingthe high accurate

For exploiting stronger correlations of traffic speed dataWednesdayrsquos data of a specific week are chosen and 10 weeksdata make up a data set with 10 days Identically the data setis modeled as a tensor of size 24times12times10which stands for 24hours in a day 12 sample intervals (ie recorded by 5minutes)per hour and 10 days And a 288times10matrixmodel is arrangedfor Mean and BPCA

18

16

14

12

10

8

6

4

MA

PE (

)

01 02 03 04 05 06 07 08

Missing ratio

BPCAMean

HaLRTC

Figure 8 MAPE for data in Wednesday of single detector

Figures 7 and 8 show that HaLRTC algorithm is withhigher accuracy for imputing missing traffic speed data thanMean and BPCA Comparing Figures 5 and 7 correspondingwith comparison of Figures 6 and 8 the performance forimputing missing traffic speed data will better when exploit-ing more correlations of data sets

54 Results for Missing Speed Data of Multiple Detectors Toillustrate the benefit of HaLRTC algorithm in reconstructingthe missing traffic speed data that is multiple correlationsof traffic speed data are more beneficial than the partialcorrelations used in traditional methods A new traffic speeddata set by considering spatial correlations is used to evaluateHaLRTC algorithm

We choose the data of 13 detectors in District 7 LosAngeles County (see Figure 1) which are downloaded from(PeMS httppemsdotcagov) For each detector the trafficspeed data for a specific day are chosen andderive the data setFor exploiting the spatial and temporal correlations of trafficspeed data simultaneously the data set is modeled as a tensorof size 24 times 12 times 13 which stands for 24 hours in a day 12sampling intervals in an hour and 13 detectors Meanwhilethe data set is arranged as a matrix 288 times 13 for Mean andBPCA

Figures 9 and 10 show that our propose HaLRTC out-performs Mean and BPCA for traffic speed data of multipledetectors

6 Conclusion

In this paper a multiway tensor model is proposed torepresent the traffic speed data considering the multiplecorrelations and a high accuracy low rank tensor completion(HaLRTC) algorithm is employed to estimate the missingtraffic speed data due to its severely fluctuation Experiments

8 Computational Intelligence and Neuroscience

12

14

16

10

8

6

4

RMSE

(mpe

)

01 02 03 04 05 06 07 08

Missing ratio

BPCAMean

HaLRTC

Figure 9 RMSE for data of multiple detectors

30

25

20

15

10

5

MA

PE (

)

01 02 03 04 05 06 07 08

Missing ratio

BPCAMean

HaLRTC

Figure 10 MAPE for data of multiple detectors

on benchmark show that the proposed method performsbetter than other classical state-of-the-art methods Forfuture work it is interesting to extend the proposed methodto large and dynamic road network

Conflict of Interests

The authors declare that there is no conflict of interestsregarding the publication of this paper

Acknowledgments

The research was supported by NSFC (Grant nos 61271376and 51308115) the National Basic Research Program of China(973 Program no 2012CB725405) Beijing Natural Science

Foundation (4122067) and the Basic Research Fund ofBeijing Institute of Technology (20120342026)

References

[1] J Zhang F-Y Wang K Wang W-H Lin X Xu and C ChenldquoData-driven intelligent transportation systems a surveyrdquo IEEETransactions on Intelligent Transportation Systems vol 12 no 4pp 1624ndash1639 2011

[2] V Jain A Sharma and L Subramanian ldquoRoad traffic conges-tion in the developing worldrdquo in Proceedings of the 2nd ACMSymposium on Computing for Development p 11 2012

[3] H S Zhang Y Zhang Z H Li andD C Hu ldquoSpatial-temporaltraffic data analysis based on global data management usingMASrdquo IEEE Transactions on Intelligent Transportation Systemsvol 5 no 4 pp 267ndash275 2004

[4] J Chen and J Shao ldquoNearest neighbor imputation for surveydatardquo Journal ofOfficial Statistics vol 16 no 2 pp 113ndash132 2000

[5] P D Allison and M Data University Papers Series on Quanti-tative Applications in the Social Sciences Sage Thousand OaksCalif USA 2001

[6] H Chen S Grant-Muller L Mussone and F MontgomeryldquoA study of hybrid neural network approaches and the effectsof missing data on traffic forecastingrdquo Neural Computing ampApplications vol 10 no 3 pp 277ndash286 2001

[7] L Qu L Li Y Zhang and J Hu ldquoPPCA-based missing dataimputation for traffic flow volume a systematical approachrdquoIEEE Transactions on Intelligent Transportation Systems vol 10no 3 pp 512ndash522 2009

[8] L Wang Z Li and C Song ldquoNetwork traffic prediction basedon seasonal ARIMA modelrdquo in Proceedings of the 5th IEEEWorld Congress on Intelligent Control and Automation vol 2pp 1425ndash1428 2004

[9] X Jin S-K Hong and Q Ma ldquoAn algorithm to estimatecontinuous-time traffic speed usingmultiple regressionmodelrdquoInformation Technology Journal vol 5 no 2 pp 281ndash284 2006

[10] L Qu Y Zhang J Hu L Jia and L Li ldquoA BPCA basedmissing value imputing method for traffic flow volume datardquo inProceedings of the IEEE Intelligent Vehicles Symposium (IV rsquo08)pp 985ndash990 June 2008

[11] A P Dempster N M Laird and D B Rubin ldquoMaximumlikelihood from incomplete data via the EM algorithmrdquo Journalof the Royal Statistical Society Series B Methodological vol 39no 1 pp 1ndash38 1977

[12] P Widhalm M Piff N Brandle H Koller and M ReinthalerldquoRobust road link speed estimates for sparse or missing probevehicle datardquo in Proceedings of the 15th International IEEEConference on Intelligent Transportation Systems (ITSC rsquo12) pp1693ndash1697 September 2012

[13] H Tan G Feng J Feng W Wang Y-J Zhang and F LildquoA tensor-based method for missing traffic data completionrdquoTransportation Research Part C Emerging Technologies vol 28pp 15ndash27 2013

[14] M T Asif N Mitrovic L Garg J Dauwels and P Jaillet ldquoLow-dimensional models for missing data imputation in road net-worksrdquo in Proceedings of the 38th IEEE International Conferenceon Acoustics Speech and Signal Processing (ICASSP rsquo13) pp3527ndash3531 May 2013

[15] S Ma D Goldfarb and L Chen ldquoFixed point and Bregmaniterative methods for matrix rank minimizationrdquoMathematicalProgramming vol 128 no 1-2 pp 321ndash353 2011

Computational Intelligence and Neuroscience 9

[16] E Acar D M Dunlavy T G Kolda and M Moslashrup ldquoScalabletensor factorizations for incomplete datardquo Chemometrics andIntelligent Laboratory Systems vol 106 no 1 pp 41ndash56 2011

[17] J Liu P Musialski P Wonka and J Ye ldquoTensor completion forestimating missing values in visual datardquo IEEE Transactions onPattern Analysis andMachine Intelligence vol 35 no 1 pp 208ndash220 2013

[18] E J Candes and B Recht ldquoExact matrix completion via convexoptimizationrdquo Foundations of Computational Mathematics vol9 no 6 pp 717ndash772 2009

[19] J F Cai E J Candes andZ Shen ldquoA singular value thresholdingalgorithm for matrix completionrdquo SIAM Journal on Optimiza-tion vol 20 no 4 pp 1956ndash1982 2010

[20] K Lee and Y Bresler ldquoADMiRA atomic decomposition forminimum rank approximationrdquo IEEE Transactions on Informa-tion Theory vol 56 no 9 pp 4402ndash4416 2010

[21] R H Keshavan A Montanari and S Oh ldquoMatrix completionfrom a few entriesrdquo IEEE Transactions on Information Theoryvol 56 no 6 pp 2980ndash2998 2010

[22] M Signoretto Q Tran Dinh L de Lathauwer and J A KSuykens ldquoLearning with tensors a framework based on convexoptimization and spectral regularizationrdquo Machine Learningvol 94 no 3 pp 303ndash351 2014

[23] Y Zhang and Y Liu ldquoMissing traffic flow data predictionusing least squares support vector machines in urban arterialstreetsrdquo inProceedings of the IEEE SymposiumonComputationalIntelligence and DataMining (CIDM rsquo09) pp 76ndash83 April 2009

[24] V Barnett and T LewisOutliers in Statistical Data Wiley Seriesin Probability and Mathematical Statistics Applied Probabilityand Statistics John Wiley amp Sons Chichester UK 3rd edition1994

[25] S P Verma R Cruz-Huicochea and L Dıaz-Gonzalez ldquoUni-variate data analysis system deciphering mean compositionsof island and continental arc magmas and influence of theunderlying crustrdquo International Geology Review vol 55 no 15pp 1922ndash1940 2013

[26] S P Verma L Dıaz-Gonzalez M Rosales-Rivera and AQuiroz-Ruiz ldquoComparative performance of four single extremeoutlier discordancy tests from Monte Carlo simulationsrdquo TheScientific World Journal vol 2014 Article ID 746451 27 pages2014

[27] T G Kolda and B W Bader ldquoTensor decompositions andapplicationsrdquo SIAM Review vol 51 no 3 pp 455ndash500 2009

[28] Z Lin M Chen L Wu and Y Ma ldquoThe augmented lagrangemultiplier method for exact recovery of corrupted low-rankmatricesrdquo UIUC Technical Report UILU-ENG-09-2215 2009

[29] D Kahaner C Moler and S Nash Numerical Methods andSoftware Prentice Hall Englewood Cliffs NJ USA 1989

Submit your manuscripts athttpwwwhindawicom

Computer Games Technology

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Distributed Sensor Networks

International Journal of

Advances in

FuzzySystems

Hindawi Publishing Corporationhttpwwwhindawicom

Volume 2014

International Journal of

ReconfigurableComputing

Hindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Applied Computational Intelligence and Soft Computing

thinspAdvancesthinspinthinsp

Artificial Intelligence

HindawithinspPublishingthinspCorporationhttpwwwhindawicom Volumethinsp2014

Advances inSoftware EngineeringHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Electrical and Computer Engineering

Journal of

Journal of

Computer Networks and Communications

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation

httpwwwhindawicom Volume 2014

Advances in

Multimedia

International Journal of

Biomedical Imaging

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

ArtificialNeural Systems

Advances in

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

RoboticsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Computational Intelligence and Neuroscience

Industrial EngineeringJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Modelling amp Simulation in EngineeringHindawi Publishing Corporation httpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Human-ComputerInteraction

Advances in

Computer EngineeringAdvances in

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Page 6: Research Article Traffic Speed Data Imputation Method Based ...downloads.hindawi.com/journals/cin/2015/364089.pdfResearch Article Traffic Speed Data Imputation Method Based on Tensor

6 Computational Intelligence and Neuroscience

InputX withXΩ= TΩ 120588 and 119870

OutputX(1) SetX

Ω= TΩandX

Ω= 0

(2) for 119896 = 0 to 119870 do(3) for 119894 = 1 to 119899 do

(4) M119894= fold

119894[119863(120572119894120588)

(X(119894)+

1

120588

Y119894(119894))]

(5) end for

(6) XΩ=

1

119899

(

119899

sum

119894=1

M119894minus

1

120588

Y119894)

Ω

(7) Y119894= Y119894minus 120588(M

119894minusX)

(8) end for

Algorithm 1 HaLRTC high accuracy low rank tensor completion

According to the framework of ADMM and the algorithmdescription in [17] the HaLRTC algorithm is summarized inAlgorithm 1

5 Experiments

The performances of HaLRTC algorithm are evaluated onthe real world traffic speed data The experimental settingsare listed in Section 51 and evaluation indices are shown inSection 52 In Section 53 temporal correlations are exploitedfor imputing missing speed data for a single detector Addi-tionally Section 54 tests the algorithms for data sets ofmultiple detectors

51 Experimental Settings HaLRTC is compared with twoclassical imputation methods (1) Mean-Historical imputa-tion [29] and (2) BPCA-based imputation method [10]

For the historical imputation method we calculate themean value of all the available data points belonging to thesame detector at the same time interval in the last few daysand use the mean as the imputed value see [29] for moredetails For BPCA we set the maximum number of iterationsteps is 200 and the threshold of the approximate complexityis set to 10minus4 which is the same as [10] HaLRTCmethod is aniterative algorithm themaximum iterative numbers are set as500 The value of 120572

119894is set to 1119899 (119899 is the mode number) and

the parameter 120588 is set as 119890minus3

52 Evaluation Indices To evaluate the performances of theproposed method HaLRTC the following two indices wereused in this paper

(1) Mean Absolute Percentage Error (MAPE) The index givesthe evaluation of the average estimation error in terms ofpercentage

MAPE = 1

119872

119872

sum

119898=1

1003816100381610038161003816100381610038161003816100381610038161003816

119905(119898)

119903minus 119905(119898)

119890

119905(119898)

119903

1003816100381610038161003816100381610038161003816100381610038161003816

times 100 (6)

01 02 03 04 05 06 07 08

14

13

12

11

10

9

8

7

6

5

4

Missing ratio

RMSE

(mpe

)

BPCAMean

HaLRTC

Figure 5 RMSE for weekdaysrsquo data of single detector

(2) Root Mean Square Error (RMSE) This index gives theevaluation of the variance in the estimation errors

RMSE = radic 1

119872

119872

sum

119898=1

(119905(119898)

119903 minus 119905(119898)

119890 )

2

(7)

where 119905(119898)119903

and 119905(119898)

119890are the 119898th elements which stand for

the known real value and estimated value respectively 119872denotes the number of estimated traffic volumes

53 Results forMissing Speed Data of Single Detector To illus-trate the performances of the proposed method a completetraffic speed data set is used as ground truth for the test Wechoose the data of a fixed detector VDS 716331 in District7 Los Angeles County (see Figure 1) which are downloadedfrom (PeMS httppemsdotcagov) The traffic speed dataare recorded every 5 minutes Therefore a daily traffic speedseries for a loop detector contains 288 records and the wholeperiod of the data is chosen for three weeks that is fromFebruary 4 to February 24 2013

Based on the correlations analysis of traffic speed data inSection 3 the correlations are stronger for weekdays withoutregard to the weekends And it will be helpful for imputingthe missing traffic speed data Therefore the weekdaysrsquo datafor three weeks (ie 15 days) are chosen for evaluating theHaLRTC algorithm

Based on multiple correlations of the traffic speed datathe data set is modeled as a tensor of size 24 times 12 times 15 whichstands for 24 hours in a day 12 sample intervals (ie recordedby 5 minutes) per hour and 15 days For the methods Meanand BPCA the speed data is arranged as amatrix of size 288times15 The ratios of missing data are set from 5 to 80 andthe missing data are produced randomly All the results areaverage by 10 instances

The RMSE curves and MAPE curves of those methodswith randomly missing weekdaysrsquo traffic speed data for asingle loop detector are shown in Figures 5 and 6 Obviously

Computational Intelligence and Neuroscience 7

20

15

10

5

MA

PE (

)

01 02 03 04 05 06 07 08

Missing ratio

BPCAMean

HaLRTC

Figure 6 MAPE for weekdaysrsquo data of single detector

12

11

10

9

8

7

6

5

4

RMSE

(mpe

)

01 02 03 04 05 06 07 08

Missing ratio

BPCAMean

HaLRTC

Figure 7 RMSE for data in Wednesday of single detector

the RMSE and MAPE of HaLRTC approach are smallerthan other approaches It is worth noting that the RMSEof HaLRTC will be bigger than RMSE of BPCA when themissing ratio is 80 That situation can be considered as adiploma when the HaLRTC will reach the limit for keepingthe high accurate

For exploiting stronger correlations of traffic speed dataWednesdayrsquos data of a specific week are chosen and 10 weeksdata make up a data set with 10 days Identically the data setis modeled as a tensor of size 24times12times10which stands for 24hours in a day 12 sample intervals (ie recorded by 5minutes)per hour and 10 days And a 288times10matrixmodel is arrangedfor Mean and BPCA

18

16

14

12

10

8

6

4

MA

PE (

)

01 02 03 04 05 06 07 08

Missing ratio

BPCAMean

HaLRTC

Figure 8 MAPE for data in Wednesday of single detector

Figures 7 and 8 show that HaLRTC algorithm is withhigher accuracy for imputing missing traffic speed data thanMean and BPCA Comparing Figures 5 and 7 correspondingwith comparison of Figures 6 and 8 the performance forimputing missing traffic speed data will better when exploit-ing more correlations of data sets

54 Results for Missing Speed Data of Multiple Detectors Toillustrate the benefit of HaLRTC algorithm in reconstructingthe missing traffic speed data that is multiple correlationsof traffic speed data are more beneficial than the partialcorrelations used in traditional methods A new traffic speeddata set by considering spatial correlations is used to evaluateHaLRTC algorithm

We choose the data of 13 detectors in District 7 LosAngeles County (see Figure 1) which are downloaded from(PeMS httppemsdotcagov) For each detector the trafficspeed data for a specific day are chosen andderive the data setFor exploiting the spatial and temporal correlations of trafficspeed data simultaneously the data set is modeled as a tensorof size 24 times 12 times 13 which stands for 24 hours in a day 12sampling intervals in an hour and 13 detectors Meanwhilethe data set is arranged as a matrix 288 times 13 for Mean andBPCA

Figures 9 and 10 show that our propose HaLRTC out-performs Mean and BPCA for traffic speed data of multipledetectors

6 Conclusion

In this paper a multiway tensor model is proposed torepresent the traffic speed data considering the multiplecorrelations and a high accuracy low rank tensor completion(HaLRTC) algorithm is employed to estimate the missingtraffic speed data due to its severely fluctuation Experiments

8 Computational Intelligence and Neuroscience

12

14

16

10

8

6

4

RMSE

(mpe

)

01 02 03 04 05 06 07 08

Missing ratio

BPCAMean

HaLRTC

Figure 9 RMSE for data of multiple detectors

30

25

20

15

10

5

MA

PE (

)

01 02 03 04 05 06 07 08

Missing ratio

BPCAMean

HaLRTC

Figure 10 MAPE for data of multiple detectors

on benchmark show that the proposed method performsbetter than other classical state-of-the-art methods Forfuture work it is interesting to extend the proposed methodto large and dynamic road network

Conflict of Interests

The authors declare that there is no conflict of interestsregarding the publication of this paper

Acknowledgments

The research was supported by NSFC (Grant nos 61271376and 51308115) the National Basic Research Program of China(973 Program no 2012CB725405) Beijing Natural Science

Foundation (4122067) and the Basic Research Fund ofBeijing Institute of Technology (20120342026)

References

[1] J Zhang F-Y Wang K Wang W-H Lin X Xu and C ChenldquoData-driven intelligent transportation systems a surveyrdquo IEEETransactions on Intelligent Transportation Systems vol 12 no 4pp 1624ndash1639 2011

[2] V Jain A Sharma and L Subramanian ldquoRoad traffic conges-tion in the developing worldrdquo in Proceedings of the 2nd ACMSymposium on Computing for Development p 11 2012

[3] H S Zhang Y Zhang Z H Li andD C Hu ldquoSpatial-temporaltraffic data analysis based on global data management usingMASrdquo IEEE Transactions on Intelligent Transportation Systemsvol 5 no 4 pp 267ndash275 2004

[4] J Chen and J Shao ldquoNearest neighbor imputation for surveydatardquo Journal ofOfficial Statistics vol 16 no 2 pp 113ndash132 2000

[5] P D Allison and M Data University Papers Series on Quanti-tative Applications in the Social Sciences Sage Thousand OaksCalif USA 2001

[6] H Chen S Grant-Muller L Mussone and F MontgomeryldquoA study of hybrid neural network approaches and the effectsof missing data on traffic forecastingrdquo Neural Computing ampApplications vol 10 no 3 pp 277ndash286 2001

[7] L Qu L Li Y Zhang and J Hu ldquoPPCA-based missing dataimputation for traffic flow volume a systematical approachrdquoIEEE Transactions on Intelligent Transportation Systems vol 10no 3 pp 512ndash522 2009

[8] L Wang Z Li and C Song ldquoNetwork traffic prediction basedon seasonal ARIMA modelrdquo in Proceedings of the 5th IEEEWorld Congress on Intelligent Control and Automation vol 2pp 1425ndash1428 2004

[9] X Jin S-K Hong and Q Ma ldquoAn algorithm to estimatecontinuous-time traffic speed usingmultiple regressionmodelrdquoInformation Technology Journal vol 5 no 2 pp 281ndash284 2006

[10] L Qu Y Zhang J Hu L Jia and L Li ldquoA BPCA basedmissing value imputing method for traffic flow volume datardquo inProceedings of the IEEE Intelligent Vehicles Symposium (IV rsquo08)pp 985ndash990 June 2008

[11] A P Dempster N M Laird and D B Rubin ldquoMaximumlikelihood from incomplete data via the EM algorithmrdquo Journalof the Royal Statistical Society Series B Methodological vol 39no 1 pp 1ndash38 1977

[12] P Widhalm M Piff N Brandle H Koller and M ReinthalerldquoRobust road link speed estimates for sparse or missing probevehicle datardquo in Proceedings of the 15th International IEEEConference on Intelligent Transportation Systems (ITSC rsquo12) pp1693ndash1697 September 2012

[13] H Tan G Feng J Feng W Wang Y-J Zhang and F LildquoA tensor-based method for missing traffic data completionrdquoTransportation Research Part C Emerging Technologies vol 28pp 15ndash27 2013

[14] M T Asif N Mitrovic L Garg J Dauwels and P Jaillet ldquoLow-dimensional models for missing data imputation in road net-worksrdquo in Proceedings of the 38th IEEE International Conferenceon Acoustics Speech and Signal Processing (ICASSP rsquo13) pp3527ndash3531 May 2013

[15] S Ma D Goldfarb and L Chen ldquoFixed point and Bregmaniterative methods for matrix rank minimizationrdquoMathematicalProgramming vol 128 no 1-2 pp 321ndash353 2011

Computational Intelligence and Neuroscience 9

[16] E Acar D M Dunlavy T G Kolda and M Moslashrup ldquoScalabletensor factorizations for incomplete datardquo Chemometrics andIntelligent Laboratory Systems vol 106 no 1 pp 41ndash56 2011

[17] J Liu P Musialski P Wonka and J Ye ldquoTensor completion forestimating missing values in visual datardquo IEEE Transactions onPattern Analysis andMachine Intelligence vol 35 no 1 pp 208ndash220 2013

[18] E J Candes and B Recht ldquoExact matrix completion via convexoptimizationrdquo Foundations of Computational Mathematics vol9 no 6 pp 717ndash772 2009

[19] J F Cai E J Candes andZ Shen ldquoA singular value thresholdingalgorithm for matrix completionrdquo SIAM Journal on Optimiza-tion vol 20 no 4 pp 1956ndash1982 2010

[20] K Lee and Y Bresler ldquoADMiRA atomic decomposition forminimum rank approximationrdquo IEEE Transactions on Informa-tion Theory vol 56 no 9 pp 4402ndash4416 2010

[21] R H Keshavan A Montanari and S Oh ldquoMatrix completionfrom a few entriesrdquo IEEE Transactions on Information Theoryvol 56 no 6 pp 2980ndash2998 2010

[22] M Signoretto Q Tran Dinh L de Lathauwer and J A KSuykens ldquoLearning with tensors a framework based on convexoptimization and spectral regularizationrdquo Machine Learningvol 94 no 3 pp 303ndash351 2014

[23] Y Zhang and Y Liu ldquoMissing traffic flow data predictionusing least squares support vector machines in urban arterialstreetsrdquo inProceedings of the IEEE SymposiumonComputationalIntelligence and DataMining (CIDM rsquo09) pp 76ndash83 April 2009

[24] V Barnett and T LewisOutliers in Statistical Data Wiley Seriesin Probability and Mathematical Statistics Applied Probabilityand Statistics John Wiley amp Sons Chichester UK 3rd edition1994

[25] S P Verma R Cruz-Huicochea and L Dıaz-Gonzalez ldquoUni-variate data analysis system deciphering mean compositionsof island and continental arc magmas and influence of theunderlying crustrdquo International Geology Review vol 55 no 15pp 1922ndash1940 2013

[26] S P Verma L Dıaz-Gonzalez M Rosales-Rivera and AQuiroz-Ruiz ldquoComparative performance of four single extremeoutlier discordancy tests from Monte Carlo simulationsrdquo TheScientific World Journal vol 2014 Article ID 746451 27 pages2014

[27] T G Kolda and B W Bader ldquoTensor decompositions andapplicationsrdquo SIAM Review vol 51 no 3 pp 455ndash500 2009

[28] Z Lin M Chen L Wu and Y Ma ldquoThe augmented lagrangemultiplier method for exact recovery of corrupted low-rankmatricesrdquo UIUC Technical Report UILU-ENG-09-2215 2009

[29] D Kahaner C Moler and S Nash Numerical Methods andSoftware Prentice Hall Englewood Cliffs NJ USA 1989

Submit your manuscripts athttpwwwhindawicom

Computer Games Technology

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Distributed Sensor Networks

International Journal of

Advances in

FuzzySystems

Hindawi Publishing Corporationhttpwwwhindawicom

Volume 2014

International Journal of

ReconfigurableComputing

Hindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Applied Computational Intelligence and Soft Computing

thinspAdvancesthinspinthinsp

Artificial Intelligence

HindawithinspPublishingthinspCorporationhttpwwwhindawicom Volumethinsp2014

Advances inSoftware EngineeringHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Electrical and Computer Engineering

Journal of

Journal of

Computer Networks and Communications

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation

httpwwwhindawicom Volume 2014

Advances in

Multimedia

International Journal of

Biomedical Imaging

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

ArtificialNeural Systems

Advances in

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

RoboticsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Computational Intelligence and Neuroscience

Industrial EngineeringJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Modelling amp Simulation in EngineeringHindawi Publishing Corporation httpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Human-ComputerInteraction

Advances in

Computer EngineeringAdvances in

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Page 7: Research Article Traffic Speed Data Imputation Method Based ...downloads.hindawi.com/journals/cin/2015/364089.pdfResearch Article Traffic Speed Data Imputation Method Based on Tensor

Computational Intelligence and Neuroscience 7

20

15

10

5

MA

PE (

)

01 02 03 04 05 06 07 08

Missing ratio

BPCAMean

HaLRTC

Figure 6 MAPE for weekdaysrsquo data of single detector

12

11

10

9

8

7

6

5

4

RMSE

(mpe

)

01 02 03 04 05 06 07 08

Missing ratio

BPCAMean

HaLRTC

Figure 7 RMSE for data in Wednesday of single detector

the RMSE and MAPE of HaLRTC approach are smallerthan other approaches It is worth noting that the RMSEof HaLRTC will be bigger than RMSE of BPCA when themissing ratio is 80 That situation can be considered as adiploma when the HaLRTC will reach the limit for keepingthe high accurate

For exploiting stronger correlations of traffic speed dataWednesdayrsquos data of a specific week are chosen and 10 weeksdata make up a data set with 10 days Identically the data setis modeled as a tensor of size 24times12times10which stands for 24hours in a day 12 sample intervals (ie recorded by 5minutes)per hour and 10 days And a 288times10matrixmodel is arrangedfor Mean and BPCA

18

16

14

12

10

8

6

4

MA

PE (

)

01 02 03 04 05 06 07 08

Missing ratio

BPCAMean

HaLRTC

Figure 8 MAPE for data in Wednesday of single detector

Figures 7 and 8 show that HaLRTC algorithm is withhigher accuracy for imputing missing traffic speed data thanMean and BPCA Comparing Figures 5 and 7 correspondingwith comparison of Figures 6 and 8 the performance forimputing missing traffic speed data will better when exploit-ing more correlations of data sets

54 Results for Missing Speed Data of Multiple Detectors Toillustrate the benefit of HaLRTC algorithm in reconstructingthe missing traffic speed data that is multiple correlationsof traffic speed data are more beneficial than the partialcorrelations used in traditional methods A new traffic speeddata set by considering spatial correlations is used to evaluateHaLRTC algorithm

We choose the data of 13 detectors in District 7 LosAngeles County (see Figure 1) which are downloaded from(PeMS httppemsdotcagov) For each detector the trafficspeed data for a specific day are chosen andderive the data setFor exploiting the spatial and temporal correlations of trafficspeed data simultaneously the data set is modeled as a tensorof size 24 times 12 times 13 which stands for 24 hours in a day 12sampling intervals in an hour and 13 detectors Meanwhilethe data set is arranged as a matrix 288 times 13 for Mean andBPCA

Figures 9 and 10 show that our propose HaLRTC out-performs Mean and BPCA for traffic speed data of multipledetectors

6 Conclusion

In this paper a multiway tensor model is proposed torepresent the traffic speed data considering the multiplecorrelations and a high accuracy low rank tensor completion(HaLRTC) algorithm is employed to estimate the missingtraffic speed data due to its severely fluctuation Experiments

8 Computational Intelligence and Neuroscience

12

14

16

10

8

6

4

RMSE

(mpe

)

01 02 03 04 05 06 07 08

Missing ratio

BPCAMean

HaLRTC

Figure 9 RMSE for data of multiple detectors

30

25

20

15

10

5

MA

PE (

)

01 02 03 04 05 06 07 08

Missing ratio

BPCAMean

HaLRTC

Figure 10 MAPE for data of multiple detectors

on benchmark show that the proposed method performsbetter than other classical state-of-the-art methods Forfuture work it is interesting to extend the proposed methodto large and dynamic road network

Conflict of Interests

The authors declare that there is no conflict of interestsregarding the publication of this paper

Acknowledgments

The research was supported by NSFC (Grant nos 61271376and 51308115) the National Basic Research Program of China(973 Program no 2012CB725405) Beijing Natural Science

Foundation (4122067) and the Basic Research Fund ofBeijing Institute of Technology (20120342026)

References

[1] J Zhang F-Y Wang K Wang W-H Lin X Xu and C ChenldquoData-driven intelligent transportation systems a surveyrdquo IEEETransactions on Intelligent Transportation Systems vol 12 no 4pp 1624ndash1639 2011

[2] V Jain A Sharma and L Subramanian ldquoRoad traffic conges-tion in the developing worldrdquo in Proceedings of the 2nd ACMSymposium on Computing for Development p 11 2012

[3] H S Zhang Y Zhang Z H Li andD C Hu ldquoSpatial-temporaltraffic data analysis based on global data management usingMASrdquo IEEE Transactions on Intelligent Transportation Systemsvol 5 no 4 pp 267ndash275 2004

[4] J Chen and J Shao ldquoNearest neighbor imputation for surveydatardquo Journal ofOfficial Statistics vol 16 no 2 pp 113ndash132 2000

[5] P D Allison and M Data University Papers Series on Quanti-tative Applications in the Social Sciences Sage Thousand OaksCalif USA 2001

[6] H Chen S Grant-Muller L Mussone and F MontgomeryldquoA study of hybrid neural network approaches and the effectsof missing data on traffic forecastingrdquo Neural Computing ampApplications vol 10 no 3 pp 277ndash286 2001

[7] L Qu L Li Y Zhang and J Hu ldquoPPCA-based missing dataimputation for traffic flow volume a systematical approachrdquoIEEE Transactions on Intelligent Transportation Systems vol 10no 3 pp 512ndash522 2009

[8] L Wang Z Li and C Song ldquoNetwork traffic prediction basedon seasonal ARIMA modelrdquo in Proceedings of the 5th IEEEWorld Congress on Intelligent Control and Automation vol 2pp 1425ndash1428 2004

[9] X Jin S-K Hong and Q Ma ldquoAn algorithm to estimatecontinuous-time traffic speed usingmultiple regressionmodelrdquoInformation Technology Journal vol 5 no 2 pp 281ndash284 2006

[10] L Qu Y Zhang J Hu L Jia and L Li ldquoA BPCA basedmissing value imputing method for traffic flow volume datardquo inProceedings of the IEEE Intelligent Vehicles Symposium (IV rsquo08)pp 985ndash990 June 2008

[11] A P Dempster N M Laird and D B Rubin ldquoMaximumlikelihood from incomplete data via the EM algorithmrdquo Journalof the Royal Statistical Society Series B Methodological vol 39no 1 pp 1ndash38 1977

[12] P Widhalm M Piff N Brandle H Koller and M ReinthalerldquoRobust road link speed estimates for sparse or missing probevehicle datardquo in Proceedings of the 15th International IEEEConference on Intelligent Transportation Systems (ITSC rsquo12) pp1693ndash1697 September 2012

[13] H Tan G Feng J Feng W Wang Y-J Zhang and F LildquoA tensor-based method for missing traffic data completionrdquoTransportation Research Part C Emerging Technologies vol 28pp 15ndash27 2013

[14] M T Asif N Mitrovic L Garg J Dauwels and P Jaillet ldquoLow-dimensional models for missing data imputation in road net-worksrdquo in Proceedings of the 38th IEEE International Conferenceon Acoustics Speech and Signal Processing (ICASSP rsquo13) pp3527ndash3531 May 2013

[15] S Ma D Goldfarb and L Chen ldquoFixed point and Bregmaniterative methods for matrix rank minimizationrdquoMathematicalProgramming vol 128 no 1-2 pp 321ndash353 2011

Computational Intelligence and Neuroscience 9

[16] E Acar D M Dunlavy T G Kolda and M Moslashrup ldquoScalabletensor factorizations for incomplete datardquo Chemometrics andIntelligent Laboratory Systems vol 106 no 1 pp 41ndash56 2011

[17] J Liu P Musialski P Wonka and J Ye ldquoTensor completion forestimating missing values in visual datardquo IEEE Transactions onPattern Analysis andMachine Intelligence vol 35 no 1 pp 208ndash220 2013

[18] E J Candes and B Recht ldquoExact matrix completion via convexoptimizationrdquo Foundations of Computational Mathematics vol9 no 6 pp 717ndash772 2009

[19] J F Cai E J Candes andZ Shen ldquoA singular value thresholdingalgorithm for matrix completionrdquo SIAM Journal on Optimiza-tion vol 20 no 4 pp 1956ndash1982 2010

[20] K Lee and Y Bresler ldquoADMiRA atomic decomposition forminimum rank approximationrdquo IEEE Transactions on Informa-tion Theory vol 56 no 9 pp 4402ndash4416 2010

[21] R H Keshavan A Montanari and S Oh ldquoMatrix completionfrom a few entriesrdquo IEEE Transactions on Information Theoryvol 56 no 6 pp 2980ndash2998 2010

[22] M Signoretto Q Tran Dinh L de Lathauwer and J A KSuykens ldquoLearning with tensors a framework based on convexoptimization and spectral regularizationrdquo Machine Learningvol 94 no 3 pp 303ndash351 2014

[23] Y Zhang and Y Liu ldquoMissing traffic flow data predictionusing least squares support vector machines in urban arterialstreetsrdquo inProceedings of the IEEE SymposiumonComputationalIntelligence and DataMining (CIDM rsquo09) pp 76ndash83 April 2009

[24] V Barnett and T LewisOutliers in Statistical Data Wiley Seriesin Probability and Mathematical Statistics Applied Probabilityand Statistics John Wiley amp Sons Chichester UK 3rd edition1994

[25] S P Verma R Cruz-Huicochea and L Dıaz-Gonzalez ldquoUni-variate data analysis system deciphering mean compositionsof island and continental arc magmas and influence of theunderlying crustrdquo International Geology Review vol 55 no 15pp 1922ndash1940 2013

[26] S P Verma L Dıaz-Gonzalez M Rosales-Rivera and AQuiroz-Ruiz ldquoComparative performance of four single extremeoutlier discordancy tests from Monte Carlo simulationsrdquo TheScientific World Journal vol 2014 Article ID 746451 27 pages2014

[27] T G Kolda and B W Bader ldquoTensor decompositions andapplicationsrdquo SIAM Review vol 51 no 3 pp 455ndash500 2009

[28] Z Lin M Chen L Wu and Y Ma ldquoThe augmented lagrangemultiplier method for exact recovery of corrupted low-rankmatricesrdquo UIUC Technical Report UILU-ENG-09-2215 2009

[29] D Kahaner C Moler and S Nash Numerical Methods andSoftware Prentice Hall Englewood Cliffs NJ USA 1989

Submit your manuscripts athttpwwwhindawicom

Computer Games Technology

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Distributed Sensor Networks

International Journal of

Advances in

FuzzySystems

Hindawi Publishing Corporationhttpwwwhindawicom

Volume 2014

International Journal of

ReconfigurableComputing

Hindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Applied Computational Intelligence and Soft Computing

thinspAdvancesthinspinthinsp

Artificial Intelligence

HindawithinspPublishingthinspCorporationhttpwwwhindawicom Volumethinsp2014

Advances inSoftware EngineeringHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Electrical and Computer Engineering

Journal of

Journal of

Computer Networks and Communications

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation

httpwwwhindawicom Volume 2014

Advances in

Multimedia

International Journal of

Biomedical Imaging

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

ArtificialNeural Systems

Advances in

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

RoboticsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Computational Intelligence and Neuroscience

Industrial EngineeringJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Modelling amp Simulation in EngineeringHindawi Publishing Corporation httpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Human-ComputerInteraction

Advances in

Computer EngineeringAdvances in

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Page 8: Research Article Traffic Speed Data Imputation Method Based ...downloads.hindawi.com/journals/cin/2015/364089.pdfResearch Article Traffic Speed Data Imputation Method Based on Tensor

8 Computational Intelligence and Neuroscience

12

14

16

10

8

6

4

RMSE

(mpe

)

01 02 03 04 05 06 07 08

Missing ratio

BPCAMean

HaLRTC

Figure 9 RMSE for data of multiple detectors

30

25

20

15

10

5

MA

PE (

)

01 02 03 04 05 06 07 08

Missing ratio

BPCAMean

HaLRTC

Figure 10 MAPE for data of multiple detectors

on benchmark show that the proposed method performsbetter than other classical state-of-the-art methods Forfuture work it is interesting to extend the proposed methodto large and dynamic road network

Conflict of Interests

The authors declare that there is no conflict of interestsregarding the publication of this paper

Acknowledgments

The research was supported by NSFC (Grant nos 61271376and 51308115) the National Basic Research Program of China(973 Program no 2012CB725405) Beijing Natural Science

Foundation (4122067) and the Basic Research Fund ofBeijing Institute of Technology (20120342026)

References

[1] J Zhang F-Y Wang K Wang W-H Lin X Xu and C ChenldquoData-driven intelligent transportation systems a surveyrdquo IEEETransactions on Intelligent Transportation Systems vol 12 no 4pp 1624ndash1639 2011

[2] V Jain A Sharma and L Subramanian ldquoRoad traffic conges-tion in the developing worldrdquo in Proceedings of the 2nd ACMSymposium on Computing for Development p 11 2012

[3] H S Zhang Y Zhang Z H Li andD C Hu ldquoSpatial-temporaltraffic data analysis based on global data management usingMASrdquo IEEE Transactions on Intelligent Transportation Systemsvol 5 no 4 pp 267ndash275 2004

[4] J Chen and J Shao ldquoNearest neighbor imputation for surveydatardquo Journal ofOfficial Statistics vol 16 no 2 pp 113ndash132 2000

[5] P D Allison and M Data University Papers Series on Quanti-tative Applications in the Social Sciences Sage Thousand OaksCalif USA 2001

[6] H Chen S Grant-Muller L Mussone and F MontgomeryldquoA study of hybrid neural network approaches and the effectsof missing data on traffic forecastingrdquo Neural Computing ampApplications vol 10 no 3 pp 277ndash286 2001

[7] L Qu L Li Y Zhang and J Hu ldquoPPCA-based missing dataimputation for traffic flow volume a systematical approachrdquoIEEE Transactions on Intelligent Transportation Systems vol 10no 3 pp 512ndash522 2009

[8] L Wang Z Li and C Song ldquoNetwork traffic prediction basedon seasonal ARIMA modelrdquo in Proceedings of the 5th IEEEWorld Congress on Intelligent Control and Automation vol 2pp 1425ndash1428 2004

[9] X Jin S-K Hong and Q Ma ldquoAn algorithm to estimatecontinuous-time traffic speed usingmultiple regressionmodelrdquoInformation Technology Journal vol 5 no 2 pp 281ndash284 2006

[10] L Qu Y Zhang J Hu L Jia and L Li ldquoA BPCA basedmissing value imputing method for traffic flow volume datardquo inProceedings of the IEEE Intelligent Vehicles Symposium (IV rsquo08)pp 985ndash990 June 2008

[11] A P Dempster N M Laird and D B Rubin ldquoMaximumlikelihood from incomplete data via the EM algorithmrdquo Journalof the Royal Statistical Society Series B Methodological vol 39no 1 pp 1ndash38 1977

[12] P Widhalm M Piff N Brandle H Koller and M ReinthalerldquoRobust road link speed estimates for sparse or missing probevehicle datardquo in Proceedings of the 15th International IEEEConference on Intelligent Transportation Systems (ITSC rsquo12) pp1693ndash1697 September 2012

[13] H Tan G Feng J Feng W Wang Y-J Zhang and F LildquoA tensor-based method for missing traffic data completionrdquoTransportation Research Part C Emerging Technologies vol 28pp 15ndash27 2013

[14] M T Asif N Mitrovic L Garg J Dauwels and P Jaillet ldquoLow-dimensional models for missing data imputation in road net-worksrdquo in Proceedings of the 38th IEEE International Conferenceon Acoustics Speech and Signal Processing (ICASSP rsquo13) pp3527ndash3531 May 2013

[15] S Ma D Goldfarb and L Chen ldquoFixed point and Bregmaniterative methods for matrix rank minimizationrdquoMathematicalProgramming vol 128 no 1-2 pp 321ndash353 2011

Computational Intelligence and Neuroscience 9

[16] E Acar D M Dunlavy T G Kolda and M Moslashrup ldquoScalabletensor factorizations for incomplete datardquo Chemometrics andIntelligent Laboratory Systems vol 106 no 1 pp 41ndash56 2011

[17] J Liu P Musialski P Wonka and J Ye ldquoTensor completion forestimating missing values in visual datardquo IEEE Transactions onPattern Analysis andMachine Intelligence vol 35 no 1 pp 208ndash220 2013

[18] E J Candes and B Recht ldquoExact matrix completion via convexoptimizationrdquo Foundations of Computational Mathematics vol9 no 6 pp 717ndash772 2009

[19] J F Cai E J Candes andZ Shen ldquoA singular value thresholdingalgorithm for matrix completionrdquo SIAM Journal on Optimiza-tion vol 20 no 4 pp 1956ndash1982 2010

[20] K Lee and Y Bresler ldquoADMiRA atomic decomposition forminimum rank approximationrdquo IEEE Transactions on Informa-tion Theory vol 56 no 9 pp 4402ndash4416 2010

[21] R H Keshavan A Montanari and S Oh ldquoMatrix completionfrom a few entriesrdquo IEEE Transactions on Information Theoryvol 56 no 6 pp 2980ndash2998 2010

[22] M Signoretto Q Tran Dinh L de Lathauwer and J A KSuykens ldquoLearning with tensors a framework based on convexoptimization and spectral regularizationrdquo Machine Learningvol 94 no 3 pp 303ndash351 2014

[23] Y Zhang and Y Liu ldquoMissing traffic flow data predictionusing least squares support vector machines in urban arterialstreetsrdquo inProceedings of the IEEE SymposiumonComputationalIntelligence and DataMining (CIDM rsquo09) pp 76ndash83 April 2009

[24] V Barnett and T LewisOutliers in Statistical Data Wiley Seriesin Probability and Mathematical Statistics Applied Probabilityand Statistics John Wiley amp Sons Chichester UK 3rd edition1994

[25] S P Verma R Cruz-Huicochea and L Dıaz-Gonzalez ldquoUni-variate data analysis system deciphering mean compositionsof island and continental arc magmas and influence of theunderlying crustrdquo International Geology Review vol 55 no 15pp 1922ndash1940 2013

[26] S P Verma L Dıaz-Gonzalez M Rosales-Rivera and AQuiroz-Ruiz ldquoComparative performance of four single extremeoutlier discordancy tests from Monte Carlo simulationsrdquo TheScientific World Journal vol 2014 Article ID 746451 27 pages2014

[27] T G Kolda and B W Bader ldquoTensor decompositions andapplicationsrdquo SIAM Review vol 51 no 3 pp 455ndash500 2009

[28] Z Lin M Chen L Wu and Y Ma ldquoThe augmented lagrangemultiplier method for exact recovery of corrupted low-rankmatricesrdquo UIUC Technical Report UILU-ENG-09-2215 2009

[29] D Kahaner C Moler and S Nash Numerical Methods andSoftware Prentice Hall Englewood Cliffs NJ USA 1989

Submit your manuscripts athttpwwwhindawicom

Computer Games Technology

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Distributed Sensor Networks

International Journal of

Advances in

FuzzySystems

Hindawi Publishing Corporationhttpwwwhindawicom

Volume 2014

International Journal of

ReconfigurableComputing

Hindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Applied Computational Intelligence and Soft Computing

thinspAdvancesthinspinthinsp

Artificial Intelligence

HindawithinspPublishingthinspCorporationhttpwwwhindawicom Volumethinsp2014

Advances inSoftware EngineeringHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Electrical and Computer Engineering

Journal of

Journal of

Computer Networks and Communications

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation

httpwwwhindawicom Volume 2014

Advances in

Multimedia

International Journal of

Biomedical Imaging

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

ArtificialNeural Systems

Advances in

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

RoboticsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Computational Intelligence and Neuroscience

Industrial EngineeringJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Modelling amp Simulation in EngineeringHindawi Publishing Corporation httpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Human-ComputerInteraction

Advances in

Computer EngineeringAdvances in

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Page 9: Research Article Traffic Speed Data Imputation Method Based ...downloads.hindawi.com/journals/cin/2015/364089.pdfResearch Article Traffic Speed Data Imputation Method Based on Tensor

Computational Intelligence and Neuroscience 9

[16] E Acar D M Dunlavy T G Kolda and M Moslashrup ldquoScalabletensor factorizations for incomplete datardquo Chemometrics andIntelligent Laboratory Systems vol 106 no 1 pp 41ndash56 2011

[17] J Liu P Musialski P Wonka and J Ye ldquoTensor completion forestimating missing values in visual datardquo IEEE Transactions onPattern Analysis andMachine Intelligence vol 35 no 1 pp 208ndash220 2013

[18] E J Candes and B Recht ldquoExact matrix completion via convexoptimizationrdquo Foundations of Computational Mathematics vol9 no 6 pp 717ndash772 2009

[19] J F Cai E J Candes andZ Shen ldquoA singular value thresholdingalgorithm for matrix completionrdquo SIAM Journal on Optimiza-tion vol 20 no 4 pp 1956ndash1982 2010

[20] K Lee and Y Bresler ldquoADMiRA atomic decomposition forminimum rank approximationrdquo IEEE Transactions on Informa-tion Theory vol 56 no 9 pp 4402ndash4416 2010

[21] R H Keshavan A Montanari and S Oh ldquoMatrix completionfrom a few entriesrdquo IEEE Transactions on Information Theoryvol 56 no 6 pp 2980ndash2998 2010

[22] M Signoretto Q Tran Dinh L de Lathauwer and J A KSuykens ldquoLearning with tensors a framework based on convexoptimization and spectral regularizationrdquo Machine Learningvol 94 no 3 pp 303ndash351 2014

[23] Y Zhang and Y Liu ldquoMissing traffic flow data predictionusing least squares support vector machines in urban arterialstreetsrdquo inProceedings of the IEEE SymposiumonComputationalIntelligence and DataMining (CIDM rsquo09) pp 76ndash83 April 2009

[24] V Barnett and T LewisOutliers in Statistical Data Wiley Seriesin Probability and Mathematical Statistics Applied Probabilityand Statistics John Wiley amp Sons Chichester UK 3rd edition1994

[25] S P Verma R Cruz-Huicochea and L Dıaz-Gonzalez ldquoUni-variate data analysis system deciphering mean compositionsof island and continental arc magmas and influence of theunderlying crustrdquo International Geology Review vol 55 no 15pp 1922ndash1940 2013

[26] S P Verma L Dıaz-Gonzalez M Rosales-Rivera and AQuiroz-Ruiz ldquoComparative performance of four single extremeoutlier discordancy tests from Monte Carlo simulationsrdquo TheScientific World Journal vol 2014 Article ID 746451 27 pages2014

[27] T G Kolda and B W Bader ldquoTensor decompositions andapplicationsrdquo SIAM Review vol 51 no 3 pp 455ndash500 2009

[28] Z Lin M Chen L Wu and Y Ma ldquoThe augmented lagrangemultiplier method for exact recovery of corrupted low-rankmatricesrdquo UIUC Technical Report UILU-ENG-09-2215 2009

[29] D Kahaner C Moler and S Nash Numerical Methods andSoftware Prentice Hall Englewood Cliffs NJ USA 1989

Submit your manuscripts athttpwwwhindawicom

Computer Games Technology

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Distributed Sensor Networks

International Journal of

Advances in

FuzzySystems

Hindawi Publishing Corporationhttpwwwhindawicom

Volume 2014

International Journal of

ReconfigurableComputing

Hindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Applied Computational Intelligence and Soft Computing

thinspAdvancesthinspinthinsp

Artificial Intelligence

HindawithinspPublishingthinspCorporationhttpwwwhindawicom Volumethinsp2014

Advances inSoftware EngineeringHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Electrical and Computer Engineering

Journal of

Journal of

Computer Networks and Communications

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation

httpwwwhindawicom Volume 2014

Advances in

Multimedia

International Journal of

Biomedical Imaging

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

ArtificialNeural Systems

Advances in

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

RoboticsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Computational Intelligence and Neuroscience

Industrial EngineeringJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Modelling amp Simulation in EngineeringHindawi Publishing Corporation httpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Human-ComputerInteraction

Advances in

Computer EngineeringAdvances in

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Page 10: Research Article Traffic Speed Data Imputation Method Based ...downloads.hindawi.com/journals/cin/2015/364089.pdfResearch Article Traffic Speed Data Imputation Method Based on Tensor

Submit your manuscripts athttpwwwhindawicom

Computer Games Technology

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Distributed Sensor Networks

International Journal of

Advances in

FuzzySystems

Hindawi Publishing Corporationhttpwwwhindawicom

Volume 2014

International Journal of

ReconfigurableComputing

Hindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Applied Computational Intelligence and Soft Computing

thinspAdvancesthinspinthinsp

Artificial Intelligence

HindawithinspPublishingthinspCorporationhttpwwwhindawicom Volumethinsp2014

Advances inSoftware EngineeringHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Electrical and Computer Engineering

Journal of

Journal of

Computer Networks and Communications

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation

httpwwwhindawicom Volume 2014

Advances in

Multimedia

International Journal of

Biomedical Imaging

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

ArtificialNeural Systems

Advances in

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

RoboticsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Computational Intelligence and Neuroscience

Industrial EngineeringJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Modelling amp Simulation in EngineeringHindawi Publishing Corporation httpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Human-ComputerInteraction

Advances in

Computer EngineeringAdvances in

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014