short-term traffic flow prediction with recurrent mixture

Research ArticleShort-Term Traffic Flow Prediction with Recurrent MixtureDensity Network

Mingjian Chen Rui Chen Fu Cai Wanli Li Naikun Guo and Guangyun Li

Institute of Geospatial Information Information Engineering University Zhengzhou 450000 China

Correspondence should be addressed to Mingjian Chen zzcmj139com

Received 5 August 2020 Revised 17 April 2021 Accepted 16 May 2021 Published 24 May 2021

Academic Editor Bartlomiej Blachowski

Copyright copy 2021Mingjian Chen et al+is is an open access article distributed under the Creative CommonsAttribution Licensewhich permits unrestricted use distribution and reproduction in any medium provided the original work is properly cited

Traffic situation awareness is the key factor for intelligent transportation systems (ITS) and smart city Short-term traffic flowprediction is one of the challenging tasks of traffic situation awareness which is useful for route planning traffic congestionalleviation emission reduction and so on Over the past few years ubiquitous location acquisition techniques and sensorsdigitized the road networks and generated spatiotemporal data Massive traffic data provide an opportunity for short-termtraffic flow prediction in a data-driven manner Most of the existing short-term traffic flow prediction methods can be dividedinto two categories nonparametric and parametric Traditional parametric methods failed to obtain accurate prediction due tothe nonlinear and stochastic characteristics of short-term traffic flow Recently deep learning methods have been studiedwidely in the fields of short-term prediction +ese nonparametric methods yielded promising results in practical experimentsMotivated by the current study status we dedicate this paper to a short-term traffic flow prediction approach based on therecurrent mixture density network the combination of recurrent neural network (RNN) and mixture density network (MDN)+is approach is implemented on real-world traffic flow data and demonstrates the prominent superiority To the best of ourknowledge this is the first time that the recurrent mixture density network is applied to a real-world short-term traffic flowprediction task

1 Introduction

Over the past few decades the number of megalopolises hasincreased rapidly especially in developing country as ChinaIn the process of city development projects in various as-pects such as urban layout plan environment protectionand traffic management are still tricky challenges [1] Hencegreat demand for smart city construction and intelligenttransportation systems (ITS) emerges [2] Short-term trafficflow prediction aims to provide future traffic information ofroad network which is useful for both individual users andtransportation management [3] Timely and accurate trafficsituation awareness contributes to a wide spectrum of ap-plications such as route planning traffic congestion alle-viation and emission reduction With the rapiddevelopment of sensors communication techniques andlocation acquisition techniques such as Global NavigationSatellite System (GNSS) and 5G technology massive

multisource data including trajectory data and traffic flowdata are generated and communicated by prevalent trafficinformation acquisition devices [4]

+e evolution of traffic conditions has both spatial andtemporal characteristics [5] +ere is abundant potentialknowledge about traffic conditions contained in traffic datawhich implies the spatiotemporal patterns of traffic flowUbiquitous traffic data lead to the increasing advent of data-driven traffic flow prediction methods Hence it is feasible topredict the evolution trend of traffic flow via knowledgediscovery Predicting short-term traffic flow refers to esti-mating the traffic flow indices of a certain road or transversesection at one or more following time instances based on therecent and current traffic flow data Most of the existingtraffic flow prediction methods can be categorized intoparametric methods and nonparametric methods [6]

In the category of parametric methods the predictionmodels maintain a fixed structure+emodel parameters are

HindawiMathematical Problems in EngineeringVolume 2021 Article ID 6393951 9 pageshttpsdoiorg10115520216393951

estimated by the empirical way based on historical dataWidely used parametric methods include the time-seriesmodel Autoregressive Integrated Moving Average(ARIMA) model [7] and Kalman Filter [8] +ere are threecore parameters to determine the structure of ARIMA(p d q) where p denotes the autoregressive order d de-notes the integrated order and q is the moving averagepolynomial order respectively ARIMA model converts thetime-series data into stationary series without tendency andcaptures both the autocorrelation features and noise featuresof the stationary series Kalman Filter based methods rely ontransition matrix and dynamic equation However theshort-term traffic flow is highly nonlinear and stochastic+e evolution of traffic flow is actually an uncertain eventHence traditional parametric methods are limited in cap-turing the inherent features of traffic flow and fail to achieveexpected performance

In order to improve prediction precision plenty of lit-erature focuses on applying nonparametric models to trafficflow prediction Nonparametric methods are mainly basedon machine learning and deep learning technologies whichdo not involve predefined parameters and structure [9]Deep neural networks (DNN) [10] including convolutionalneural network (CNN) [11] and recurrent neural network(RNN) [12] have achieved remarkable performance in awide range of fields such as computer vision (CV) andnatural language processing (NLP) +e rapid developmentof deep learning technologies brings opportunities to im-prove the precision of traffic flow prediction more efficientlyRNN has the ability to learn historical information andcontextual dependency from traffic flow series FurthermoreRNN excels at fitting nonlinear time series by virtue of thenonlinear activation units Actually the concrete value oftraffic flow indices is nonrepetitive and usually has a certaindistribution +e way to predict concrete values does notmatch with the fact that the future evolution of traffic flow isan uncertain event Hence the methods introduced aboveregarding the anticipated traffic flow indices as deterministicresults could be confused by the continuous and non-repetitive values

Motivated by the current study status we dedicate thispaper to a short-term traffic flow prediction approachbased on the recurrent mixture density network +e re-current mixture density network is constructed by inte-grating long short-term memory (LSTM) network andmixture density network (MDN) [13] LSTM network is avariant of RNN with a special unit structure which iscompetent for learning historical information and con-textual dependency in time series and avoiding the issue ofvanishing gradient +e main purpose of introducing MDNis to parameterize mixture distributions by the outputs ofthe neural network In this way the prediction modelgenerates the probable distributions of anticipated trafficflow indices instead of concrete values +e ultimate pre-diction results can be obtained by sampling from thedistributions +e manner of fuzzy prediction accords withthe actual behavior of traffic flow evolution To the best ofour knowledge this is the first time that the recurrentmixture density network is applied to a real-world short-

term traffic flow prediction task Additionally the recurrentmixture density model is easy to be transplanted to dif-ferent application scenarios due to the prominent ad-vantage of generalization and learning ability

+e prediction model is implemented on a real-worldtraffic flow prediction task Different from most of the otherexisting work which predicts the value of only one certaintraffic flow index of next one time step at once we predictboth the traffic time indices (TTI) and the average speed(AS) of a certain road section at several following time stepssimultaneously Since it is reasonable to consider the co-herent relationship between TTI and AS predicting trafficflow at more than one time step also provides more valuableinformation for traffic situation awareness+e remainder ofthis paper is organized as follows the previous work relatedto short-term traffic flow prediction is reviewed in Section 2Section 3 elaborates the prediction model Section 4 presentsthe experimental results and the corresponding evaluationand finally the discussion and conclusions are presented inSection 5

2 Related Work

+e manner of short-term traffic flow prediction methodscan be concluded as follows build a mathematic modelbased on history series data and then use the model topredict the future value of traffic flow indices In general theexisting prediction methods can be categorized into twogroups parametric methods and nonparametric methodsaccording to the way of modeling

In the category of parametric methods the modelparameters are estimated by the empirical way based onhistorical data ARIMA model is a typical parametricmodel As early as 1979 ARIMA has been applied torepresent freeway traffic time-series data and make fore-casts one time-interval in advance [14] In recent yearsvarious ARIMA models and the ARIMA-based modelshave been proposed A vector autoregressive model wasfound appropriate and better than the traditional ARIMAmodel in practical experiments [15] which introduced thetraffic flow information upstream as well as downstream+e ARIMA model and its variants such as autoregressivemoving average (ARMA) and seasonal ARIMA (SARIMA)are also studied comparatively with other widely usedmachine learning methods to evaluate the efficiency intraffic flow prediction tasks [16 17]

However short-term traffic flow is difficult to predictdue to the nonlinear and stochastic feature +e perfor-mance of parametric methods is limited Plenty of existingwork attempts to develop nonparametric methods withflexible structure and parameters Nonparametricmethods mainly include machine learning and deeplearning techniques [18] RNN and its variants such asLSTM and gated recurrent units (GRU) network arewidely used in short-term traffic flow prediction tasksOne of the advantages of the LSTM network is that theformat of the input data is flexible Hence in order toimprove the prediction performance several traffic flowindices such as flow speed and occupancy can be used as

2 Mathematical Problems in Engineering

input simultaneously [19] LSTM network and GRUnetwork were also evaluated comparatively with ARIMA+e experiments demonstrated that the RNN basedmethods perform better than the ARIMA model [20]Experiments including several prevailing parametric andnonparametric algorithms also suggest the excellentprediction ability of the LSTM network [21] Experi-mental results of multiple prediction tasks with differenttime intervals verified the generalization ability of theLSTM network [22]

Additionally other deep learning techniques are alsowidely used for traffic flow prediction CNN is an excellenttechnique in image recognition fields which is utilized tolearn traffic flow as an image and make predictions [23] Adeep architecture combined by CNN and LSTM is developedand implemented on traffic flow forecast [24] Stackedautoencoders (SAE) also perform better than other machinelearning techniques [25]

In this work we construct the recurrent mixture densitynetwork and apply it to a real-world short-term traffic flowprediction task taking advantage of the LSTM network andmeanwhile integrating the parameterization function of theMDN

3 Methodology

+e goal of short-term traffic flow prediction can be de-scribed as follows given the recent part of traffic flow indicesseries xj xjminus1 xjminus2 xjminusk+11113966 1113967 of a certain road section atthe present time j and k minus 1 previous time stamps wherex [TTIAS] is a two-dimensional vector containing thetraffic time indices and the average speed informationpredict the traffic flow indices xj+1 xj+2 xj+h1113966 1113967 at next h

time steps+e recurrent mixture density network and data pro-

cessing process will be elaborated in this section as follows

31 Recurrent Mixture Density Network +e recurrentmixture density network is constructed by integrating theLSTM network and MDN which has the ability oflearning historical information and contextual depen-dency and generating predicted mixture distributionsUnlike the ordinary feedforward neural network that onlyhas connections between adjacent layers the LSTMnetwork allows recurrent connections between adjacentcells of the same layer Hence the LSTM network is deepin time-series dimension and has the ability to map thehistorical input series to the output via recurrent con-nections +e most distinct characteristic of the LSTMnetwork is the unique gate structure of the LSTM cellincluding input gate forget gate and output gate as shownin Figure 1

+e three gates work collaboratively to encode the inputsignal and generate outputs which enable the LSTMnetwork to maintain historical information contained inseries data +e LSTM network encodes the input signal asfollows

fj σ Wf xj hjminus11113960 1113961 + bf1113872 1113873

ij σ Wi xj hjminus11113960 1113961 + bi1113872 1113873

oj σ Wo xj hjminus11113960 1113961 + bo1113872 1113873

1113957Cj tanh Wc xj hjminus11113960 1113961 + bc1113872 1113873

Cj fj middot Cjminus1 + ij middot 1113957Cj

hj oj middot tanh Cj1113872 1113873

(1)

where fj ij and oj denote the forget gate input gate andoutput gate respectively σ and tanh denote the logisticsigmoid function and hyperbolic tangent function respec-tively both of which are used as the nonlinear activationfunction hjminus1 is the output of the j minus 1 th unit in the LSTMlayer xj denotes the input signal from the preceding layer+e weight matrices Wf Wi Wo and the corresponding biasb associated with these gates are variables to be optimizedduring the training process LSTM cells retain and updatethe cell state while receiving an input signal +en theupdated cell state is activated by the nonlinear activationfunction and fed to the next layer

Outputs of all the LSTM layers are concatenated as theintegrated output of hidden layers and sent to the MDNlayer as shown in Figure 2 +e MDN layer is a full-con-nected layer as follows

1113957yj 1113944N

n1Whny middot h

nj + by (2)

where N denotes the number of stacked LSTM layers and hnj

is the output of the nth bidirectional LSTM layer Whny

denotes the weight matrix connecting the LSTM layer andMDN layer by is the corresponding bias 1113957yj denotes theoutputs of the MDN layer+e outputs of the MDN layer areused to parameterize mixture distributions In order tocapture the probable range of the two-dimensional vectorx [TTIAS] simultaneously a bivariate Gaussian mixturemodel is adopted as the mixture distributions Individualcomponents of the mixture distributions are denoted by a

Cjndash1

hjndash1

fj

xj

Wf Wi Wc Wo hj

ij oj

hjCj

~

tanh

tanhσ σ σ

Cj

Figure 1 +e architecture of the long short-term memory (LSTM)unit

Mathematical Problems in Engineering 3

bivariate Gaussianmodel which are parameterized by sets ofmeans μ standard deviations σ and correlations ρ Addi-tionally the Gaussian mixture model also contains theweight parameter π

1113957yj 1113957πmj 1113957ρm

j 1113957μmj 1113957σm

j1113966 1113967M

m1(3)

where M denotes the number of individual componentscomposed of the mixture distributions

All the outputs generated by the network must benormalized to meaningful ranges as the final output beforebeing used as parameters

yj πmj ρm

j μmj σm

j1113966 1113967M

m1(4)

where

πmj Softmax 1113957πm

j1113872 1113873 exp 1113957πm

j1113872 1113873

1113936Mm1 exp 1113957πm

j1113872 1113873

ρmj tanh 1113957ρm

j1113872 1113873

μmj 1113957μm

j

σmj exp 1113957σm

j1113872 1113873

(5)

+e probability density function of the train label xj+l

(lge 1) against the output mixture distributions is defined asfollows

Pr xj+l | yj1113872 1113873 1113944M

m

πmj G xj+l| μ

mj σm

j ρmj1113872 1113873 (6)

where G is the bivariate Gaussian function

G(x|μ σ ρ) 1

2πσ1σ2

1 minus ρ21113969 exp

minusZ

2 1 minus ρ21113872 1113873⎡⎢⎣ ⎤⎥⎦

Z x1 minus μ1( 1113857

2

σ21+

x2 minus μ2( 11138572

σ22minus2ρ x1 minus μ1( 1113857 x2 minus μ2( 1113857

σ1σ2

(7)

+e negative log-likelihood function is employed as theloss function to be optimized in the training process

Loss minuslog 1113944M

m1πm

i G xi+l|μmi σm

i ρmi( 1113857⎛⎝ ⎞⎠ (8)

Adam optimizer is applied to the backpropagationthrough time process to fine-tune the network variants Wealso adopt the dropout technique and gradient clip strategyto avoid the overfitting and exploding gradient issues

32 Data Processing In order to predict the short-termtraffic flow indices x [TTIAS] at several following timesteps we prepare both the training input data and traininglabel as sequence All the training data are generated throughsliding window strategy [13] as shown in Figure 3 where wehypothesized the input sequence length k 3 and predictionlength h 2 +e train label is in the same sequence lengthwith the train input and lags h 2 time steps In the pre-diction stage the prediction model also outputs the mixturedistributions of k 3 vectors whereas we only pick out thelast h 2 mixture distributions for sampling +e ultimateprediction results are obtained by sampling from mixturedistributions via roulette strategy based on the weights ofeach individual Gaussian component

4 Experiments

In this section we aim to complete a practical short-termtraffic flow prediction task by utilizing the recurrent mixturedensity network+e optimal parameter configuration of theprediction model is determined after several experiments Inorder to assess the performance of the prediction model twowidely studied methods are introduced for comparison

41 Dataset and Experiment Setup Shenzhen is one of thefirst-tier cities in China where the economy developedrapidly in recent years As a megalopolis Shenzhen is facedwith typical city development challenges such as trans-portation management Shenzhen North Railway Station isone of the most important transportation hubs of the city+e road networks around Shenzhen North Railway Stationare also the main transportation corridors of the city whichdirectly influence the urban traffic efficiency +e traffic flowdataset is generated by sensors installed on road networkswhich record the speed and traffic flow information ofmobile vehicles It contains TTI and AS information oftwelve road sections around Shenzhen North Railway Sta-tion in two periods from 1 January 2019 to 31 March 2019and from 1 October 2019 to 21 December 2019 +e timegranularity is ten minutes that is records at each timestampdenote the average indices in ten minutes Figure 4 presentsan example of typical traffic flow patterns overtime in a dayof a certain road section

In experiments we select the records from 1 December2019 to 21 December as test data and use the rest for modeltraining We use one-hour-long traffic flow information topredict the indices in the upcoming half-hour that is theinput length is 6 and the prediction length is 3

+ree metrics are employed for performance evaluationmean absolute error (MAE) mean relative error (MRE) androot mean square error (RMSE)

MDNlayer

Negative loglikelihoodfunction

LSTMlayers

Inputlayer

Traininglabel

Parametersπ ρ micro σ

Figure 2 Schematic of recurrent mixture density network


MAE 1n

1113944

n

i1xiprime minus xi

11138681113868111386811138681113868111386811138681113868

MRE 1n

1113944

n

i1

xiprime minus xi

11138681113868111386811138681113868111386811138681113868

xi

RMSE

1n

1113944

n

i1xiprime minus xi

111386811138681113868111386811138681113868111386811138682

11139741113972

(9)

where xiprime and xi denote the prediction result and the ground

truth respectively n is the total number of test traffic flowsegments +e prediction errors are calculated on everypredicted time instance

42 Comparative Results Two widely used methods areinvolved in comparison experiments including the LSTMnetwork and ARIMA model +e optimal parameter con-figuration of the recurrent mixture density network is de-termined after several experiments where the numberLSTM layer is set to 4 the number of LSTM units in eachlayer is set to 256 and the number of mixture components inthe mixture density layer is set to 15 In comparison

experiments the structure of the LSTM network is the sameas the LSTM layers of the recurrent mixture density network+ree parameters of the ARIMA model (p d q) are de-termined as (1 1 1) after several experiments It must benoted that the recurrent mixture density network and LSTMnetwork predict TTI and AS simultaneously while theARIMA model processes the two indices separately

Figure 5 depicts the performance evaluation of themodelin the training process +e model training process iscompleted in the vicinity of the 40th epoch according to thedescending curve of the loss function

+e prediction performance comparison of the threemethods is given in Table 1 +e lowest errors of all metricsare highlighted in bold font which evidently demonstratethe prominent performance of the recurrent mixture densitynetwork In Table 1 the accuracy (1-MRE) of the recurrentmixture density network is over 092 at all the predictedtimestamps for both TTI and AS prediction tasks It isobvious that the prediction errors increase with the increaseof prediction time steps +e performance of the recurrentmixture density network is promising in 10min 20min and30min prediction tasks Compared to the results of theLSTM network the better performance of the recurrentmixture density network proves the superiority of the fuzzyprediction strategy +e attempt of combining MDN andLSTM brings an accurate solution for the short-term trafficflow prediction issue In 10min prediction the recurrentmixture density network improves the accuracy of the TTIprediction and AS prediction by 128 and 08 respec-tively against the ARIMA model which verifies theprominent advantage of the nonparametric method

+e MAE is visualized in Figure 6 where we depict thecumulative distribution function (CDF) of mean absoluteerrors of 10min 20min and 30min predicted results It isclearly observed that the CDF curve of the recurrent mixturedensity network is above the curves of the other twomethods Figure 6 presents strong evidence that the overallperformance of the recurrent mixture density network issuperior to the LSTM network and ARIMA model In theTTI prediction task there are over 90 of the test data withprediction error less than 02 over 80 of the test data withprediction error less than 01 For the prediction of AS theprediction error of more than 92 of the test data is under6 kmh andmore than 76 is under 3 kmh It is notable thatthe proportion of test data with low prediction error inLSTM experiments and ARIMA experiments are close tothat in the recurrent mixture density network experimentsWhile there are more test data with large errors in the LSTMexperiments and ARIMA experiments +e CDF of errorimplies that in most tests all the three methods yieldpromising results whereas in some unexpected scenariosuch as confronting traffic congestion the recurrent mixturedensity network is more reliable

+e prediction results of the recurrent mixture densitynetwork on two different road sections are demonstrated inFigure 7 which are the typical instances of heavy trafficcongestion and low traffic volume respectively +e ob-served TTI and AS are also depicted for comparison whichare clearly coherent in the periods of all the 21 days It is

Input

Label

hellipx1 x2 x3 x4 x5 x6 xn

Input

Label


Figure 3 Schematic of generating the training data when inputsequence length k 3 and prediction length h 2

1

2

3

4

5

0 2 4 6 8 10 12 14 16 18 20 22 240

25

50

75

TTI

AS

Time (h)

TTIAS

Figure 4 +e patterns of traffic time indices (TTI) and averagespeed (AS) of a certain road section on 27 March 2019


Table 1 Comparison of prediction performance

Time stepsMAE MRE RMSE

RMDN LSTM ARIMA RMDN LSTM ARIMA RMDN LSTM ARIMATTI1 00587 00705 00781 00376 00433 00504 01424 01630 018112 00923 01071 01115 00582 00650 00705 02180 02531 025873 01139 01261 01357 00718 00747 00857 02627 02998 03122AS4 17141 18088 19774 00415 00463 00495 24898 28266 317175 25568 27001 27395 00644 00714 00708 38673 45994 456116 31055 32512 33361 00790 00880 00869 47613 56278 56286

02 04 06 08 10 12 14 16 180001020304050607080910

Cum

ulat

ive d

istrib

utio

n fu

nctio

n

Mean absolute error (TTI)

RMDNLSTMARIMA

(a)

RMDNLSTMARIMA

3 6 9 12 15 18 21 24 27 300001020304050607080910

Cum

ulat

ive d

istrib

utio

n fu

nctio

n

Mean absolute error (AS (kmh))

(b)

Figure 6 Cumulative distribution function (CDF) of the mean absolute error (a) TTI (b) AS

0 5 10 15 20 25 30 35 40 4510

20

30

40

50

60

70

Loss

Epoch

Train lossValidation loss

Figure 5 Loss function evaluation in the training process


shown that the predicted results match well in both twoscenarios However the recurrent mixture density networkperforms better when the traffic condition is fluent Al-though the periodical pattern of the traffic flow indices isevident there are still some abnormal values especially inthe road section with heavy traffic congestion where the

traffic condition could be extremely terrible in rush hour Itis challenging to predict the traffic congestion with a largeTTI value and low AS value even so most of the mutationsare matched well as shown in Figure 7

Figure 8 presents the comparison of prediction results ofthree methods on a certain road section on 9 December

0 2 4 6 8 10 12 14 16 18 20123456789

10TT

I

Time (day)

TTIPre_TTI

ASPre_AS

0 2 4 6 8 10 12 14 16 18 200

1020304050607080

AS

(km

h)

Time (day)

(a)

TTIPre_TTI

0 2 4 6 8 10 12 14 16 18 20

1

2

3

TTI

Time (day)

0

ASPre_AS

0 2 4 6 8 10 12 14 16 18 20

10

20

30

40

50

60

AS

(km

h)

Time (day)

(b)

Figure 7 Short-term traffic flow prediction on different road sections (a) Road section with heavy traffic congestion (b) Road section withslight traffic congestion

0 2 4 6 8 10 12 14 16 18 20 22 240

2

4

6

8

10

TTI

Time (h)

Ground truthRMDN

LSTMARIMA

(a)

Ground truthRMDN

LSTMARIMA

0 2 4 6 8 10 12 14 16 18 20 22 240

20

40

60

80

AS

(km

h)

Time (h)

(b)

Figure 8 Comparison of prediction results of three methods on a certain road section on 9 December 2019


2019 where a typical traffic congestion phenomenon occursIn the period from 8 orsquoclock to 9 orsquoclock the traffic timeindex increased rapidly while the average speed becamelower In the rush hour it is shown that the recurrentmixture density network and ARIMA model successfullycapture this sudden change of the indices while the per-formance of the LSTM network is not as good as the othertwo methods Meanwhile the overall prediction results ofthe recurrent mixture density network and the LSTM net-work match the ground truth better than that of the ARIMAmodel

5 Discussion and Conclusion

In this paper we are dedicated to the issue of short-termtraffic flow prediction +e main contributions of this paperare as follows (1) we constructed the recurrent mixturedensity network by integrating the LSTM network andmixture density network To the best of our knowledge it isthe first time that the recurrent mixture density network isutilized in a short-term traffic flow prediction issue (2) Weattempted to implement these methods on a practical short-term traffic flow prediction task including two comparisonmethods

+e advantage of the recurrent mixture density networkconsists in two aspects (1) it has the ability to learn historicalinformation and contextual dependency in traffic flow dataseries which is useful for yielding promising prediction (2)+e strategy of fuzzy prediction generating probable dis-tributions of the following traffic flow indices instead ofconcrete values is closer to the actual behavior of traffic flowevolution Different from most of the existing work pro-cessing one certain traffic flow index at once we take thecoherent relationship between the traffic time indices andthe average speed into consideration +e recurrent mixturedensity network treats the two indices as a two-dimensionalvector and processes it simultaneously Meanwhile wepredict the traffic flow at several following time steps at oncewhich could provide more valuable information for trafficsituation awareness

+e recurrent mixture density model is easy to be appliedto different application scenarios since it has strong gen-eralization and learning ability In practical fields short-term traffic flow prediction is expected to play a crucial rolein the development of intelligent transportation systems andthe construction of a smart city

Data Availability

+e data can be accessed on the page httpswwwsodiccomcn

Conflicts of Interest

+e authors declare that they have no conflicts of interest

Acknowledgments

+is work was supported by National Natural ScienceFoundation of China (Grant no 42074013)

References

[1] Z Lv H FuW Tang andX Chen ldquoTraffic jam prediction basedon analysis of residents spatial activitiesrdquo in Proceedings of the2020 International Conference on Computer Information and BigData Applications (CIBDA) Guiyang China April 2020

[2] Yu Zheng L Capra O Wolfson and H Yang ldquoUrbancomputing concepts methodologies and applicationsrdquoACMTransactions on Intelligent Systems and Technology (TIST)-Special Section on Urban Computing vol 5 no 3 pp 1ndash552014

[3] Z Dongjie D Haiwen S Yundong and C Ning ldquoResearchon path planning model based on short-term traffic flowprediction in intelligent transportation systemrdquo Sensorsvol 18 no 12 p 4275 2018

[4] Y Zheng ldquoTrajectory data miningrdquo ACM Transactions onIntelligent Systems and Technology vol 6 no 3 pp 1ndash41 2015

[5] C Shifen ldquoA spatiotemporal multi-view-based learningmethod for short-term traffic forecastingrdquo InternationalJournal of Geo-Information vol 7 no 6 p 218 2018

[6] D I Tselentis E I Vlahogianni and M G Karlaftis ldquoIm-proving short-term traffic forecasts to combine models or notto combinerdquo IET Intelligent Transport Systems vol 9 no 2pp 193ndash201 2015

[7] B M Williams and L A Hoel ldquoModeling and forecastingvehicular traffic flow as a seasonal ARIMA process theoreticalbasis and empirical resultsrdquo Journal of Transportation Engi-neering vol 129 no 6 pp 664ndash672 2003

[8] I Okutani ldquo+e Kalman filtering approaches in sometransportation and traffic problemsrdquo Transportation amp Trafficeory 1987

[9] S Wang J Cao and P S Yu ldquoDeep learning for spatio-temporal data mining a surveyrdquo IEEE Transactions onKnowledge and Data Engineering 2019

[10] Y Endo H Toda K Nishida and J Ikedo ldquoClassifying spatialtrajectories using representation learningrdquo InternationalJournal of Data Science and Analytics vol 2 no 3-4pp 107ndash117 2016

[11] R Chen M Chen W Li J Wang and X Yao ldquoMobilitymodes awareness from trajectories based on clustering and aconvolutional neural networkrdquo ISPRS International Journal ofGeo-Information vol 8 no 5 p 208 2019

[12] M Chen J M Davis C Liu Z Sun M M Zempila andW Gao ldquoUsing deep recurrent neural network for directbeam solar irradiance cloud screeningrdquo in Proceedings of theRemote Sensing and Modeling of Ecosystems for Sustain abilityXIV International Society for Optics and Photonics SanDiego CA USA September 2017

[13] R Chen M Chen W Li and N Guo ldquoPredicting futurelocations of moving objects by recurrent mixture densitynetworkrdquo ISPRS International Journal of Geo-Informationvol 9 p 2 2020

[14] M S Ahmed and A R Cook ldquoAnalysis of freeway traffictime-series data by using BoxndashJenkins techniquesrdquo Trans-portation Research Record vol 722 pp 1ndash9 1979

[15] S R Chandra and H Al-Deek ldquoPredictions of freeway trafficspeeds and volumes using vector autoregressive modelsrdquoJournal of Intelligent Transportation Systems vol 13 no 2pp 53ndash72 2009

[16] C Chen Y Wang L Li J Hu and Z Zhang ldquo+e retrieval ofintra-day trend and its influence on traffic predictionrdquoTransportation Research Part C Emerging Technologiesvol 22 pp 103ndash118 2012


[17] B L Smith and M J Demetsky ldquoTraffic flow forecastingcomparison of modeling approachesrdquo Journal of Trans-portation Engineering vol 123 no 4 pp 261ndash266 1997

[18] N Polson and V Sokolov ldquoDeep learning predictors fortraffic flowsrdquo Transportation Research Part C EmergingTechnologies vol 79 pp 1ndash17 2016

[19] D Kang Y Lv and Y Y Chen ldquoShort-term traffic flowprediction with LSTM recurrent neural networkrdquo in Pro-ceedings of the 2017 IEEE 20th International Conference onIntelligent Transportation Systems (ITSC) Yokohama JapanOctober 2017

[20] R Fu Z Zhang and L Li ldquoUsing LSTM and GRU neuralnetwork methods for traffic flow predictionrdquo in Proceedings ofthe 2016 31st Youth Academic Annual Conference of ChineseAssociation of Automation (YAC) Wuhan China November2017

[21] X Ma Z Tao Y Wang H Yu and Y Wang ldquoLong short-term memory neural network for traffic speed predictionusing remote microwave sensor datardquo Transportation Re-search Part C Emerging Technologies vol 54 pp 187ndash1972015

[22] Y Tian and L Pan ldquoPredicting short-term traffic flow by longshort-termmemory recurrent neural networkrdquo in Proceedingsof the IEEE International Conference on Smart Citysocialcomsustaincom Chengdu China December 2015

[23] X Ma Z Dai Z He J Ma Y Wang and Y Wang ldquoLearningtraffic as images a deep convolutional neural network forlarge-scale transportation network speed predictionrdquo Sensorsvol 17 p 4 2017

[24] Y Wu and H Tan ldquoShort-term traffic flow forecasting withspatial-temporal correlation in a hybrid deep learningframeworkrdquo 2016 httpsarxivorgabs161201022

[25] Y Lv D Yanjie K Wenwen and L Zhengxi ldquoTraffic flowprediction with big data a deep learning approachrdquo IEEETransactions on Intelligent Transportation Systems vol 16pp 865ndash873 2015


estimated by the empirical way based on historical dataWidely used parametric methods include the time-seriesmodel Autoregressive Integrated Moving Average(ARIMA) model [7] and Kalman Filter [8] +ere are threecore parameters to determine the structure of ARIMA(p d q) where p denotes the autoregressive order d de-notes the integrated order and q is the moving averagepolynomial order respectively ARIMA model converts thetime-series data into stationary series without tendency andcaptures both the autocorrelation features and noise featuresof the stationary series Kalman Filter based methods rely ontransition matrix and dynamic equation However theshort-term traffic flow is highly nonlinear and stochastic+e evolution of traffic flow is actually an uncertain eventHence traditional parametric methods are limited in cap-turing the inherent features of traffic flow and fail to achieveexpected performance

In order to improve prediction precision plenty of lit-erature focuses on applying nonparametric models to trafficflow prediction Nonparametric methods are mainly basedon machine learning and deep learning technologies whichdo not involve predefined parameters and structure [9]Deep neural networks (DNN) [10] including convolutionalneural network (CNN) [11] and recurrent neural network(RNN) [12] have achieved remarkable performance in awide range of fields such as computer vision (CV) andnatural language processing (NLP) +e rapid developmentof deep learning technologies brings opportunities to im-prove the precision of traffic flow prediction more efficientlyRNN has the ability to learn historical information andcontextual dependency from traffic flow series FurthermoreRNN excels at fitting nonlinear time series by virtue of thenonlinear activation units Actually the concrete value oftraffic flow indices is nonrepetitive and usually has a certaindistribution +e way to predict concrete values does notmatch with the fact that the future evolution of traffic flow isan uncertain event Hence the methods introduced aboveregarding the anticipated traffic flow indices as deterministicresults could be confused by the continuous and non-repetitive values

Motivated by the current study status we dedicate thispaper to a short-term traffic flow prediction approachbased on the recurrent mixture density network +e re-current mixture density network is constructed by inte-grating long short-term memory (LSTM) network andmixture density network (MDN) [13] LSTM network is avariant of RNN with a special unit structure which iscompetent for learning historical information and con-textual dependency in time series and avoiding the issue ofvanishing gradient +e main purpose of introducing MDNis to parameterize mixture distributions by the outputs ofthe neural network In this way the prediction modelgenerates the probable distributions of anticipated trafficflow indices instead of concrete values +e ultimate pre-diction results can be obtained by sampling from thedistributions +e manner of fuzzy prediction accords withthe actual behavior of traffic flow evolution To the best ofour knowledge this is the first time that the recurrentmixture density network is applied to a real-world short-

term traffic flow prediction task Additionally the recurrentmixture density model is easy to be transplanted to dif-ferent application scenarios due to the prominent ad-vantage of generalization and learning ability

+e prediction model is implemented on a real-worldtraffic flow prediction task Different from most of the otherexisting work which predicts the value of only one certaintraffic flow index of next one time step at once we predictboth the traffic time indices (TTI) and the average speed(AS) of a certain road section at several following time stepssimultaneously Since it is reasonable to consider the co-herent relationship between TTI and AS predicting trafficflow at more than one time step also provides more valuableinformation for traffic situation awareness+e remainder ofthis paper is organized as follows the previous work relatedto short-term traffic flow prediction is reviewed in Section 2Section 3 elaborates the prediction model Section 4 presentsthe experimental results and the corresponding evaluationand finally the discussion and conclusions are presented inSection 5

2 Related Work

+e manner of short-term traffic flow prediction methodscan be concluded as follows build a mathematic modelbased on history series data and then use the model topredict the future value of traffic flow indices In general theexisting prediction methods can be categorized into twogroups parametric methods and nonparametric methodsaccording to the way of modeling

In the category of parametric methods the modelparameters are estimated by the empirical way based onhistorical data ARIMA model is a typical parametricmodel As early as 1979 ARIMA has been applied torepresent freeway traffic time-series data and make fore-casts one time-interval in advance [14] In recent yearsvarious ARIMA models and the ARIMA-based modelshave been proposed A vector autoregressive model wasfound appropriate and better than the traditional ARIMAmodel in practical experiments [15] which introduced thetraffic flow information upstream as well as downstream+e ARIMA model and its variants such as autoregressivemoving average (ARMA) and seasonal ARIMA (SARIMA)are also studied comparatively with other widely usedmachine learning methods to evaluate the efficiency intraffic flow prediction tasks [16 17]

However short-term traffic flow is difficult to predictdue to the nonlinear and stochastic feature +e perfor-mance of parametric methods is limited Plenty of existingwork attempts to develop nonparametric methods withflexible structure and parameters Nonparametricmethods mainly include machine learning and deeplearning techniques [18] RNN and its variants such asLSTM and gated recurrent units (GRU) network arewidely used in short-term traffic flow prediction tasksOne of the advantages of the LSTM network is that theformat of the input data is flexible Hence in order toimprove the prediction performance several traffic flowindices such as flow speed and occupancy can be used as





3 Methodology












(1)



1113957yj 1113944N

n1Whny middot h

nj + by (2)




Cjndash1

hjndash1

fj

xj

Wf Wi Wc Wo hj

ij oj

hjCj

~

tanh

tanhσ σ σ

Cj




1113957yj 1113957πmj 1113957ρm

j 1113957μmj 1113957σm

j1113966 1113967M

m1(3)



yj πmj ρm

j μmj σm

j1113966 1113967M

m1(4)

where


j1113872 1113873 exp 1113957πm

j1113872 1113873

1113936Mm1 exp 1113957πm

j1113872 1113873


j1113872 1113873

μmj 1113957μm

j

σmj exp 1113957σm

j1113872 1113873

(5)



Pr xj+l | yj1113872 1113873 1113944M

m

πmj G xj+l| μ

mj σm

j ρmj1113872 1113873 (6)


G(x|μ σ ρ) 1

2πσ1σ2


minusZ

2 1 minus ρ21113872 1113873⎡⎢⎣ ⎤⎥⎦

Z x1 minus μ1( 1113857

2

σ21+

x2 minus μ2( 11138572


σ1σ2

(7)



m1πm

i G xi+l|μmi σm

i ρmi( 1113857⎛⎝ ⎞⎠ (8)



4 Experiments





MDNlayer


LSTMlayers

Inputlayer

Traininglabel




MAE 1n

1113944

n

i1xiprime minus xi

11138681113868111386811138681113868111386811138681113868

MRE 1n

1113944

n

i1

xiprime minus xi

11138681113868111386811138681113868111386811138681113868

xi

RMSE

1n

1113944

n

i1xiprime minus xi

111386811138681113868111386811138681113868111386811138682

11139741113972

(9)









Input

Label


Input

Label



1

2

3

4

5

0 2 4 6 8 10 12 14 16 18 20 22 240

25

50

75

TTI

AS

Time (h)

TTIAS






02 04 06 08 10 12 14 16 180001020304050607080910

Cum

ulat

ive d

istrib

utio

n fu

nctio

n


RMDNLSTMARIMA

(a)

RMDNLSTMARIMA

3 6 9 12 15 18 21 24 27 300001020304050607080910

Cum

ulat

ive d

istrib

utio

n fu

nctio

n


(b)


0 5 10 15 20 25 30 35 40 4510

20

30

40

50

60

70

Loss

Epoch







0 2 4 6 8 10 12 14 16 18 20123456789

10TT

I

Time (day)

TTIPre_TTI

ASPre_AS

0 2 4 6 8 10 12 14 16 18 200

1020304050607080

AS

(km

h)

Time (day)

(a)

TTIPre_TTI

0 2 4 6 8 10 12 14 16 18 20

1

2

3

TTI

Time (day)

0

ASPre_AS

0 2 4 6 8 10 12 14 16 18 20

10

20

30

40

50

60

AS

(km

h)

Time (day)

(b)


0 2 4 6 8 10 12 14 16 18 20 22 240

2

4

6

8

10

TTI

Time (h)

Ground truthRMDN

LSTMARIMA

(a)

Ground truthRMDN

LSTMARIMA

0 2 4 6 8 10 12 14 16 18 20 22 240

20

40

60

80

AS

(km

h)

Time (h)

(b)








Data Availability




Acknowledgments


References































3 Methodology












(1)



1113957yj 1113944N

n1Whny middot h

nj + by (2)




Cjndash1

hjndash1

fj

xj

Wf Wi Wc Wo hj

ij oj

hjCj

~

tanh

tanhσ σ σ

Cj




1113957yj 1113957πmj 1113957ρm

j 1113957μmj 1113957σm

j1113966 1113967M

m1(3)



yj πmj ρm

j μmj σm

j1113966 1113967M

m1(4)

where


j1113872 1113873 exp 1113957πm

j1113872 1113873

1113936Mm1 exp 1113957πm

j1113872 1113873


j1113872 1113873

μmj 1113957μm

j

σmj exp 1113957σm

j1113872 1113873

(5)



Pr xj+l | yj1113872 1113873 1113944M

m

πmj G xj+l| μ

mj σm

j ρmj1113872 1113873 (6)


G(x|μ σ ρ) 1

2πσ1σ2


minusZ

2 1 minus ρ21113872 1113873⎡⎢⎣ ⎤⎥⎦

Z x1 minus μ1( 1113857

2

σ21+

x2 minus μ2( 11138572


σ1σ2

(7)



m1πm

i G xi+l|μmi σm

i ρmi( 1113857⎛⎝ ⎞⎠ (8)



4 Experiments





MDNlayer


LSTMlayers

Inputlayer

Traininglabel




MAE 1n

1113944

n

i1xiprime minus xi

11138681113868111386811138681113868111386811138681113868

MRE 1n

1113944

n

i1

xiprime minus xi

11138681113868111386811138681113868111386811138681113868

xi

RMSE

1n

1113944

n

i1xiprime minus xi

111386811138681113868111386811138681113868111386811138682

11139741113972

(9)









Input

Label


Input

Label



1

2

3

4

5

0 2 4 6 8 10 12 14 16 18 20 22 240

25

50

75

TTI

AS

Time (h)

TTIAS






02 04 06 08 10 12 14 16 180001020304050607080910

Cum

ulat

ive d

istrib

utio

n fu

nctio

n


RMDNLSTMARIMA

(a)

RMDNLSTMARIMA

3 6 9 12 15 18 21 24 27 300001020304050607080910

Cum

ulat

ive d

istrib

utio

n fu

nctio

n


(b)


0 5 10 15 20 25 30 35 40 4510

20

30

40

50

60

70

Loss

Epoch







0 2 4 6 8 10 12 14 16 18 20123456789

10TT

I

Time (day)

TTIPre_TTI

ASPre_AS

0 2 4 6 8 10 12 14 16 18 200

1020304050607080

AS

(km

h)

Time (day)

(a)

TTIPre_TTI

0 2 4 6 8 10 12 14 16 18 20

1

2

3

TTI

Time (day)

0

ASPre_AS

0 2 4 6 8 10 12 14 16 18 20

10

20

30

40

50

60

AS

(km

h)

Time (day)

(b)


0 2 4 6 8 10 12 14 16 18 20 22 240

2

4

6

8

10

TTI

Time (h)

Ground truthRMDN

LSTMARIMA

(a)

Ground truthRMDN

LSTMARIMA

0 2 4 6 8 10 12 14 16 18 20 22 240

20

40

60

80

AS

(km

h)

Time (h)

(b)








Data Availability




Acknowledgments


References





























1113957yj 1113957πmj 1113957ρm

j 1113957μmj 1113957σm

j1113966 1113967M

m1(3)



yj πmj ρm

j μmj σm

j1113966 1113967M

m1(4)

where


j1113872 1113873 exp 1113957πm

j1113872 1113873

1113936Mm1 exp 1113957πm

j1113872 1113873


j1113872 1113873

μmj 1113957μm

j

σmj exp 1113957σm

j1113872 1113873

(5)



Pr xj+l | yj1113872 1113873 1113944M

m

πmj G xj+l| μ

mj σm

j ρmj1113872 1113873 (6)


G(x|μ σ ρ) 1

2πσ1σ2


minusZ

2 1 minus ρ21113872 1113873⎡⎢⎣ ⎤⎥⎦

Z x1 minus μ1( 1113857

2

σ21+

x2 minus μ2( 11138572


σ1σ2

(7)



m1πm

i G xi+l|μmi σm

i ρmi( 1113857⎛⎝ ⎞⎠ (8)



4 Experiments





MDNlayer


LSTMlayers

Inputlayer

Traininglabel




MAE 1n

1113944

n

i1xiprime minus xi

11138681113868111386811138681113868111386811138681113868

MRE 1n

1113944

n

i1

xiprime minus xi

11138681113868111386811138681113868111386811138681113868

xi

RMSE

1n

1113944

n

i1xiprime minus xi

111386811138681113868111386811138681113868111386811138682

11139741113972

(9)









Input

Label


Input

Label



1

2

3

4

5

0 2 4 6 8 10 12 14 16 18 20 22 240

25

50

75

TTI

AS

Time (h)

TTIAS






02 04 06 08 10 12 14 16 180001020304050607080910

Cum

ulat

ive d

istrib

utio

n fu

nctio

n


RMDNLSTMARIMA

(a)

RMDNLSTMARIMA

3 6 9 12 15 18 21 24 27 300001020304050607080910

Cum

ulat

ive d

istrib

utio

n fu

nctio

n


(b)


0 5 10 15 20 25 30 35 40 4510

20

30

40

50

60

70

Loss

Epoch







0 2 4 6 8 10 12 14 16 18 20123456789

10TT

I

Time (day)

TTIPre_TTI

ASPre_AS

0 2 4 6 8 10 12 14 16 18 200

1020304050607080

AS

(km

h)

Time (day)

(a)

TTIPre_TTI

0 2 4 6 8 10 12 14 16 18 20

1

2

3

TTI

Time (day)

0

ASPre_AS

0 2 4 6 8 10 12 14 16 18 20

10

20

30

40

50

60

AS

(km

h)

Time (day)

(b)


0 2 4 6 8 10 12 14 16 18 20 22 240

2

4

6

8

10

TTI

Time (h)

Ground truthRMDN

LSTMARIMA

(a)

Ground truthRMDN

LSTMARIMA

0 2 4 6 8 10 12 14 16 18 20 22 240

20

40

60

80

AS

(km

h)

Time (h)

(b)








Data Availability




Acknowledgments


References




























MAE 1n

1113944

n

i1xiprime minus xi

11138681113868111386811138681113868111386811138681113868

MRE 1n

1113944

n

i1

xiprime minus xi

11138681113868111386811138681113868111386811138681113868

xi

RMSE

1n

1113944

n

i1xiprime minus xi

111386811138681113868111386811138681113868111386811138682

11139741113972

(9)









Input

Label


Input

Label



1

2

3

4

5

0 2 4 6 8 10 12 14 16 18 20 22 240

25

50

75

TTI

AS

Time (h)

TTIAS






02 04 06 08 10 12 14 16 180001020304050607080910

Cum

ulat

ive d

istrib

utio

n fu

nctio

n


RMDNLSTMARIMA

(a)

RMDNLSTMARIMA

3 6 9 12 15 18 21 24 27 300001020304050607080910

Cum

ulat

ive d

istrib

utio

n fu

nctio

n


(b)


0 5 10 15 20 25 30 35 40 4510

20

30

40

50

60

70

Loss

Epoch







0 2 4 6 8 10 12 14 16 18 20123456789

10TT

I

Time (day)

TTIPre_TTI

ASPre_AS

0 2 4 6 8 10 12 14 16 18 200

1020304050607080

AS

(km

h)

Time (day)

(a)

TTIPre_TTI

0 2 4 6 8 10 12 14 16 18 20

1

2

3

TTI

Time (day)

0

ASPre_AS

0 2 4 6 8 10 12 14 16 18 20

10

20

30

40

50

60

AS

(km

h)

Time (day)

(b)


0 2 4 6 8 10 12 14 16 18 20 22 240

2

4

6

8

10

TTI

Time (h)

Ground truthRMDN

LSTMARIMA

(a)

Ground truthRMDN

LSTMARIMA

0 2 4 6 8 10 12 14 16 18 20 22 240

20

40

60

80

AS

(km

h)

Time (h)

(b)








Data Availability




Acknowledgments


References































02 04 06 08 10 12 14 16 180001020304050607080910

Cum

ulat

ive d

istrib

utio

n fu

nctio

n


RMDNLSTMARIMA

(a)

RMDNLSTMARIMA

3 6 9 12 15 18 21 24 27 300001020304050607080910

Cum

ulat

ive d

istrib

utio

n fu

nctio

n


(b)


0 5 10 15 20 25 30 35 40 4510

20

30

40

50

60

70

Loss

Epoch







0 2 4 6 8 10 12 14 16 18 20123456789

10TT

I

Time (day)

TTIPre_TTI

ASPre_AS

0 2 4 6 8 10 12 14 16 18 200

1020304050607080

AS

(km

h)

Time (day)

(a)

TTIPre_TTI

0 2 4 6 8 10 12 14 16 18 20

1

2

3

TTI

Time (day)

0

ASPre_AS

0 2 4 6 8 10 12 14 16 18 20

10

20

30

40

50

60

AS

(km

h)

Time (day)

(b)


0 2 4 6 8 10 12 14 16 18 20 22 240

2

4

6

8

10

TTI

Time (h)

Ground truthRMDN

LSTMARIMA

(a)

Ground truthRMDN

LSTMARIMA

0 2 4 6 8 10 12 14 16 18 20 22 240

20

40

60

80

AS

(km

h)

Time (h)

(b)








Data Availability




Acknowledgments


References































0 2 4 6 8 10 12 14 16 18 20123456789

10TT

I

Time (day)

TTIPre_TTI

ASPre_AS

0 2 4 6 8 10 12 14 16 18 200

1020304050607080

AS

(km

h)

Time (day)

(a)

TTIPre_TTI

0 2 4 6 8 10 12 14 16 18 20

1

2

3

TTI

Time (day)

0

ASPre_AS

0 2 4 6 8 10 12 14 16 18 20

10

20

30

40

50

60

AS

(km

h)

Time (day)

(b)


0 2 4 6 8 10 12 14 16 18 20 22 240

2

4

6

8

10

TTI

Time (h)

Ground truthRMDN

LSTMARIMA

(a)

Ground truthRMDN

LSTMARIMA

0 2 4 6 8 10 12 14 16 18 20 22 240

20

40

60

80

AS

(km

h)

Time (h)

(b)








Data Availability




Acknowledgments


References

































Data Availability




Acknowledgments


References




























short-term traffic flow prediction with recurrent mixture

Documents