



Semi-Supervised Hierarchical Recurrent Graph Neural Network for City-Wide Parking Availability Prediction

Weijia Zhang1∗, Hao Liu2∗†, Yanchi Liu3, Jingbo Zhou2, Hui Xiong2†
1University of Science and Technology of China, Hefei, China, 2Business Intelligence Lab, Baidu Research, National Engineering Laboratory of Deep Learning Technology and Application, Beijing, China, 3Rutgers University, USA
[email protected], {liuhao30, zhoujingbo}@baidu.com, [email protected], [email protected]

Abstract

The ability to predict city-wide parking availability is crucial for the successful development of Parking Guidance and Information (PGI) systems. Indeed, the effective prediction of city-wide parking availability can improve parking efficiency, help urban planning, and ultimately alleviate city congestion. However, predicting city-wide parking availability is a non-trivial task because of three major challenges: 1) the non-Euclidean spatial autocorrelation among parking lots, 2) the dynamic temporal autocorrelation inside of and between parking lots, and 3) the scarcity of information about real-time parking availability obtained from real-time sensors (e.g., camera, ultrasonic sensor, and GPS). To this end, we propose the Semi-supervised Hierarchical Recurrent Graph Neural Network (SHARE) for predicting city-wide parking availability. Specifically, we first propose a hierarchical graph convolution structure to model non-Euclidean spatial autocorrelation among parking lots. Along this line, a contextual graph convolution block and a soft clustering graph convolution block are respectively proposed to capture local and global spatial dependencies between parking lots. Additionally, we adopt a recurrent neural network to incorporate dynamic temporal dependencies of parking lots. Moreover, we propose a parking availability approximation module to estimate missing real-time parking availabilities from both the spatial and temporal domains. Finally, experiments on two real-world datasets demonstrate that the prediction performance of SHARE outperforms seven state-of-the-art baselines.

Introduction

In recent years, we have witnessed significant development of Intelligent Transportation Systems (ITS) (Zhang et al. 2011). Parking guidance and information (PGI) systems, especially parking availability prediction, are an indispensable component of ITS. According to a survey by the International Parking Institute (IPI)1, over 30% of cars on the road are searching for parking, and these cruising cars contribute up to 40% of traffic jams in urban areas (Shoup 2006). Thus, city-wide parking availability prediction is of great importance to help drivers efficiently find parking, help governments with urban planning, and alleviate the city's traffic congestion.

∗Equal contribution.
†Corresponding author.
Copyright © 2020, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.

Due to its importance, city-wide parking availability prediction has attracted much attention from both academia and industry. On one hand, Google Maps predicts parking difficulty on a city-wide scale based on users' survey and trajectory data (Arora et al. 2019), and Baidu Maps estimates real-time city-wide parking availability based on environmental contextual features (e.g., Point of Interest (POI), map queries, etc.) (Rong et al. 2018). These approaches make city-wide parking availability predictions based on biased and indirect input signals (e.g., users' feedback is noisy and lagged), which may induce inaccurate prediction results. On the other hand, in recent years, we have witnessed real-time sensor devices such as cameras, ultrasonic sensors, and GPS become ubiquitous, which can significantly improve the prediction accuracy of parking availability (Mathur et al. 2010; Fusek et al. 2013; Zhou and Tung 2015). However, due to economic and privacy concerns, such sensor deployments are difficult to scale up to cover all parking lots of a city.

In this paper, we propose to simultaneously predict the availability of each parking lot of a city, based on both environmental contextual data (e.g., POI distribution, population) and partially observed real-time parking availability data. By integrating both datasets, we can make a better parking availability prediction at a city scale. However, it is a non-trivial task faced with the following three major challenges. (1) Spatial autocorrelation. The availability of a parking lot is not only affected by the occupancy of nearby parking lots but may also synchronize with distant parking lots (Wang et al. 2017; Liu et al. 2017). The first challenge is how to model the irregular and non-Euclidean autocorrelation between parking lots. (2) Temporal autocorrelation. Future availability of a parking lot is correlated with its availability in previous time periods (Rajabioun and Ioannou 2015). Besides, the spatial autocorrelation between parking lots may also vary over time (Liang et al. 2018; Yao et al. 2019). How to model the dynamic temporal autocorrelation of each parking lot is another challenge. (3) Parking availability scarcity. Only a small portion of parking lots are equipped with real-time sensors. According to one of the largest map service applications, there are over 70,000 parking lots in Beijing; however, only 6.12% of them have real-time parking availability data. The third challenge is how to utilize the scarce and incomplete real-time parking availability information.

1 https://www.parking.org/wp-content/uploads/2015/12/Emerging-Trends-2012.pdf

To tackle the above challenges, in this paper, we present the Semi-supervised Hierarchical Recurrent Graph Neural Network (SHARE) for city-wide parking availability prediction. Our major contributions are summarized as follows:

• We propose a semi-supervised spatio-temporal learning framework to incorporate both environmental contextual factors and sparse real-time parking availability data for city-wide parking availability prediction.

• We propose a hierarchical graph convolution module to capture non-Euclidean spatial correlations among parking lots. It consists of a contextual graph convolution block and a soft clustering graph convolution block for local and global spatial dependency modeling, respectively.

• We propose a parking availability approximation module to estimate missing real-time parking availabilities of parking lots without sensor monitoring. Specifically, we introduce a propagating convolution block and reuse the temporal module to approximate missing parking availabilities from both the spatial and temporal domains, then fuse them through an entropy-based mechanism.

• We evaluate SHARE on two real-world datasets collected from BEIJING and SHENZHEN, two metropolises in China. The results demonstrate that our model achieves the best prediction performance against seven baselines.

Preliminaries

Consider a set of parking lots $\mathcal{P} = \mathcal{P}_l \cup \mathcal{P}_u = \{p_1, p_2, \dots, p_N\}$, where $N$ is the total number of parking lots, and $\mathcal{P}_l$ and $\mathcal{P}_u$ denote the sets of parking lots with and without real-time sensors (e.g., camera, ultrasonic sensor, GPS, etc.), respectively. Let $\mathbf{X}^t = \{\mathbf{x}^t_1, \mathbf{x}^t_2, \dots, \mathbf{x}^t_N\} \in \mathbb{R}^{N \times M}$ denote the observed $M$-dimensional contextual feature vectors (e.g., POI distribution, population, etc.) for all parking lots in $\mathcal{P}$ at time $t$. We begin the formal definition of parking availability prediction with the definition of parking availability.

Definition 1 Parking availability (PA). Given a parking lot $p_i \in \mathcal{P}$, at time step $t$, the parking availability of $p_i$, denoted $y^t_i$, is defined as the number of vacant parking spots in $p_i$.

Specifically, we use $\mathbf{y}^t_{\mathcal{P}_l} = \{y^t_1, y^t_2, \dots, y^t_{|\mathcal{P}_l|}\}$ to denote the observed PAs of parking lots in $\mathcal{P}_l$ at time step $t$. In this paper, we are interested in predicting PAs for all parking lots $p_i \in \mathcal{P}$ by leveraging the contextual data of $\mathcal{P}$ and the partially observed real-time parking availability data of $\mathcal{P}_l$.

Problem 1 Parking availability prediction problem. Given a historical time window $T$, contextual features for all parking lots $\mathcal{X} = (\mathbf{X}^{t-T+1}, \mathbf{X}^{t-T+2}, \dots, \mathbf{X}^{t})$, and partially observed real-time PAs $\mathcal{Y}_{\mathcal{P}_l} = (\mathbf{y}^{t-T+1}_{\mathcal{P}_l}, \mathbf{y}^{t-T+2}_{\mathcal{P}_l}, \dots, \mathbf{y}^{t}_{\mathcal{P}_l})$, our problem is to predict PAs for all $p_i \in \mathcal{P}$ over the next $\tau$ time steps,

$$f(\mathcal{X}; \mathcal{Y}_{\mathcal{P}_l}) \rightarrow (\hat{\mathbf{y}}^{t+1}, \hat{\mathbf{y}}^{t+2}, \dots, \hat{\mathbf{y}}^{t+\tau}), \qquad (1)$$

where $\hat{\mathbf{y}}^{t+1} = \hat{\mathbf{y}}^{t+1}_{\mathcal{P}_l} \cup \hat{\mathbf{y}}^{t+1}_{\mathcal{P}_u}$, and $f(\cdot)$ is the mapping function we aim to learn.
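To make the shapes of the inputs and targets concrete, the following minimal NumPy sketch uses toy dimensions that are illustrative assumptions only (they are not values from the paper):

```python
import numpy as np

# Toy dimensions (assumed for illustration only).
N, M, T, tau = 6, 4, 12, 3           # parking lots, context features, history, horizon
num_labeled = 2                      # |P_l|: lots equipped with real-time sensors

# Contextual features for all lots over the history window: X with shape (T, N, M).
X = np.random.rand(T, N, M)

# Observed real-time PAs only for the labeled lots: Y_Pl with shape (T, |P_l|).
Y_Pl = np.random.randint(0, 50, size=(T, num_labeled)).astype(float)

# The model f(X; Y_Pl) must output PAs for *all* N lots over the next tau steps.
target_shape = (tau, N)
print(X.shape, Y_Pl.shape, target_shape)
```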

Figure 1: The framework overview of SHARE.

Framework overview

The architecture of SHARE is shown in Figure 1, where the inputs are contextual features as well as partially observed real-time PAs, and the outputs are the predicted PAs of all parking lots in the next $\tau$ time steps. There are three major components in SHARE. First, the hierarchical graph convolution module models spatial autocorrelations among parking lots, where the Contextual Graph Convolution (CxtConv) block captures local spatial dependencies between parking lots through rich contextual features (e.g., POI distribution, regional population, etc.), while the Soft Clustering Graph Convolution (SCConv) block captures global correlations among distant parking lots by softly assigning each parking lot to a set of latent cluster nodes. Second, the temporal autocorrelation modeling module employs the Gated Recurrent Unit (GRU) to model dynamic temporal dependencies of each parking lot. Third, the PA approximation module estimates the distributions of missing PAs for parking lots in $\mathcal{P}_u$, from both the spatial and temporal domains. In the spatial domain, the Propagating Graph Convolution (PropConv) block propagates observed real-time PAs to approximate missing PAs based on the contextual similarity of each parking lot. In the temporal domain, we reuse the GRU module to approximate current PA distributions based on its output in the previous time period. The two estimated PA distributions are then fused through an entropy-based mechanism and fed to the SCConv block and the GRU module for the final prediction.

Hierarchical spatial dependency modeling

We first introduce the hierarchical graph convolution module, including the contextual graph convolution block and the soft clustering graph convolution block.

Contextual graph convolution

In the spatial domain, the PAs of nearby parking lots are usually correlated and mutually influence each other. For example, when there is a big concert, the PAs of parking lots near the concert hall are usually low, and the parking demand usually gradually diffuses from nearby to distant lots.


Inspired by the recent success of graph convolution networks (Kipf and Welling 2017; Velickovic et al. 2018) on processing non-Euclidean graph structures, we first introduce the CxtConv block to capture local spatial dependencies solely based on contextual features.

We model the local correlations among parking lots as a graph $G = (V, E, \mathbf{A})$, where $V = \mathcal{P}$ is the set of parking lots, $E$ is a set of edges indicating connectivity among parking lots, and $\mathbf{A}$ denotes the proximity matrix of $G$ (Ma et al. 2019). Specifically, we define the connectivity constraint $e_{ij} \in E$ as

$$e_{ij} = \begin{cases} 1, & dist(v_i, v_j) \leq \epsilon \\ 0, & \text{otherwise} \end{cases}, \qquad (2)$$

where $dist(\cdot)$ is the road network distance between parking lots $p_i$ and $p_j$, and $\epsilon$ is a distance threshold.

Since the influence of different nearby parking lots may vary non-linearly, we employ an attention mechanism to compute the coefficient between parking lots, defined as

$$c_{ij} = \mathrm{Attn}(\mathbf{W}_a \mathbf{x}^c_i, \mathbf{W}_a \mathbf{x}^c_j), \qquad (3)$$

where $\mathbf{x}^c_i$ and $\mathbf{x}^c_j$ are the current contextual representations of parking lots $p_i$ and $p_j$, $\mathbf{W}_a$ is a learnable weighted matrix shared over all edges, and $\mathrm{Attn}(\cdot)$ is a shared attention mechanism (e.g., dot-product, concatenation, etc.) (Vaswani et al. 2017). The proximity score between $p_i$ and $p_j$ is further defined as

$$\alpha_{ij} = \frac{\exp(c_{ij})}{\sum_{k \in \mathcal{N}_i} \exp(c_{ik})}. \qquad (4)$$

In general, the above attention mechanism is capable of computing pair-wise proximity scores for all $p_i \in \mathcal{P}$. However, this formulation leads to quadratic complexity. To put more attention weight on neighboring parking lots and help faster convergence, we inject the adjacency constraint so that the attention operation only operates on adjacent nodes $j \in \mathcal{N}_i$, where $\mathcal{N}_i$ is the set of neighboring parking lots of $p_i$ in $G$. Note that since the influence of nearby parking lots may also vary at different time steps, we learn a different proximity score for each time step.

Once $\alpha_{ij}$ is obtained, the contextual graph convolution operation updates the representation of the current parking lot by aggregating and transforming its neighbors, defined as

$$\mathbf{x}^{c'}_i = \sigma\Big(\sum_{j \in \mathcal{N}_i} \alpha_{ij} \mathbf{W}_c \mathbf{x}^c_j\Big), \qquad (5)$$

where $\sigma$ is a non-linear activation function, and $\mathbf{W}_c \in \mathbb{R}^{d \times d}$ is a learnable weighted matrix shared over all parking lots. Note that we can stack $l$ identical contextual graph convolution layers to capture $l$-hop local dependencies, and $\mathbf{x}^c_j$ is the raw contextual feature in the first CxtConv layer.
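As a reading aid, here is a minimal NumPy sketch of one CxtConv layer under the dot-product attention variant; the adjacency matrix, dimensions, weight initialization, and the self-loops added for numerical stability are illustrative assumptions rather than details of the released implementation:

```python
import numpy as np

def cxt_conv(Xc, A, Wa, Wc):
    """One contextual graph convolution layer (Equations 3-5, sketch).

    Xc: (N, d) contextual representations, A: (N, N) 0/1 adjacency (Equation 2),
    Wa/Wc: (d, d) learnable matrices.  Dot-product attention restricted to neighbors.
    """
    H = Xc @ Wa                                   # shared linear map before attention
    scores = H @ H.T                              # c_ij = <Wa x_i, Wa x_j>
    scores = np.where(A > 0, scores, -np.inf)     # attend only to adjacent lots j in N_i
    scores -= scores.max(axis=1, keepdims=True)   # numerical stability
    alpha = np.exp(scores)
    alpha /= alpha.sum(axis=1, keepdims=True)     # Equation (4)
    out = alpha @ (Xc @ Wc)                       # sum_j alpha_ij * W_c x_j
    return np.tanh(out)                           # sigma: any non-linear activation

# Tiny example with assumed sizes; self-loops keep every row non-empty.
N, d = 5, 8
rng = np.random.default_rng(0)
A = (rng.random((N, N)) < 0.4).astype(float)
np.fill_diagonal(A, 1.0)
Xc = rng.standard_normal((N, d))
print(cxt_conv(Xc, A, rng.standard_normal((d, d)), rng.standard_normal((d, d))).shape)
```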

Soft clustering graph convolution

Besides local correlations, distant parking lots may also be correlated. For example, distant parking lots in similar functional areas may show similar PAs, e.g., business areas may have lower PAs during office hours, and residential areas may have higher PAs at the same time. However, CxtConv only captures local spatial correlation. (Li, Han, and Wu 2018) shows that when $l$ goes large, the representations of all parking lots tend to become similar and therefore lose discriminative power. To this end, we propose the SCConv block to capture global correlations between parking lots. Specifically, SCConv defines a set of latent nodes and learns the representation of each latent node based on the learned representations of each parking lot. Rather than clustering each parking lot into a specific cluster, we learn a soft assignment matrix so that each parking lot has a chance to belong to multiple clusters with different probabilities (but with total probability equal to one), as shown in Figure 2.

Figure 2: Hierarchical soft clustering.

The intuition behind SCConv is two-fold. First, distant parking lots may have similar contextual features and PAs, and therefore should have similar representations. The shared latent node representation can be viewed as a regularization for the prediction task. Second, one parking lot may be mapped to multiple latent nodes. If we view each latent node as a different functionality class, a parking lot may serve several functionalities. For example, a parking lot in a recreational center may be occupied by external visitors from a nearby office building.

The key component in SCConv is the soft assignment matrix. Given that there are $K$ latent nodes, let $\mathbf{S} \in \mathbb{R}^{N \times K}$ denote the soft assignment matrix, where $\mathbf{S}_{i,j} \in \mathbf{S}$ denotes the probability that the $i$-th parking lot $p_i$ maps to the $j$-th latent node. Specifically, we use $\mathbf{S}_{i,\cdot}$ to denote the $i$-th row and $\mathbf{S}_{\cdot,j}$ to denote the $j$-th column of $\mathbf{S}$. Given the learned representation of each parking lot $\mathbf{x}_i$, each row of $\mathbf{S}$ is computed as

$$\mathbf{S}_{i,\cdot} = \mathrm{Softmax}(\mathbf{W}_s \mathbf{x}_i), \qquad (6)$$

which guarantees that the probabilities that a given parking lot belongs to each latent node sum to one.

Once $\mathbf{S}$ is obtained, the representation of each latent node $\mathbf{x}^s_i \in \mathbf{X}^s$ can be derived by

$$\mathbf{x}^s_i = \sum_{j=1}^{N} \mathbf{S}^\top_{i,j} \mathbf{x}_j. \qquad (7)$$

Given the representation of each latent node, similar to CxtConv, we apply the soft clustering convolution operation to capture the dependencies between latent nodes,

$$\mathbf{x}^{s'}_i = \sigma\Big(\sum_{j \in \mathcal{N}_i} \alpha^s_{ij} \mathbf{W}_l \mathbf{x}^s_j\Big), \qquad (8)$$

where $\sigma$ is a non-linear activation function, and $\alpha^s_{ij}$ is the proximity score between two latent nodes. Rather than introducing extra attention parameters as in CxtConv, we derive the proximity score between latent nodes based on the adjacency constraint between parking lots,

$$\alpha^s_{ij} = \sum_{m=1}^{N} \sum_{n=1}^{N} \mathbf{S}^\top_{i,m} a_{mn} \mathbf{S}_{n,j}, \qquad (9)$$

where $a_{mn}$ equals one if parking lots $p_m$ and $p_n$ are connected. With the learned latent node representations, we generate the soft clustering representation for each parking lot as the reverse process of latent node representation generation,

$$\mathbf{x}^{sc}_i = \sum_{j=1}^{K} \mathbf{S}_{i,j} \mathbf{x}^{s'}_j. \qquad (10)$$
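The following NumPy sketch walks through Equations (6)-(10) end to end; the sizes, the tanh activation, the self-loops on the adjacency matrix, and the row-normalization of the latent proximity scores are our own illustrative assumptions:

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def sc_conv(X, A, Ws, Wl):
    """Soft clustering graph convolution (Equations 6-10, sketch).

    X: (N, d) parking-lot representations, A: (N, N) 0/1 adjacency,
    Ws: (d, K) assignment weights, Wl: (d, d) latent-node conv weights.
    """
    S = softmax(X @ Ws, axis=1)       # Equation (6): each row sums to one
    Xs = S.T @ X                      # Equation (7): latent node representations (K, d)
    alpha_s = S.T @ A @ S             # Equation (9): latent-node proximity scores
    alpha_s /= alpha_s.sum(axis=1, keepdims=True)   # row-normalize (our assumption)
    Xs2 = np.tanh(alpha_s @ (Xs @ Wl))              # Equation (8)
    return S @ Xs2                    # Equation (10): back-project to parking lots (N, d)

# Tiny example with assumed sizes (roughly K = 0.1N).
N, d, K = 10, 8, 2
rng = np.random.default_rng(1)
A = (rng.random((N, N)) < 0.3).astype(float)
np.fill_diagonal(A, 1.0)
X = rng.standard_normal((N, d))
print(sc_conv(X, A, rng.standard_normal((d, K)), rng.standard_normal((d, d))).shape)
```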

Temporal dependency modeling

We leverage the Gated Recurrent Unit (GRU) (Chung et al. 2014), a simple yet effective variant of the recurrent neural network (RNN), to model the temporal dependency. Consider the previous $T$-step inputs of parking lot $p_i$, $(\mathbf{x}^{t-T+1}_i, \mathbf{x}^{t-T+2}_i, \cdots, \mathbf{x}^t_i)$; we denote the status of $p_i$ at time steps $t-1$ and $t$ as $\mathbf{h}^{t-1}_i$ and $\mathbf{h}^t_i$, respectively. The temporal dependency between $\mathbf{h}^{t-1}_i$ and $\mathbf{h}^t_i$ can be modeled by

$$\mathbf{h}^t_i = (1 - \mathbf{z}^t_i) \circ \mathbf{h}^{t-1}_i + \mathbf{z}^t_i \circ \tilde{\mathbf{h}}^t_i, \qquad (11)$$

where $\mathbf{z}^t_i$ and $\tilde{\mathbf{h}}^t_i$ are defined as

$$\begin{aligned}
\mathbf{r}^t_i &= \sigma(\mathbf{W}_r[\mathbf{h}^{t-1}_i \oplus \mathbf{x}^t_i] + \mathbf{b}_r) \\
\mathbf{z}^t_i &= \sigma(\mathbf{W}_z[\mathbf{h}^{t-1}_i \oplus \mathbf{x}^t_i] + \mathbf{b}_z) \\
\tilde{\mathbf{h}}^t_i &= \tanh(\mathbf{W}_h[\mathbf{r}^t_i \circ \mathbf{h}^{t-1}_i \oplus \mathbf{x}^t_i] + \mathbf{b}_h)
\end{aligned}, \qquad (12)$$

where $\mathbf{W}_r$, $\mathbf{W}_z$, $\mathbf{W}_h$, $\mathbf{b}_r$, $\mathbf{b}_z$, $\mathbf{b}_h$ are learnable parameters, $\oplus$ is the concatenation operation, and $\circ$ denotes the Hadamard product. Then the hidden state $\mathbf{h}^t_i$ is directly used to predict the PAs of the next $\tau$ time steps,

$$(\hat{y}^{t+1}_i, \hat{y}^{t+2}_i, \dots, \hat{y}^{t+\tau}_i) = \sigma(\mathbf{W}_o \mathbf{h}^t_i), \qquad (13)$$

where $\mathbf{W}_o \in \mathbb{R}^{|\mathbf{h}^t_i| \times \tau}$.
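Below is a minimal NumPy sketch of one GRU step and the τ-step output layer (Equations 11-13); the hidden size, the random initialization, and the sigmoid output are illustrative assumptions:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gru_step(h_prev, x, Wr, Wz, Wh, br, bz, bh):
    """One GRU update (Equations 11-12, sketch). h_prev: (d_h,), x: (d_x,)."""
    hx = np.concatenate([h_prev, x])             # h^{t-1} concatenated with x^t
    r = sigmoid(Wr @ hx + br)                    # reset gate
    z = sigmoid(Wz @ hx + bz)                    # update gate
    h_cand = np.tanh(Wh @ np.concatenate([r * h_prev, x]) + bh)
    return (1.0 - z) * h_prev + z * h_cand       # Equation (11)

# Assumed sizes: hidden 16, input 8, horizon tau = 3, history T = 12.
d_h, d_x, tau = 16, 8, 3
rng = np.random.default_rng(2)
Wr, Wz, Wh = (rng.standard_normal((d_h, d_h + d_x)) * 0.1 for _ in range(3))
br = bz = bh = np.zeros(d_h)
h = np.zeros(d_h)
for t in range(12):                              # unroll over the T input steps
    h = gru_step(h, rng.standard_normal(d_x), Wr, Wz, Wh, br, bz, bh)
Wo = rng.standard_normal((d_h, tau)) * 0.1
y_hat = sigmoid(h @ Wo)                          # Equation (13): tau-step prediction
print(y_hat.shape)
```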

Parking availability approximation

The real-time PA is a strong signal for future PA prediction. However, only a small portion (e.g., 6.12% in Beijing) of real-time PAs can be obtained through real-time sensors, which prevents us from directly using the real-time PA as part of the input features. To leverage the information hidden in the partially observed real-time PAs, we approximate missing PAs from both the spatial and temporal domains. The proposed method consists of three blocks, i.e., the spatial PropConv block, the temporal GRU block, and the fusion block. Note that rather than approximating a scalar PA $y$, we learn the distribution of PA, $\mathbf{x}^p = P(y)$, for better information preservation. Given a PA $y$, we discretize its distribution into a $p$-dimensional one-hot vector $\mathbf{y} \in \mathbb{R}^p$. The objective of the PA approximation is to minimize the difference between $\mathbf{y}$ and $\mathbf{x}^p$.

Spatial based PA approximation

Similar to CxtConv, for each $p_i \in \mathcal{P}_u$, the PropConv operation is defined as

$$\mathbf{x}^{sp}_i = \sum_{j \in \mathcal{N}_i} \alpha_{ij} \mathbf{y}_j, \qquad (14)$$

where $\mathbf{x}^{sp}_i$ is the obtained PA distribution, and $\alpha_{ij}$ is the proximity score between $p_i$ and $p_j$. Different from CxtConv, the estimated PA is only aggregated from nearby parking lots with real-time PA, and we preserve the aggregated vector representation without an extra activation function. The proximity score is computed through the same attention mechanism as in Equation (4), but with a relaxed connectivity constraint

$$e_{ij} = \begin{cases} 1, & dist(v_i, v_j) \leq \max(\epsilon, dist_{knn}(v_i)),\ i \neq j \\ 0, & \text{otherwise,} \end{cases} \qquad (15)$$

where $dist_{knn}(v_i)$ denotes the road network distance between parking lot $p_i$ and its $k$-th nearest parking lot $p_j \in \mathcal{P}_l$. The relaxed adjacency constraint improves node connectivity for more sufficient propagation of the observed PAs, and therefore alleviates the data scarcity problem.
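To illustrate the relaxed k-nearest-neighbor connectivity and the propagation step, here is a small NumPy sketch; it substitutes uniform weights for the learned attention scores and invents a toy distance matrix and toy values of k and ε, so it is an assumption-laden illustration rather than the paper's PropConv:

```python
import numpy as np

def prop_conv(Y_obs, labeled, dist, eps=1.0, k=2):
    """Spatial PA approximation (Equations 14-15, sketch).

    Y_obs: (N, p) observed one-hot PA distributions (rows of unlabeled lots unused),
    labeled: (N,) boolean mask of lots with sensors, dist: (N, N) road-network distances.
    Uses uniform weights over connected labeled neighbors in place of learned attention.
    """
    N = dist.shape[0]
    X_sp = np.zeros_like(Y_obs)
    for i in range(N):
        # Relaxed connectivity: within eps, or among the k nearest *labeled* lots.
        d_to_labeled = np.where(labeled, dist[i], np.inf)
        knn_radius = np.sort(d_to_labeled)[min(k, labeled.sum()) - 1]
        nbrs = labeled & (dist[i] <= max(eps, knn_radius)) & (np.arange(N) != i)
        if nbrs.any():
            w = np.ones(nbrs.sum()) / nbrs.sum()      # alpha_ij (uniform stand-in)
            X_sp[i] = w @ Y_obs[nbrs]                 # Equation (14)
    return X_sp

# Tiny example with assumed sizes.
N, p = 6, 5
rng = np.random.default_rng(3)
dist = rng.random((N, N)) * 3; dist = (dist + dist.T) / 2; np.fill_diagonal(dist, 0)
labeled = np.array([True, True, False, False, True, False])
Y_obs = np.zeros((N, p)); Y_obs[labeled, rng.integers(0, p, labeled.sum())] = 1.0
print(prop_conv(Y_obs, labeled, dist).round(2))
```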

Temporal based PA approximation

We reuse the output of the GRU block to approximate the real-time PA from the temporal domain. The difference between the current PA approximation and the future PA prediction is that here we employ a Softmax output function instead. Recall that in the previous step we have obtained the hidden state $\mathbf{h}^{t-1}_i$ from the GRU; we directly approximate the distribution of PA at $t$ by

$$\mathbf{x}^{tp,t}_i = \mathrm{Softmax}(\mathbf{W}_{tp} \mathbf{h}^{t-1}_i). \qquad (16)$$

This step does not introduce extra computation for the GRU, and the Softmax layer normalizes $\mathbf{x}^{tp,t}_i$ so that it sums to one.

Approximated PA fusion

Rather than directly averaging $\mathbf{x}^{sp}_i$ and $\mathbf{x}^{tp}_i$, we propose an entropy-based mechanism to fuse the two PA distributions. Specifically, we put more weight on the approximation with less uncertainty (Hsieh, Lin, and Zheng 2015), i.e., the one with smaller entropy. Given an estimated PA distribution $\mathbf{x}_i$, its entropy is

$$H(\mathbf{x}_i) = -\sum_{j=1}^{p} \mathbf{x}_i(j) \log \mathbf{x}_i(j), \qquad (17)$$

where $\mathbf{x}_i(j)$ represents the $j$-th dimension of $\mathbf{x}_i$. We fuse the two PA distributions $\mathbf{x}^{sp}_i$ and $\mathbf{x}^{tp}_i$ as follows:

$$\mathbf{x}^p_i = \frac{\exp(-H(\mathbf{x}^{sp}_i))\,\mathbf{x}^{sp}_i + \exp(-H(\mathbf{x}^{tp}_i))\,\mathbf{x}^{tp}_i}{Z_i}, \qquad (18)$$

where $Z_i = \exp(-H(\mathbf{x}^{sp}_i)) + \exp(-H(\mathbf{x}^{tp}_i))$.

The approximated PA distribution $\mathbf{x}^p_i$ is applied to two tasks. First, it is concatenated with the learned representation of CxtConv and fed to the SCConv block for latent node representation learning. Second, it is combined with the outputs of CxtConv and SCConv, $\mathbf{x}^t_i = \mathbf{x}^{c,t}_i \oplus \mathbf{x}^{sc,t}_i \oplus \mathbf{x}^{p,t}_i$. We use $\mathbf{x}^t_i$ as the overall representation of each parking lot $p_i \in \mathcal{P}$ at time step $t$, and feed it into the GRU module to generate the final PA prediction results.
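The entropy-based fusion of Equations (17)-(18) can be summarized in a few lines of NumPy; the example distributions and the small epsilon inside the logarithm are illustrative assumptions:

```python
import numpy as np

def entropy(x, eps=1e-12):
    """H(x) = -sum_j x_j log x_j (Equation 17), with a small epsilon for stability."""
    return -np.sum(x * np.log(x + eps))

def fuse_pa(x_sp, x_tp):
    """Entropy-based fusion of spatial and temporal PA distributions (Equation 18)."""
    w_sp, w_tp = np.exp(-entropy(x_sp)), np.exp(-entropy(x_tp))
    return (w_sp * x_sp + w_tp * x_tp) / (w_sp + w_tp)

# A peaked (confident) spatial estimate dominates a flat (uncertain) temporal one.
x_sp = np.array([0.05, 0.85, 0.05, 0.05])   # low entropy
x_tp = np.array([0.25, 0.25, 0.25, 0.25])   # high entropy
print(fuse_pa(x_sp, x_tp).round(3))
```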


Model training

Since only the parking lots in $\mathcal{P}_l$ have observed labels, following the semi-supervised learning paradigm, SHARE aims to minimize the mean square error (MSE) between the predicted PA and the observed PA,

$$O_1 = \frac{1}{\tau |\mathcal{P}_l|} \sum_{i=1}^{|\mathcal{P}_l|} \sum_{j=1}^{\tau} (\hat{y}^{t+j}_i - y^{t+j}_i)^2. \qquad (19)$$

Additionally, in the PA approximation, we introduce extra cross entropy (CE) losses to minimize the error between the observed PA and the approximated PA distributions (i.e., the spatial and temporal based PA distribution approximations $\mathbf{x}^{sp,t}_i$ and $\mathbf{x}^{tp,t}_i$) at the current time step $t$,

$$O_2 = -\frac{1}{|\mathcal{P}_l|} \sum_{i=1}^{|\mathcal{P}_l|} \mathbf{y}^t_i \log \mathbf{x}^{sp,t}_i, \qquad (20)$$

$$O_3 = -\frac{1}{|\mathcal{P}_l|} \sum_{i=1}^{|\mathcal{P}_l|} \mathbf{y}^t_i \log \mathbf{x}^{tp,t}_i. \qquad (21)$$

By considering both the MSE loss and the CE losses, SHARE aims to jointly minimize the following objective

$$O = O_1 + \beta (O_2 + O_3), \qquad (22)$$

where $\beta$ is a hyper-parameter that controls the importance of the two CE losses.
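A compact NumPy sketch of the joint objective for one batch of labeled parking lots follows; the batch sizes, the uniform placeholder distributions, and the epsilon inside the logarithm are assumptions for illustration:

```python
import numpy as np

def share_loss(y_pred, y_true, y_onehot, x_sp, x_tp, beta=0.5, eps=1e-12):
    """Joint objective O = O1 + beta * (O2 + O3) (Equations 19-22, sketch).

    y_pred, y_true: (|P_l|, tau) predicted / observed PAs,
    y_onehot: (|P_l|, p) discretized observed PA at time t,
    x_sp, x_tp: (|P_l|, p) spatial / temporal PA approximations.
    """
    o1 = np.mean((y_pred - y_true) ** 2)                          # Equation (19)
    o2 = -np.mean(np.sum(y_onehot * np.log(x_sp + eps), axis=1))  # Equation (20)
    o3 = -np.mean(np.sum(y_onehot * np.log(x_tp + eps), axis=1))  # Equation (21)
    return o1 + beta * (o2 + o3)                                  # Equation (22)

# Tiny example with assumed sizes.
rng = np.random.default_rng(4)
n_l, tau, p = 4, 3, 5
y_true = rng.random((n_l, tau))
y_pred = y_true + 0.1 * rng.standard_normal((n_l, tau))
y_onehot = np.eye(p)[rng.integers(0, p, n_l)]
x_sp = np.full((n_l, p), 1.0 / p)
x_tp = np.full((n_l, p), 1.0 / p)
print(round(share_loss(y_pred, y_true, y_onehot, x_sp, x_tp), 4))
```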

Experiments

Experimental setup

Data description. We use two real-world datasets collected from BEIJING and SHENZHEN, two metropolises in China. Both datasets range from April 20, 2019, to May 20, 2019. All PA records are crawled every 15 minutes from a publicly accessible app, in which all parking occupancy information is collected by real-time sensors. We associate the POI distribution (Liu et al. 2019b; Zhu et al. 2016) with each parking lot and aggregate the check-in records near each parking lot every 15 minutes as the population data. POI and check-in data are collected through the Baidu Maps Place API and location SDK (Liu et al. 2019a). We chronologically order the above data, take the first 60% as the training set, the following 20% for validation, and the rest as the test set. In each dataset, 70% of the parking lots are masked as unlabeled. The spatial distribution of parking lots in BEIJING is shown in Figure 3. The statistics of the datasets are summarized in Table 1.

Implementation details. Our model and all seven baselines are implemented with PaddlePaddle. Following previous work (Li et al. 2018; Yu, Yin, and Zhu 2018), the PA is normalized before input and scaled back to the absolute PA in the output. We choose T = 12 and select τ = 3 for prediction. We set ε = 1 km and k = 10 to connect parking lots. The dimensions of x^c and x^sc are fixed to 32, and p is fixed to 50. The numbers of layers of CxtConv, SCConv, and PropConv are 2, 1, and 1, respectively. We use dot-product attention in this paper. In SCConv, the number of latent nodes is set to K = 0.1N, where N is the total number of parking lots. The activation function in CxtConv and SCConv is LeakyReLU (α = 0.2), and Sigmoid is used in the other layers. We employ the Adam optimizer for training, fix the learning rate to 0.001, and set β to 0.5. For a fair comparison, all parameters of each baseline are carefully tuned based on the recommended settings.

Figure 3: Spatial distribution of parking lots in BEIJING.

Table 1: Statistics of datasets.
Description                 | BEIJING       | SHENZHEN
# of parking lots           | 1,965         | 1,360
# of PA records             | 5,847,840     | 4,047,360
Averaged # of parking spots | 210.24        | 185.36
# of check-ins              | 9,436,362,579 | 3,680,063,509
# of POIs                   | 669,058       | 250,275
# of POI categories         | 197           | 188

Evaluation metrics. We adopt Mean Absolute Error (MAE) and Root Mean Square Error (RMSE), two widely used metrics (Liang et al. 2018), for evaluation.
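For reference, a minimal NumPy definition of the two metrics (the toy arrays are illustrative only):

```python
import numpy as np

def mae(y_pred, y_true):
    """Mean Absolute Error."""
    return np.mean(np.abs(y_pred - y_true))

def rmse(y_pred, y_true):
    """Root Mean Square Error."""
    return np.sqrt(np.mean((y_pred - y_true) ** 2))

y_true = np.array([10.0, 25.0, 3.0, 40.0])
y_pred = np.array([12.0, 20.0, 5.0, 38.0])
print(mae(y_pred, y_true), rmse(y_pred, y_true))
```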

Baselines. We compare our full approach with the following seven baselines and two variants of SHARE:

• LR uses logistic regression for parking availability prediction. We concatenate the previous T steps of historical features as the input and predict each parking lot separately.

• GBRT is a variant of boosting trees for regression tasks. It is widely used in practice and performs well in many data mining challenges. We use the version in XGBoost (Chen and Guestrin 2016), and the input is the same as for LR.

• GRU (Chung et al. 2014) predicts the PA of each parking lot without considering spatial dependency. We train two GRUs for $\mathcal{P}_l$ and $\mathcal{P}_u$ separately.

• Google-Parking (Arora et al. 2019) is the parking difficulty prediction model deployed on Google Maps. It uses a feed-forward deep neural network for prediction.

• Du-Parking (Rong et al. 2018) is the parking availability estimation model used on Baidu Maps. It fuses several LSTMs to capture various temporal dependencies.

• STGCN (Yu, Yin, and Zhu 2018) is a state-of-the-art graph neural network model for traffic forecasting. It models both spatial and temporal dependencies with convolution structures. The input graph is constructed as described in the original paper but keeps the same graph connectivity as our CxtConv.

• DCRNN (Li et al. 2018) is another graph convolution network based model, which models spatial and temporal dependencies by integrating graph convolution and GRU. The input graph is the same as for STGCN.

• CxtGNN is a basic version of SHARE, without the PA approximation and the soft clustering graph convolution.

• CAGNN is another variant of SHARE, but without the soft clustering graph convolution block.

Table 2: Parking availability prediction error given by MAE and RMSE on BEIJING and SHENZHEN.
Algorithm      | BEIJING MAE (15/30/45 min) | BEIJING RMSE (15/30/45 min) | SHENZHEN MAE (15/30/45 min) | SHENZHEN RMSE (15/30/45 min)
LR             | 29.90 / 30.27 / 30.58 | 69.74 / 70.95 / 72.00 | 24.59 / 24.80 / 25.09 | 51.31 / 52.36 / 52.80
GBRT           | 17.29 / 17.81 / 18.40 | 44.60 / 48.50 / 51.59 | 13.90 / 14.67 / 14.71 | 35.05 / 37.98 / 38.09
GRU            | 18.51 / 18.78 / 19.73 | 55.43 / 55.92 / 58.64 | 16.73 / 16.88 / 17.14 | 46.92 / 47.26 / 47.56
Google-Parking | 21.49 / 21.68 / 22.85 | 57.26 / 59.25 / 60.48 | 17.10 / 18.33 / 18.69 | 47.30 / 48.45 / 49.34
Du-Parking     | 17.67 / 17.70 / 18.03 | 50.17 / 50.63 / 51.75 | 13.91 / 14.17 / 14.39 | 42.66 / 43.24 / 43.56
STGCN          | 16.57 / 16.44 / 17.10 | 50.79 / 51.04 / 52.61 | 13.46 / 13.59 / 13.88 | 39.26 / 39.96 / 40.29
DCRNN          | 15.66 / 15.97 / 16.30 | 46.28 / 47.80 / 48.87 | 13.11 / 13.19 / 13.89 | 42.74 / 43.37 / 44.27
CxtGNN (ours)  | 15.29 / 15.69 / 16.15 | 45.55 / 46.69 / 47.78 | 12.39 / 12.73 / 13.09 | 36.31 / 36.92 / 37.46
CAGNN (ours)   | 12.45 / 12.77 / 13.20 | 39.99 / 40.81 / 41.31 | 10.50 / 10.62 / 10.98 | 31.86 / 32.12 / 32.83
SHARE (ours)   | 10.68 / 10.97 / 11.43 | 32.00 / 32.78 / 33.78 | 9.23 / 9.41 / 9.66 | 30.44 / 30.90 / 31.70

Figure 4: Parameter sensitivity on BEIJING. (a) Ratio of labeled parking lots; (b) Ratio of latent nodes; (c) Effect of T; (d) Effect of τ.

Overall performance

Table 2 reports the overall results of our methods and all the compared baselines on the two datasets with respect to MAE and RMSE. As can be seen, our model together with its variants outperforms all other baselines on both metrics, which demonstrates the advantage of SHARE. Specifically, SHARE achieves (31.8%, 31.3%, 29.9%) and (30.9%, 31.5%, 30.9%) improvements over the state-of-the-art approach (DCRNN) in MAE and RMSE on BEIJING for (15 min, 30 min, 45 min) prediction, respectively. Similarly, the improvements in MAE and RMSE on SHENZHEN are (29.6%, 28.7%, 30.5%) and (28.8%, 28.8%, 28.4%). Moreover, we observe significant improvements by comparing SHARE with its variants (i.e., CxtGNN and CAGNN). For example, by adding the PA approximation module, CAGNN achieves (18.6%, 18.6%, 18.3%) lower MAE and (12.2%, 12.6%, 13.5%) lower RMSE than CxtGNN on BEIJING, respectively. By further adding the SCConv block, SHARE achieves (14.2%, 14.1%, 13.4%) lower MAE and (20%, 19.7%, 18.2%) lower RMSE than CAGNN on BEIJING. The improvements on SHENZHEN are consistent. All the above results demonstrate the effectiveness of the PA approximation and the hierarchical graph convolution architecture.

Looking further into the results, we observe that all graph convolution based models (i.e., STGCN, DCRNN, and SHARE) outperform the other deep learning based approaches (i.e., Google-Parking and Du-Parking), which consistently reveals the advantage of incorporating spatial dependency for parking availability prediction. Remarkably, GBRT outperforms Google-Parking, GRU, and LR, and achieves a result similar to Du-Parking, which validates our expectation that GBRT is a simple but effective approach for regression tasks. One extra interesting finding is that both MAE and RMSE of all methods on SHENZHEN are relatively smaller than on BEIJING. This is possibly because the spatial distribution of parking lots is denser and more evenly distributed in SHENZHEN; therefore they are easier to predict.

Parameter sensitivity

Due to space limitations, here we report the impact of the ratio of labeled parking lots (i.e., |Pl|/N), the proportion of latent nodes in the soft clustering graph convolution with respect to the total number of parking lots (i.e., K/N), the input time step T, and the prediction time step τ, using MAE on BEIJING. Each time we vary one parameter and set the others to their default values. The results on BEIJING using RMSE and on SHENZHEN using both metrics are similar.

First, we vary the ratio of labeled parking lots from 0.1 to 0.9. The results are reported in Figure 4(a). The results are unsurprising: equipping more parking lots with real-time sensors enables us to predict PA more accurately. However, equipping more sensors leads to extra economic cost and may be constrained by the policies of each parking lot. Finding the most cost-effective ratio and exploring the optimal sensor distribution are important problems for future study.

Then, we vary the ratio of latent nodes from 0.01 to 0.8. For example, there are 1,965 parking lots in BEIJING, so 0.01 corresponds to about 20 latent nodes. The results are reported in Figure 4(b). As can be seen, there is a performance improvement when increasing the ratio of latent nodes from 0.01 to 0.1, but a performance degradation when further increasing the ratio from 0.1 to 0.8. The reason is that heavily reducing the number of latent nodes reduces the discriminative power of the learned latent representation, whereas too many latent nodes reduce its regularization power.

To test the impact of the input length, we vary T from 3 to 18. The results are reported in Figure 4(c). SHARE achieves the lowest errors when T = 12. One possible reason is that an excessively short input cannot provide sufficient temporally correlated information, whereas an excessively long input introduces more noise for temporal dependency modeling.

Finally, to test the impact of the prediction step, we vary τ from 1 to 6. The results are reported in Figure 4(d). We report the results of labeled and unlabeled parking lots separately. Overall, labeled parking lots are much easier to predict. Besides, as τ increases, the errors of all parking lots increase consistently. However, we can observe that the error of labeled parking lots increases faster; this makes sense because the temporal dependency between the observed PA and the future PA becomes weaker as τ grows large.

Effectiveness on different regions

To evaluate the performance of SHARE on different regions, we partition BEIJING into a set of disjoint grids based on longitude and latitude, and test the performance of SHARE on each region. Figure 5(a) and Figure 5(b) plot the averaged MAE of SHARE and the averaged number of parking spots in each region of BEIJING, respectively.

Figure 5: Robustness study on BEIJING. (a) MAE; (b) # of parking spots.

Overall, the MAE in each region is even except for several outliers. We find that the performance of SHARE is highly correlated with the averaged number of parking spots in each region. For example, the MAEs in regions (116.46, 39.91) and (116.46, 39.95) are 31.6 and 31.5, which are greater than the overall MAE of 10.68. Meanwhile, the averaged numbers of parking spots in these two regions are 601 and 367, significantly greater than the overall average of 210.24. This is possibly because, for the same ratio of parking availability fluctuation, a parking lot with a larger number of parking spots will have a larger MAE. This result indicates that, in the future, further optimization can be applied to these large parking lots to improve the overall performance.

Related Work

Parking availability prediction. Previous studies on parking availability prediction mainly fall into two categories: contextual data based prediction and real-time sensor based prediction. For contextual data based prediction, Google-Parking (Arora et al. 2019) and Du-Parking (Rong et al. 2018) predict parking availability based on indirect signals (e.g., user feedback and contextual factors), which may induce inaccurate prediction results. For real-time sensor based prediction, the study in (Rajabioun and Ioannou 2015) proposes an auto-regressive model and the study in (Fusek et al. 2013) proposes a boosting method for parking availability inference. The above approaches are limited by economic and privacy concerns and are hard to scale to all parking lots in a city. Moreover, all the above approaches do not fully exploit non-Euclidean spatial autocorrelations between parking lots, which limits their prediction performance.

Graph neural network. The graph neural network (GNN) extends the well-known convolutional neural network to non-Euclidean graph structures, where the representation of each node is derived by first aggregating and then transforming the representations of its neighbors (Velickovic et al. 2018). It is worth pointing out that the idea of our soft clustering graph convolution is partially inspired by (Ying et al. 2018), but our objective is to capture global spatial correlation for node-level prediction. Due to its effectiveness, GNN has been successfully applied to several spatiotemporal forecasting tasks, such as traffic flow forecasting (Li et al. 2018; Guo et al. 2019) and taxi demand forecasting (Geng et al. 2019; Wang et al. 2019). However, we argue that these approaches either overlook contextual factors or global spatial dependency, and are not tailored for parking availability prediction.

Conclusion

In this paper, we present SHARE, a city-wide parking availability prediction framework based on both environmental contextual data and partially observed real-time parking availability data. We first propose a hierarchical graph convolution module to capture both local and global spatial correlations. Then, we adopt a simple yet effective GRU module to capture dynamic temporal autocorrelations of each parking lot. Besides, a parking availability approximation module is proposed for parking lots without real-time parking availability information. Extensive experimental results on two real-world datasets show that the performance of SHARE for parking availability prediction significantly outperforms seven state-of-the-art baselines.


Acknowledgement

This research is supported in part by grants from the National Natural Science Foundation of China (Grant No. 71531001).

References

Arora, N.; Cook, J.; Kumar, R.; Kuznetsov, I.; Li, Y.; Liang, H.-J.; Miller, A.; Tomkins, A.; Tsogsuren, I.; and Wang, Y. 2019. Hard to park?: Estimating parking difficulty at scale. In Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2296–2304.
Chen, T., and Guestrin, C. 2016. Xgboost: A scalable tree boosting system. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 785–794.
Chung, J.; Gulcehre, C.; Cho, K.; and Bengio, Y. 2014. Empirical evaluation of gated recurrent neural networks on sequence modeling. arXiv preprint arXiv:1412.3555.
Fusek, R.; Mozdren, K.; Surkala, M.; and Sojka, E. 2013. Adaboost for parking lot occupation detection. In Proceedings of the 8th International Conference on Computer Recognition Systems CORES 2013, 681–690.
Geng, X.; Li, Y.; Wang, L.; Zhang, L.; Yang, Q.; Ye, J.; and Liu, Y. 2019. Spatiotemporal multi-graph convolution network for ride-hailing demand forecasting. In Proceedings of the Thirty-Third AAAI Conference on Artificial Intelligence, 3656–3663.
Guo, S.; Lin, Y.; Feng, N.; Song, C.; and Wan, H. 2019. Attention based spatial-temporal graph convolutional networks for traffic flow forecasting. In Proceedings of the Thirty-Third AAAI Conference on Artificial Intelligence, 922–929.
Hsieh, H.-P.; Lin, S.-D.; and Zheng, Y. 2015. Inferring air quality for station location recommendation based on urban big data. In Proceedings of the 21st ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 437–446.
Kipf, T. N., and Welling, M. 2017. Semi-supervised classification with graph convolutional networks. In 5th International Conference on Learning Representations, ICLR 2017.
Li, Y.; Yu, R.; Shahabi, C.; and Liu, Y. 2018. Diffusion convolutional recurrent neural network: Data-driven traffic forecasting. In 6th International Conference on Learning Representations, ICLR 2018.
Li, Q.; Han, Z.; and Wu, X.-M. 2018. Deeper insights into graph convolutional networks for semi-supervised learning. In Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence, 3538–3545.
Liang, Y.; Ke, S.; Zhang, J.; Yi, X.; and Zheng, Y. 2018. GeoMAN: Multi-level attention networks for geo-sensory time series prediction. In Proceedings of the 27th International Joint Conference on Artificial Intelligence, IJCAI 2018, 3428–3434.
Liu, Y.; Liu, C.; Lu, X.; Teng, M.; Zhu, H.; and Xiong, H. 2017. Point-of-interest demand modeling with human mobility patterns. In Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 947–955.
Liu, H.; Li, T.; Hu, R.; Fu, Y.; Gu, J.; and Xiong, H. 2019a. Joint representation learning for multi-modal transportation recommendation. In Proceedings of the Thirty-Third AAAI Conference on Artificial Intelligence, 1036–1043.
Liu, H.; Tong, Y.; Zhang, P.; Lu, X.; Duan, J.; and Xiong, H. 2019b. Hydra: A personalized and context-aware multi-modal transportation recommendation system. In Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2314–2324.
Ma, S.; Hu, R.; Wang, L.; Lin, X.; and Huai, J.-P. 2019. An efficient approach to finding dense temporal subgraphs. IEEE Transactions on Knowledge and Data Engineering.
Mathur, S.; Jin, T.; Kasturirangan, N.; Chandrasekaran, J.; Xue, W.; Gruteser, M.; and Trappe, W. 2010. ParkNet: Drive-by sensing of road-side parking statistics. In Proceedings of the 8th International Conference on Mobile Systems, Applications, and Services, 123–136.
Rajabioun, T., and Ioannou, P. A. 2015. On-street and off-street parking availability prediction using multivariate spatiotemporal models. IEEE Transactions on Intelligent Transportation Systems 16(5):2913–2924.
Rong, Y.; Xu, Z.; Yan, R.; and Ma, X. 2018. Du-Parking: Spatio-temporal big data tells you realtime parking availability. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 646–654.
Shoup, D. C. 2006. Cruising for parking. Transport Policy 13(6):479–486.
Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A. N.; Kaiser, L.; and Polosukhin, I. 2017. Attention is all you need. In Advances in Neural Information Processing Systems, 6000–6010.
Velickovic, P.; Cucurull, G.; Casanova, A.; Romero, A.; Lio, P.; and Bengio, Y. 2018. Graph attention networks. In 6th International Conference on Learning Representations, ICLR 2018.
Wang, P.; Fu, Y.; Liu, G.; Hu, W.; and Aggarwal, C. 2017. Human mobility synchronization and trip purpose detection with mixture of Hawkes processes. In Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 495–503.
Wang, Y.; Yin, H.; Chen, H.; Wo, T.; Xu, J.; and Zheng, K. 2019. Origin-destination matrix prediction via graph convolution: A new perspective of passenger demand modeling. In Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 1227–1235.
Yao, H.; Tang, X.; Wei, H.; Zheng, G.; and Li, Z. 2019. Revisiting spatial-temporal similarity: A deep learning framework for traffic prediction. In Proceedings of the Thirty-Third AAAI Conference on Artificial Intelligence.
Ying, Z.; You, J.; Morris, C.; Ren, X.; Hamilton, W.; and Leskovec, J. 2018. Hierarchical graph representation learning with differentiable pooling. In Advances in Neural Information Processing Systems, 4800–4810.
Yu, B.; Yin, H.; and Zhu, Z. 2018. Spatio-temporal graph convolutional networks: A deep learning framework for traffic forecasting. In Proceedings of the 27th International Joint Conference on Artificial Intelligence, IJCAI 2018.
Zhang, J.; Wang, F.-Y.; Wang, K.; Lin, W.-H.; Xu, X.; and Chen, C. 2011. Data-driven intelligent transportation systems: A survey. IEEE Transactions on Intelligent Transportation Systems 12(4):1624–1639.
Zhou, J., and Tung, A. K. 2015. Smiler: A semi-lazy time series prediction system for sensors. In Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data, 1871–1886.
Zhu, H.; Xiong, H.; Tang, F.; Liu, Q.; Ge, Y.; Chen, E.; and Fu, Y. 2016. Days on market: Measuring liquidity in real estate markets. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 393–402.