automatic extraction of road networks from gps traces · gps traces preprocessing, point cloud...

47
1 Automatic Extraction of Road Networks from GPS Traces A road map inference method that generates road network from GPS traces through point segmentation and grouping. Jia Qiu, Department of Geomatics Engineering, University of Calgary, 2500 University Drive NW, Calgary, Alberta, Canada T2N 1N4 [email protected] Ruisheng Wang, Department of Geomatics Engineering, University of Calgary, 2500 University Drive NW, Calgary, Alberta, Canada T2N 1N4 [email protected]

Upload: others

Post on 16-Jul-2020

4 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Automatic Extraction of Road Networks from GPS Traces · GPS traces preprocessing, point cloud progressive segmentation, point cluster grouping and road network geometry refinement

1

Automatic Extraction of Road Networks from GPS Traces

A road map inference method that generates road network from GPS traces through point

segmentation and grouping.

Jia Qiu, Department of Geomatics Engineering, University of Calgary, 2500 University Drive

NW, Calgary, Alberta, Canada T2N 1N4 [email protected]

Ruisheng Wang, Department of Geomatics Engineering, University of Calgary, 2500 University

Drive NW, Calgary, Alberta, Canada T2N 1N4 [email protected]

Page 2: Automatic Extraction of Road Networks from GPS Traces · GPS traces preprocessing, point cloud progressive segmentation, point cluster grouping and road network geometry refinement

2

Abstract: We propose a point segmentation and grouping method to generate road maps

from GPS traces. First, we present a progressive point cloud segmentation algorithm based on

Total Least Squares (TLS) line fitting. Second, we group topologically connected point clusters

by the point’s orientation and cluster’s spatial proximity, where the topological relationship is

generated using Hidden Markov Model (HMM) map matching. Finally, we refine the

intersections of roads so that their geometrical and topological relationships are consistent with

each other. Experimental results show that our algorithm is robust to noises and the generated

road network has a high accuracy in terms of geometry and topology. Compare to the

representative algorithms; the results of our new algorithm have a higher F-measure score for

different matching thresholds.

Keywords: Map inference; point cloud segmentation; point cluster grouping; HMM map

matching.

Page 3: Automatic Extraction of Road Networks from GPS Traces · GPS traces preprocessing, point cloud progressive segmentation, point cluster grouping and road network geometry refinement

3

1 Introduction

A road network is one of the most fundamental data of geospatial information. It is,

however, challenging to promptly and consistently update the existing road maps (Wang et al.,

2013). Conventional map updating techniques mainly consist of ground-based and aerial

surveys (Guo et al., 2007). Ground-based surveys are expensive and labor intensive. They are

limited to a long cycle of information acquisition. The aerial surveys have the capability to

obtain large-scale data about the Earth’s surface in a short time. Though several algorithms have

been proposed to extract road network from high-resolution satellite imagery (Doucette et al.,

2004, Youn et al., 2008, Kasemsuppakorn et al., 2013 ), it is still difficult to generate large-scale

road networks with these algorithms, due to the complexity of road networks and the occlusions

in satellite images caused by trees and buildings.

To overcome the shortcomings of traditional road map mapping, nowadays, user generated

content (UGC) (Krumm et al., 2008) is being increasingly used in road map generation. The

UGC is achieved in two ways: (1) collaborative mapping programs, such as OpenStreetMap,

which are highly dependent on manual work; and (2) automatic road generation from Global

Positioning System (GPS) trajectories (Li et al., 2014), which is referred to as map inference.

Being an efficient method for road network updates, map inference focus on the extraction of

road’s geometric positions, topological connections as well as some attribute information such

as lane counts (Schrödl et al., 2004) and road names (Li et al., 2014). The GPS traces are

collected by GPS-equipped vehicles, bicycles or pedestrians and therefore, better captures the

Page 4: Automatic Extraction of Road Networks from GPS Traces · GPS traces preprocessing, point cloud progressive segmentation, point cluster grouping and road network geometry refinement

4

geometry and topology of the latest road networks. Most of the existing map inference methods

are designed to deal with low-noise, densely sampled and uniformly distributed GPS traces

(Guo et al., 2007, Wang et al., 2013). In this research, we propose a novel algorithm to infer

high-quality road maps from low-cost and high-noise GPS traces. The primary contributions of

our method are summarized in the following:

A robust progressive 2D point cloud segmentation algorithm using multi-density

thresholds to separate GPS points that visit different roads.

A topological relationship construction algorithm based on Hidden Markov Model

(HMM) based map matching.

An –quantile distance to measure spatial proximity between two point clusters for

grouping.

The remainder of the paper is organized as follows: Section 2 presents a review of the

related work. Section 3 outlines the procedure of our map inference algorithm and Section 4

describes the experiments of two GPS trace datasets. Finally, Section 5 concludes the paper,

outlining possible future work on this topic.

2 Related Work

Several approaches have been proposed to automatically infer road maps from GPS traces.

These approaches mainly include: clustering based methods, trace merging based methods,

Kernel Density Estimation (KDE) based methods, and intersection linking based methods

(Biagioni and Eriksson, 2012a, Ahmed et al. 2014).

Page 5: Automatic Extraction of Road Networks from GPS Traces · GPS traces preprocessing, point cloud progressive segmentation, point cluster grouping and road network geometry refinement

5

The clustering based approaches begin with the determination of a series of cluster seeds

by location, bearing and heading of the traces. Seeds are then linked to form the road segments

according to the raw traces. Edelkamp and Schrödl (2003) introduced an approach to infer road

maps with high geometric accuracy, containing detailed lane information from fine-grained GPS

traces. Based on this work, Schrödl et al. (2004) described a process to refine the intersections

according to transition relationships between traces and segments. Agamennoni et al. (2010,

2011) regarded road map inference as finding the principal curve of the road segment by

determining and linking the cluster seeds. Their method, however, is not robust to noises

(Biagioni and Eriksson, 2012b) and requires densely sampled GPS traces.

Trace merging based approaches involve the consolidation of trace segments based on their

geometric relations and shape similarity. The centerlines of roads are then generated by the

merged traces. Cao and Krumm (2009) proposed an aggregation technique to pull together

traces that visit the same road. Even though their method performs well on densely sampled

GPS traces, it is found to be non-robust towards noisy data. Based on the research of Cao and

Krumm (2009), Wang et al. (2014) presented a novel approach for generating routable road

maps from GPS traces. They introduced circular boundaries to isolate points near intersections

and used a k-means clustering method to refine the intersections. This method also aimed at

dealing with densely sampled GPS traces. Niehoefer et al. (2009) described a map generation

method by Intra Journey Merge Routine (IJMR). The core of IJMR method is the merging of

trace segments by the distances between them. Chen et al. (2010) developed an algorithm for

Page 6: Automatic Extraction of Road Networks from GPS Traces · GPS traces preprocessing, point cloud progressive segmentation, point cluster grouping and road network geometry refinement

6

road network reconstruction by extracting and linking shared structure in planar paths (GPS

traces). Their method theoretically guarantees geometric accuracy based on the assumptions that

the traces are densely sampled and located on the road surface. However, due to noise, the traces

may not meet the second assumption. Wang et al. (2013) applied a map inference method to

update road maps. They matched the GPS traces with the existing road map and aggregated the

unmatched traces. A polygonal principal curve algorithm (Kégl et al., 2000) is employed

thereafter to infer the road centerlines and subsequently, updating the road map. The

performance of trace merging approach, a type of line-based method (Liu et al., 2012), is highly

dependent on the sampling rates. If two consecutive points of traces are far from each other,

especially when the vehicle turns at an intersection, the segments of GPS traces cannot align

with the roads. In this case, spurious road segments are produced.

The KDE based methods treat the map inference as a digital image processing procedure.

Raw GPS traces are rasterized to a 2D gray image and a global threshold is then used to filter

the image. Finally, skeletonization approaches, such as the Voronoi diagram and morphological

operations, are applied to extract centerlines of the road segments. Examples include the method

proposed by Davies et al. (2006), Chen and Cheng (2008), and Shi et al. (2009). This type of

method is sensitive to the density distribution and it is difficult to select an appropriate global

threshold. To overcome the limitations of the KDE, Biagioni and Eriksson (2012b) proposed a

progressive KDE based method. They developed a hybrid map inference pipeline, consisting of

initial map generation by gray-scale skeletonization, map pruning with a Viterbi map matching

Page 7: Automatic Extraction of Road Networks from GPS Traces · GPS traces preprocessing, point cloud progressive segmentation, point cluster grouping and road network geometry refinement

7

algorithm and road map topology and geometry refinement. Their method is robust to noise and

disparity; however, since it is a line-based approach, sparsely sampled traces cannot be handled.

Intersection linking approaches first detect the intersections of the road map and then link

the intersections according to the raw traces. Fathi and Krumm (2010) developed a classifier to

find the intersections from GPS traces. After finding the intersections, they used the raw traces

to prune and connect the intersections. Their method works well on road networks with simple

intersections and densely sampled traces. Karagiorgou and Pfoser (2012) proposed a similar

algorithm consisting of the identification and connection of intersections and the network graph

reduction. They use a speed threshold and changes in direction as indicators to identify the

intersections. The traces around the intersections are then clustered in terms of the distance to

determine the turns of the intersections. Finally, the links (road segments) are generated and

compacted along the paths of the traces. These methods are also sensitive to the sampling rates

of the GPS traces.

Due to the accumulation of noisy GPS points in an area, spurious road segments will be

produced by clustering based methods and KDE based methods. The performance of trace

merging based methods and intersection linking based methods highly depends on the accuracy

and sampling rate of GPS traces. In most of the cases, however, the accuracy of GPS receiver

cannot be guaranteed due to the urban canyon. Recently, Qiu et al. (2014) proposed a method to

segment sparsely sampled GPS traces into point clusters for generation of road centerlines.

Based on this work, we propose a new map inference algorithm for generating road network

Page 8: Automatic Extraction of Road Networks from GPS Traces · GPS traces preprocessing, point cloud progressive segmentation, point cluster grouping and road network geometry refinement

8

with high geometric accuracy from noisy GPS traces.

3 Method

The general procedure of our method is shown in Figure 1 . It consists of four main steps:

GPS traces preprocessing, point cloud progressive segmentation, point cluster grouping and

road network geometry refinement.

Figure 1 The general procedure of our map inference method.

3.1 GPS Trace Pre-Processing

The dataset of GPS traces Tr are collected by GPS-equipped vehicles. A GPS trace tr is

collected by a vehicle in a time span. It consists of a sequence of GPS points, denoted by tr={pi

|i=1, 2,…, m} where each point pi=(loni, lati, ti) contains the information of the vehicle’s positon

at a specific time. The coordinates loni[-180, 180] and lati[-90, 90] are the longitude and

Page 9: Automatic Extraction of Road Networks from GPS Traces · GPS traces preprocessing, point cloud progressive segmentation, point cluster grouping and road network geometry refinement

9

latitude of the point and tiR+ is the time at which the point is collected. The number of the

points in tr is m. Having a large amount of GPS traces collected by vehicles over space and time,

we can generate road network by the accumulated traces. In the experiment, the longitude and

latitude of all the points are transformed to a projected coordinate system, denoted by (x, y). We

associate an orientation for each point in the raw GPS traces. The orientation is computed by the

vector that is pointed from the point to its next point. If the point is the last point of a trace, set

its orientation as the value of its previous one. The range of the orientation is in [0°, 360°). It

starts from x-axis and advances in the counterclockwise direction.

Before segmenting GPS points, all traces are preprocessed as follows. For a trace tr={pi |

i=1,…, m},

1) if the distance between two consecutive point d(pg, pg+1)<dmin or d(pg, pg+1)>dmax, we

separate the trace tr into two traces as tr={pi |i=1,…, g}, and tr’={pi| i=g+1,…, m}, where

1≤g<m.

2) if the difference of two points’ (pg and pg+1) orientations is larger than a threshold

dDoOmax, we separate the trace tr into two traces as in 1).

The value of dmin is set to 3 meters, which approximates to the accuracy of standard GPS

devices. After computing all the distances between two consecutive points for all GPS traces as

D, we estimate dmax as dmax= d +3×, where d is the mean of the values in D and is estimated

by Median Absolute Deviation (MAD) as Equation (1).

1.4826 median | median ( ) |i i j jX X (1)

Page 10: Automatic Extraction of Road Networks from GPS Traces · GPS traces preprocessing, point cloud progressive segmentation, point cluster grouping and road network geometry refinement

10

where X is a set representing D. The reasons for dividing traces by distance are: 1) if the

distance between two consecutive points is smaller than the GPS error, the two points might be

collected in the same position; 2) if two points are far away from each other, the line segment

between them cannot properly align with the real road. We set the value of dDoOmax to 90°. A

sharp change in the orientation of two points indicates a sudden change in the direction of the

road, however, this conclusion violates the smoothness assumption of the roads.

3.2 Progressive Segmentation of Points

Given a set of GPS traces, our algorithm generates a set of clusters C={ci | i=1, 2,…, N},

where each cluster represents a smooth road. A cluster ci={pi,r | r=1, 2,…, ni} consists of a set of

GPS points, where ni is the point number of cluster ci. The distribution of GPS traces and points

varies over space. Some areas have more points than others. Thus, we employ a progressive

strategy to segment the points. The process is controlled by a density parameter , which

represents the number of points in a given area. The general procedure is described in Algorithm

1. The inputs are the set of GPS points P, a radius and the difference of point orientations

for neighbours searching, and a list of densities list sorted in descending order. The output is a

set of point clusters C. First, we segment the points using the first in list. Second, we apply

robust Locally Weighted Scatter plot Smooth (Lowess) (Cleveland, 1979) algorithm to generate

centerlines L for the point clusters C. Third, we match all raw GPS traces with resampled

centerlines L’ to filter GPS points. If an unclassified raw point is matched with a point of a

resampled centerline, we filter out the point as outlier. Finally, we repeat the above three steps

Page 11: Automatic Extraction of Road Networks from GPS Traces · GPS traces preprocessing, point cloud progressive segmentation, point cluster grouping and road network geometry refinement

11

by the next until all in list are visited. In this way, the points on the roads frequently visited

by vehicles are extracted first.

Algorithm 1: Progressive Points Segmentation

1

2

3

4

5

6

7

8

9

10

Input: P, , , list

Output: C

initialize nOutlier = 100000000, C = ∅, ∀𝑝 ∈ 𝑃 set p.cId = -1;

for 𝛿 ∈ 𝛿𝑙𝑖𝑠𝑡 do

compute C = C ∪ PointsSegmentation{P, , , list};

generate centerlines L = {li | i = 1, 2, …, N} by C using robust Lowess;

matching raw GPS traces Tr with resampled centerlines L’={ l’i | i = 1, 2, …, N };

for tr ∈ Tr do

obtain tr = {pi | i = 1, 2, …, m};

for pi ∈ tr do

if pi is matched ⋀ pi.cId == -1 then

set pi.cId = nOutlier;

3.2.1 Points Segmentation

We develop an efficient segmentation algorithm to partition GPS points into clusters. First,

we initialize a cluster ck by a point and its neighbours. Then, we fit the cluster by a line lk, which

is computed using Total Least Squares (TLS) fitting. Finally, we grow the cluster ck by adding

the neighbours of points in ck under the line lk’s constraint. Algorithm 2 outlines the general

Page 12: Automatic Extraction of Road Networks from GPS Traces · GPS traces preprocessing, point cloud progressive segmentation, point cluster grouping and road network geometry refinement

12

procedure of our GPS points segmentation. We associate an ,-neighborhood (Definition 1) for

each point and an -neighborhood (Definition 2) for each line generated by TLS.

Definition 1: The ,-neighbourhood of a point p, denoted by point set N(p,,), is defined

by N(p,,)={qP | d(p,q)≤ DoO(p,q)≤}, where P is the set of points, d(p,q) is the

Euclidean distance of p and q, and DoO(p,q) is the difference of points’ orientation between p

and q.

Definition 2: The -neighbourhood of a line l, denoted by point set N(l,), is defined by

N(l,)={pP | d(p,l)≤}, where P is the set of points, d(p,l) is the perpendicular distance from

point p to line l.

Algorithm 2: Points Segmentation

1

2

3

4

5

6

7

8

Input: P, , ,

Output: C

initialize cluster id k = 1;

for pi ∈ P do

if pi.cId == -1 ⋀ pi is a segment point then

Initialize ck = pi ∪ N(pi, , ,), nSize = sizeof(ck), 𝛿𝑘=1000;

if nSize >= 2 then

while 𝛿𝑘< do

ck = UpdateCluster {P, ck, , , };

𝛿𝑘 = sizeof(ck) – nSize, nSize = sizeof(ck);

Page 13: Automatic Extraction of Road Networks from GPS Traces · GPS traces preprocessing, point cloud progressive segmentation, point cluster grouping and road network geometry refinement

13

9

10

11

12

13

14

15

16

17

if sizeof(ck) < then

empty ck and continue;

construct a new coordinate system CS by ck using PCA;

update 𝛿𝑘 = 1000, nSize = sizeof(ck);

while 𝛿𝑘 > do

ck = ExpandCluster {P, ck, , , , CS};

𝛿𝑘 = sizeof(ck) – nSize, nSize = sizeof(ck);

∀pj ∈ ck, set pj.cId = k, push ck into C.

k = k + 1;

In algorithm 2, after initializing the point cluster ck by a point pi and N(pi,,), we update ck

by Algorithm 3 ‘UpdateCluster.’ The output of algorithm 3 is an updated point cluster c’k. First,

a base orientation dOri and a line lk are computed, where dOri is the median value of

orientations of points in ck and lk is generated by ck using TLS fitting. If a point pt meets three

conditions: 1) pt is in N(lk,), 2) the DoO of pt’s orientation and dOri is less than and 3) the

previous point pt(pre) or next point pt(next) of pt is also in N(lk,), then we add pt to the new

cluster c’k. These conditions aim at finding points to form a line segment. The third condition is

to filter GPS outliers. If a point is located in the buffer region of a centerline, and its previous

and next points do not present in the buffer region, then the point has a higher probability to be

an outlier. Second, we apply these conditions to all points pt in ck to initialize a queue Q. If pt

meets these conditions, then push pt into Q and c’k. Third, we pop point from Q as pt and find

Page 14: Automatic Extraction of Road Networks from GPS Traces · GPS traces preprocessing, point cloud progressive segmentation, point cluster grouping and road network geometry refinement

14

points N(pt,,) joined with pt’s previous and next point as Ntotal. If the point in Ntotal meets the

conditions, we push the point into Q and c’k. Finally, the procedure is terminated when Q is

empty. A flag ‘visited’ is used for points in the process to avoid pushing points into Q or c’k

twice. The previous and the next points are obtained by the raw GPS traces. Due to the

disparities in GPS traces distribution, the ,-neighbourhood of some points is empty. Thus, we

consider the previous and next point to expand the point cluster. We repeat the ‘UpdateCluster’

algorithm until the number of cluster points is stable, which means that less than points are

added to cluster ck after the cluster expansion.

Algorithm 3: Update Cluster

1

2

3

4

5

6

7

8

9

Input: P, ck, ,

Output: c’k

initialize dOri as median orientation of points in ck;

fit cluster ck by line lk using Total Least Square (TLS) algorithm;

initialize an empty queue Q;

for pt ∈ ck do

get orientation dOrit of pt;

if pt ∈ N(lk,) ∧ DoO(dOrit, dOri) < then

if pt(pre) ∈ N(lk,) ∨ pt(next) ∈ N(lk,) then

push pt into Q, add pt into c’k and set pt as ‘visited’;

while Q ≠ ∅ do

Page 15: Automatic Extraction of Road Networks from GPS Traces · GPS traces preprocessing, point cloud progressive segmentation, point cluster grouping and road network geometry refinement

15

10

11

12

13

14

15

16

pop point from Q as pt;

find previous and next point of pt joined with N(pt, , ) as Ntotal;

for pt ∈ Ntotal do

get orientation dOrit of pt;

if pt is not ‘visited’ ∧ pt ∈ N(lk,) ∧ DoO(dOrit, dOri) < then

if pt(pre) ∈ N(lk,) ∨ pt(next) ∈ N(lk,) then

push pt into Q, add pt into c’k and set pt as ‘visited’;

In Algorithm 2, after generating the cluster ck (Algorithm 3), we expand it (Algorithm 4)

using the points present in the head and tail regions of ck. The ‘ExpandCluster’ algorithm is

designed to generate curves or winding roads rather than straight roads. The output is an

expanded cluster c’k. Consider an illustration in Figure 2, (a) shows the transformed coordinate

system CS and the point cluster ci generated by ‘UpdateCluster.’ The CS is computed by

Principle Component Analysis (PCA), where the abscissa axis of CS is the eigenvector of the

first component and the vertical axis is the eigenvector of the second component. After sorting

the points in CS, we compute x0=min{pi,r.x | r=1, 2,…,ni}, x3=max{pi,r.x | r=1, 2,…,ni}, and

x1=x0+6×, x2=x3-6×. We then set the head region and the tail region as R(H)=[x0, x1], and

R(T)=[x2, x3], respectively. The points in head region P(H) and points in tail region P(T) are

obtained by R(H) and R(T). This setting guarantees that P(H) and P(T) will form road segments

that their orientations are consistent with the abscissa of CS. Subsequently, we apply

‘UpdateCluster’ algorithm to points P(H) and P(T) separately to generate expanded clusters

Page 16: Automatic Extraction of Road Networks from GPS Traces · GPS traces preprocessing, point cloud progressive segmentation, point cluster grouping and road network geometry refinement

16

P’(H) and P’(T), as shown in Figure 2 (c). The head expands to the region with x value less than

x1 and tail will expand to the region with x value greater than x2. Points of P’(H) and P’(T) with

x values in R(H) and R(T) also require the distances to the nearest point of P(H) or P(T) less

than . The ‘ExpandCluster’ algorithm will terminate when less than points are added after

expanding the cluster.

Figure 2 An illustration of expanding cluster by the head and tail regions, (a) the transformed

coordinate system CS; (b) the head and tail region of the point cluster; and (c) the new cluster

generated by expanding the tail region.

Algorithm 4: Expand Cluster

1

2

3

4

Input: P, ck, , , , CS

Output: c’k

sort points of ck in ascending order by their x values in CS;

define head and tail ranges of ck as R(H) = [x0, x1] and R(T) = [x2, x3];

find points P(H) and P(T) in the head and tail regions of ck by their x values in CS;

initialize 𝛿𝑘 = 1000, P(H)temp=P(H), P(T)temp=P(T), nSize = sizeof(P(H)) + sizeof(P(T));

Page 17: Automatic Extraction of Road Networks from GPS Traces · GPS traces preprocessing, point cloud progressive segmentation, point cluster grouping and road network geometry refinement

17

5

6

7

8

9

10

11

12

13

14

15

16

17

18

19

20

21

22

23

24

while 𝛿𝑘 > 𝛿 do

𝑃′(𝐻) = UpdateCluster {P, P(H)temp, , , };

𝑃′(𝑇) = UpdateCluster {P, P(T)temp, , , };

generate 𝑃′(𝑈) = 𝑃′(𝐻) ∪ 𝑃′(𝑇);

transform all points in 𝑃′(𝑈) to the coordinate system CS;

empty P(H)temp, P(T)temp;

for 𝑝𝑢′ ∈ 𝑃′(𝑈) do

if 𝑝𝑢′ . 𝑥 < 𝑥0 then

add 𝑝𝑢′ to P(H)temp;

if 𝑥0 ≤ 𝑝𝑢′ . 𝑥 ∧ 𝑝𝑢

′ . 𝑥 < 𝑥1 then

dmin = min{𝑑(𝑝𝑢′ , 𝑝ℎ)}, ∀𝑝ℎ ∈ 𝑃(𝐻);

if dmin < then

add 𝑝𝑢′ to P(H)temp;

if 𝑥2 ≤ 𝑝𝑢′ . 𝑥 ∧ 𝑝𝑢

′ . 𝑥 < 𝑥3 then

dmin = min{𝑑(𝑝𝑢′ , 𝑝𝑡)}, ∀𝑝𝑡 ∈ 𝑃(𝑇);

if dmin < then

add 𝑝𝑢′ to P(T)temp;

if 𝑝𝑢′ . 𝑥 > 𝑥3 then

add 𝑝𝑢′ to P(T)temp;

𝛿𝑘 = sizeof(P(H)temp) + sizeof(P(T)temp) – nSize,

Page 18: Automatic Extraction of Road Networks from GPS Traces · GPS traces preprocessing, point cloud progressive segmentation, point cluster grouping and road network geometry refinement

18

25

26

nSize = sizeof(P(H)temp) + sizeof(P(T)temp);

transform all points in 𝑃′(𝑈) to the original coordinate system;

update 𝑐′𝑘 = P(H)temp ∪ P(T)temp

All the clusters are initialized to a point and points in its ,-neighbourhood. Different

initial points generate different clusters. We classify all GPS points into two categories, the

‘intersection points’ and ‘segment points.’ The intersection points are points located around two

ends of the road segments. The segment points are points located in the middle area of the road

segments. In our experiment, we notice that using segment point to initialize the point clusters

will generate better results in terms of geometric accuracy of the road map. The reason is that

the orientation of a segment point is more accurate than that of an intersection point. Based on

the Difference of Normals (DoN), which is proposed by Ioannou et al. (2012), we use a

multi-scale operator called Difference of Orientations (DoO) to classify GPS points. We

estimate the orientation dOri(pi,) of a point pi by applying PCA to the union of pi and N(pi,,).

The orientation is the direction of the first component’s eigenvector. We compute two

orientations dOri(pi,r1), dOri(pi,r2) using the support radii, r1 and r2. Then the difference of

orientation dDoOi= |dOri(pi,r1) - dOri(pi,r2)|, module=180°, is computed, where r1=, r2=1.6×r1.

Plate 1 shows the distribution of the DoO of our GPS data. DoO ranges from 0° to 90°. After

obtaining difference of orientations, DoO = {dDoOi | i=1, 2, …, n} for all GPS points, we use

MAD formula given in Equation (1) to estimate a DoO, which in turn describes the distribution

of DoO. Then, we classify point pi as segment point if dDoOi≤3×DoO and the others as

Page 19: Automatic Extraction of Road Networks from GPS Traces · GPS traces preprocessing, point cloud progressive segmentation, point cluster grouping and road network geometry refinement

19

intersection points.

Plate 1 The DoO distribution of road network

Our GPS point segmentation method needs three parameters: , for neighbourhood

generation and for initializing clusters and terminating clusters growth. We fix the search

radius to 15 meters. This value is estimated based on the lane width and the lane number of the

real urban roads. The lane width in cities is about 3.75~4.0 meters. We assume that the roads

visited by vehicles contain at least 4 lanes. We set the difference of orientations to to 30°. The

value has a high tolerance to the noise of GPS points and to the inaccurate orientations of points.

The parameter represents the density of points and we will use this parameter to implement

the progressive segmentation algorithm.

3.2.2 Centerline Generation

After obtaining the segmentation result C, we generate centerlines to represent the real

roads. First, we construct a new coordinate system CS by all the points in ci using PCA

transformation. Second, we connect all the points in ci by their x values in CS in ascending order

to form the centerline li. Finally, we apply the robust Lowess method to generate centerlines

from the point clusters. The method is a curve smoothing algorithm and removes noises and

Page 20: Automatic Extraction of Road Networks from GPS Traces · GPS traces preprocessing, point cloud progressive segmentation, point cluster grouping and road network geometry refinement

20

outliers by residual analysis. It includes five steps. First, a locally weighted linear least-squares

regression is applied for each point to smooth the curve. Second, the residual for each point is

calculated according to the smoothed curve. Third, the value computed by MAD is applied to

remove outliers and update local weights. Fourth, the curve is smoothed using step one again by

the updated weights. Finally, the previous three steps are repeated for a total of five iterations to

obtain a smooth curve.

In the algorithm, the local linear regression for each point is performed on its neighbours,

where the neighbours are computed by their x values in CS. It is important to determine how

many neighbours are used for the local linear regression. We use the average number of the

points contained in a specified span, dSpan to estimate the number of neighbours k for each

cluster. Considering Figure 3 as an illustration, after transforming all the points into the new

coordinate system CS, we use the formula k=dSpan/l×n to estimate the number of neighbours,

where l=xmax-xmin, n is the number of points in the cluster. We fix the dSpan to 10 meters

according to the experimental tests to calculate k for each cluster.

Figure 3 Illustration of robust Lowess algorithm

For each point cluster ci and the generated centerline li, we assume that the distances from

the points of the cluster to li complies to Gaussian distribution G~(, ), where =0 represents

Page 21: Automatic Extraction of Road Networks from GPS Traces · GPS traces preprocessing, point cloud progressive segmentation, point cluster grouping and road network geometry refinement

21

the centerline and is estimated by MAD (Equation (1)). In the next section, we use max ii

as max to search the candidate list for initializing emission probabilities.

3.2.3 HMM based Map Matching

According to Newson and Krumm (2009), matching GPS traces with the actual road map

(i.e. map matching) is to find the most probable route that the vehicle visited based on the

location of vehicle measured by GPS and the connectivity of the road map. They modeled the

map matching problem as a HMM. In our method, by matching GPS points with sample points

of road centerlines, we construct the topological relationship between road centerlines. We also

use the matching result to filter out points generated by GPS noise. For a point cluster ci, we use

the robust Lowess to generate the centerline li. Then we sample the road centerline as follows. If

the road length of three consecutive points is smaller than a threshold (i.e. 3 meters), remove the

middle point. If the road length between two points is larger than the same threshold, add points

every 3 meters between them. A new road centerline will be obtained as li = {vi,r | r=1, 2,…, n’i},

where vi,r is the sample point of li, and n’i is the new number of points.

The transition probabilities of HMM give the probability of a vehicle moving from one

sample point to another. The emission probability of a particular point pt at time t associated

with a nearby sample point vf, denoted by Pr(vf |pt), gives the likelihood that pt was observed

from nearby sample point vf. For a point pt of a GPS trace, we search up to 10 nearest sample

points in N(pt,,) to generate the candidate list for matching with pt. Figure 4 presents the

procedure of matching GPS points with the sample points. We can calculate probabilities of all

Page 22: Automatic Extraction of Road Networks from GPS Traces · GPS traces preprocessing, point cloud progressive segmentation, point cluster grouping and road network geometry refinement

22

the possible routes that allow a vehicle travel from point pt-1 to point pt according to the

transition probabilities and emission probabilities. We then choose the route that maximizes the

probability as the matching result.

Figure 4 All possible routes for transferring from point pt-1 to point pt

We compute a weighted connection matrix W = {wi,j | i, j=1, 2,…,N} between point clusters

in C according to the raw GPS traces to initialize the transition matrix. If point pi,r in cluster ci

and its next point pj,s in cluster cj belong to a same GPS trace, we conclude that vehicles have

probability to travel from sampled road centerline li to lj. Thus, we can infer the weighted

connection wi,j from cluster ci to cj. Consider Figure 5 as an illustration, 9 points are contained in

cluster ci, where 4 points transit from ci to cj, 3 points transit from ci to ck and 2 points transit

from ci to itself. We then compute wi,j=4/9, wi,k=3/9 and wi,i= 2/9. The sum of the connection

weight originated from ci is equal to 1. The number of non-zero weight originated from ci is

defined as the outdegree, deg(ci) of the cluster ci. In practice, for most of the cluster ci, the

number of points that transit from ci to itself is larger than the number of points that transit from

Page 23: Automatic Extraction of Road Networks from GPS Traces · GPS traces preprocessing, point cloud progressive segmentation, point cluster grouping and road network geometry refinement

23

ci to others. This fact results in that wi,i is much greater than wi,j, where i j. The other fact is

that wi,j is much greater than wr,s, if deg(ci) is much greater than deg(cr). It indicates that

connection weights originated from lower outdegree clusters are larger than the connection

weights originated from higher outdegree clusters. In order to make the connection weight

independent of the outdegree, we compute the weighted connection matrix W as follows:

1) ciC, calculate the number m of the points in ci such that their next points are not in

ci.

2) wi,jW, (i j), calculate the number k of the points in ci such that their next points in

cj, then, set wi,j = k/m.

3) wi,iW, set wi,i =1 if m ni.

4) ciC, compute the outdegree, degi = ||wi,j 0||, (j=1, 2,…, N).

5) compute degmin = min{degi | i=1, 2,…, N}.

6) wi,jW,(i j), update wi,j = wi,j×(degi/degmin).

Figure 5 Points transition between clusters

Given two sampled centerlines li = {vi,r | r=1, 2,…, n’i} and lj = {vj,s | s=1, 2,…, n’j}, which

are derived from the point cluster ci = {pi,r | r=1, 2,…, ni}and cj = {pj,s | s=1, 2,…, nj}, the

transition probabilities are computed based on W as follows:

Page 24: Automatic Extraction of Road Networks from GPS Traces · GPS traces preprocessing, point cloud progressive segmentation, point cluster grouping and road network geometry refinement

24

1) For any two sample points vi,r and vi,t that belong to the same centerline li, if the length

of the route {vi,r, vi,r+1,…, vi,t}, where r<t, is less than 6×max, we set the transition probability as

Pr(vi,r, vi,t) = 1.

2) For any two sample points vi,r and vj,s that belong to two different sampled centerline li

and lj, the transition probability is calculated by Equation (2)

2

, , , , ,Pr( , ) / ( , )i r j s i j i r j sv v w d v v (2)

where wi,j is the connection weight from cluster ci to cj and d(vi,r, vj,s) is the Euclidean distance

between the sample points. The intuitions of setting the transition probabilities are: 1) vehicles

prefer to transit from a point to the other on the same centerline; and 2) if staying on the same

centerline is impossible, vehicles are most likely to transit between the two points with smaller

distance and larger connection weight.

For a point pt, its candidate list for matching is computed by N(pt,,). The search radius

is set to 6×max. We then initialize the emission probability Pr(vf |pt), (f=1, 2,…,10) for pt by the

size of points in clusters as follows.

1) For each vf, get the point cluster ci that the derived sampled centerline li contains vf.

2) For each Pr(vf |pt), set its value as the size of points in cluster ci.

3) Normalize Pr(vf |pt) , (f =1, 2,…,10) for pt by scaling the sum of them to 1.

After initializing the transition probabilities and emission probabilities, we use Viterbi map

matching algorithm (Newson and Krumm, 2009, Thiagarajan et al. 2009) to match raw GPS

traces with sampled centerlines to find the most probabilistic route that vehicles visited. The

Page 25: Automatic Extraction of Road Networks from GPS Traces · GPS traces preprocessing, point cloud progressive segmentation, point cluster grouping and road network geometry refinement

25

route with highest probability indicates the connection relationship between centerlines.

Centerlines that do not match with any raw trace are treated as noisy points. If two consecutive

sample points belong to two centerlines, we infer that the two centerlines are topologically

connected. We then use all the points that are not segmented and matched as new input for the

next loop of our point segmentation algorithm (Algorithm 1).

3.3 Point Cluster Grouping

Due to the accumulation of GPS noises, points outside the road can be segmented to form

another road centerline. The result also contains some short centerlines with similar orientations.

Thus, we use point clusters grouping to form long curves and join together point clusters near

each other to reduce redundancy. A curve drawn in one smooth movement is known as ‘Stroke’

(Thomson and Brooks, 2002). It is derived from the principle of “Good Continuation,” which

indicates that elements following similar directions tend to be grouped together. We use this idea

to guide our point cluster grouping algorithm.

Considering Figure 6 as an illustration, given two topologically connected road centerline li

and lj, we decide whether to group the corresponding point cluster ci and cj according to the

spatial proximity of li and lj.

First, we temporally group ci and cj to obtain a new point cluster cg=ci cj and generate a

new centerline lg from cg, where lg={vg,t | t=1, 2,…,n’g}, n’g≤n’i+n’j. Vertices of lg and points in

cg are in a one-to-one correspondence with each other except outliers. Vertices have associated

cluster IDs are transformed into a coordinate system CS generated by the PCA transformation.

Page 26: Automatic Extraction of Road Networks from GPS Traces · GPS traces preprocessing, point cloud progressive segmentation, point cluster grouping and road network geometry refinement

26

Second, we find the common range R(C) of the two clusters using centerline lg and

generate two centerlines ,{ | 1,2,..., }c c

i r i

c

il v r n and ,{ | 1,2,..., }cc c

j j s jl v s n by points c

ic

and c

jc . Here, c

ic and c

jc , constitute the points of R(C) originally from ci and cj. The common

range R(C) is defined as a range of CS where x values lie in [xmin, xmax]. We find the minimum

and maximum x values among all vertex pairs {vg,t, vg,t+1} of lg as xmin and xmax, where vg,t and

vg,t+1 corresponding to two points in clusters ci and cj, separately.

Third, we propose an -quantile distance between c

il and c

jl to measure the spatial

proximity of the two centerlines li and lj.

1) ,

c c

i r iv l , find , , , ,( , ) min{ ( , ) | }c c c c c c

i r j i r j s j s jd v l d v v v l as distance set ( , )c c

i jD l l ;

2),

c c

j s jv l , find , , , ,( , ) min{ ( , ) | }c c c c c c

j s i j s i r i r id v l d v v v l as distance set ( , )c c

j iD l l ;

3) sort distances in ( , )c c

i jD l l and ( , )c c

j iD l l as ( , )c c

s i jD l l and ( , )c c

s j iD l l in descending

order;

4) obtain the -quantile distance according to ( , )c c

s i jD l l and ( , )c c

s j iD l l as ( , )c c

i jd l l

and ( , )c c

j id l l .

Finally, if the distance ( ( , ) ( , )) / 2c c c c

i j j id l l d l l is less than a pre-defined threshold, we

group ci and cj permanently, otherwise, retain ci and cj. Based on this idea, we propose a point

cluster grouping algorithm described in Algorithm 5.

Page 27: Automatic Extraction of Road Networks from GPS Traces · GPS traces preprocessing, point cloud progressive segmentation, point cluster grouping and road network geometry refinement

27

Figure 6 Illustration of generating the common region of two point clusters

Algorithm 5: Point Cluster Grouping

1

2

3

4

5

6

7

8

9

10

11

Input: ci and cj, , , vdSigma

Output: CN

calculate the difference of mean orientations dDoOmean by points in ci and cj;

calculate the distance d = min{min{d(pi,r, pj,s) | pj,s ∈ cj} | pi,r ∈ ci};

if 𝑑𝐷𝑜𝑂𝑚𝑒𝑎𝑛 > 𝛽 ∧ 𝑑 > 6 × max{𝑣𝑑𝑆𝑖𝑔𝑚𝑎} then

return CN = {ci, cj};

generate centerlines li and lj for point clusters ci and cj;

generate centerline lg by 𝑐𝑔 = 𝑐𝑖 ∪ 𝑐𝑗, and construct a coordinate system CS;

∀{𝑣𝑔,𝑡 , 𝑣𝑔,𝑡+1} that belong to two clusters, find R(C) by x values in CS as [xmin, xmax];

∀𝑝𝑖,𝑟 ∈ 𝑐𝑖, find points in R(C) as 𝑐𝑖𝑐 = {𝑝𝑖,𝑟𝑐

𝑐 |𝑝𝑖,𝑟𝑐𝑐 . 𝑥 ∈ [𝑥𝑚𝑖𝑛, 𝑥𝑚𝑎𝑥]};

∀𝑝𝑗,𝑠 ∈ 𝑐𝑗, find points in R(C) as 𝑐𝑗𝑐 = {𝑝𝑗,𝑠𝑐

𝑐 |𝑝𝑗,𝑠𝑐𝑐 . 𝑥 ∈ [𝑥𝑚𝑖𝑛, 𝑥𝑚𝑎𝑥]};

compute d according to –quantile distance as 𝑑 = (𝑑𝛼(𝑙𝑖𝑐 , 𝑙𝑗

𝑐) + 𝑑𝛼(𝑙𝑗𝑐 , 𝑙𝑖

𝑐))/2.0;

if 𝑑 > 6 × (𝑣𝑑𝑆𝑖𝑔𝑚𝑎[𝑖] + 𝑣𝑑𝑆𝑖𝑔𝑚𝑎[𝑗]) ∧ 𝑑 > 2 × 휀 then

Page 28: Automatic Extraction of Road Networks from GPS Traces · GPS traces preprocessing, point cloud progressive segmentation, point cluster grouping and road network geometry refinement

28

12

13

14

return CN = { ci, cj};

else

return CN={cg};

The input parameter ‘vdSigma’ is a vector of i for each cluster ci. The output is {ci, cj} or

the grouped cluster cg. At the beginning, we use the difference of mean orientations of the two

point clusters ci and cj to determine whether the two centerlines li and lj smoothly connect to

each other. According to Jiang et al. (2008), the deflection angle ranging in [30°, 75°] produces

stable ‘Stroke’ from road segments. The deflection angle is the angle between two connected

road segments. In our method, we intend to generate a longer smooth curve lg from point cluster

cg. Similar to the idea of ‘Stroke’ generation, if the difference of mean orientations of ci and cj is

larger than a pre-defined threshold , we assume that ci and cj cannot be grouped to generate a

longer smooth curve. We set to 60°, which is in the range [30°, 75°]. However, in order to

group point clusters with smaller difference of orientations earlier than point clusters with larger

difference of orientations, we apply the grouping algorithm in a progressive manner. More

specifically, we begin Algorithm 5 on a set of point cluster C with a small (i.e. 10°) and

iteratively run Algorithm 5 by increasing value of with an interval size (i.e. 25°) until

reaches 60°.

Due to the disparities in the sampling rate of the raw GPS traces, the two clusters along the

most probabilistic route that vehicles visited may be far from each other. Thus, we use the

minimum function described in Equation (3) to measure the distance between two point clusters.

Page 29: Automatic Extraction of Road Networks from GPS Traces · GPS traces preprocessing, point cloud progressive segmentation, point cluster grouping and road network geometry refinement

29

If the distance between two clusters d (ci, cj) is greater than 6×max, they are not groupable.

, ,

, ,, min mi( n ( , )) ( )i r i j s j

i j i r j sp c p c

d c c d p p

(3)

If the difference of mean orientations of ci and cj is smaller than , we use the geometric

position of the vertices in li and lj to determine if the two point clusters can be grouped or not.

We compute two -quantile distances to measure the spatial proximity between two point

clusters. One is computed from li to lj and the other is computed from lj to li. If the mean of two

distances is smaller than 6×(i+j) or 2×, we group the two clusters.

3.4 Geometry Refinement

After grouping the point clusters, we generate centerlines using the robust Lowess

algorithm. However, intersections of centerlines are not represented explicitly. Figure 7 (a)

shows the typical centerlines of an inferred road map. We observe that centerlines are not

intersected with each other, but they are topologically connected. Therefore, we use the map

matching result as a constraint to refine the geometry of the centerlines so that they are

consistent with their topological relationship.

Figure 7 Intersections refinement of inferred road maps

First, we simplify all centerlines by Douglas-Peuker (DP) algorithm (1973). The threshold

is set to 3 meters, which is much smaller than the assumed road width. It ensures that the

simplified road centerline remains on the road. Figure 7 (b) shows the simplified centerlines of

Page 30: Automatic Extraction of Road Networks from GPS Traces · GPS traces preprocessing, point cloud progressive segmentation, point cluster grouping and road network geometry refinement

30

Figure 7 (a), most points of the centerlines are removed by DP algorithm.

Then, for two topologically connected centerlines, we compute the intersection as follows.

1) If the two centerlines are intersected with each other, illustrated by intersection 1 of Figure 7

(b), we compute the intersection. 2) If one of the centerline can be extended to intersect with the

other, e.g., intersections 2 and 4 of Figure 7 (b), we extend the centerline and compute the

intersection. 3) If the two centerlines can be extended to intersect with each other, such as the

intersection 3 of Figure 7 (b), we compute the intersection by extending both of the two

centerlines.

Figure 7 (c) shows the intersections computed by the described procedure. Finally, all the

dead end line segments (either of its two end points are intersections) having a length less than

2× are removed.Figure 7 (d) shows the refined road centerlines.

4 Experiments

We use two datasets to test our algorithm. The first one is the GPS traces collected in

Chicago and the second is the GPS traces covering parts of San Francisco city. The Chicago

dataset is GPS traces from buses serving the University of Illinois at Chicago campus (Biagioni

and Eriksson, 2012b). The dataset covers an area of 3.8 km 2.4 km. There are 889 traces and

about 118,000 points. The total length of the traces is 2,671 km. We find that the distances from

some of the GPS points to the road centerline are more than 100 meters. The San Francisco

dataset is GPS traces downloaded from OpenStreetMap (2013). The traces are uploaded by

people around the world. The dataset covers an area of 3.3 km 3.5 km. There are 161 traces

Page 31: Automatic Extraction of Road Networks from GPS Traces · GPS traces preprocessing, point cloud progressive segmentation, point cluster grouping and road network geometry refinement

31

and about 261,000 points. The total length of the traces is 16,507 km. The accuracy of the traces

obtained from OpenStreetMap cannot be guaranteed and there is no meta data about the

accuracy. We then preprocess the GPS traces by distances and difference of orientations

between points before our experiments.

Plate 2 (a) shows the raw GPS traces of the Chicago dataset displayed as points. We apply

our progressive segmentation algorithm to the dataset. The list is set to {20, 10, 5} by trial and

error method. Plate 2 (b) shows the clusters generated with different . Black points are the

union of clusters generated with =20, blue points are clusters generated with =10 and red

points are clusters generated with =5. We can observe that most of the points are segmented

with =20 due to the dense distribution of the GPS points. Plate 2 (c) shows the obtained 94

clusters. Different colors represent different clusters. We then group clusters that are

topologically connected according to their geometric position. Plate 2 (d) shows the point

cluster grouping results, and forty-three clusters are obtained.

Page 32: Automatic Extraction of Road Networks from GPS Traces · GPS traces preprocessing, point cloud progressive segmentation, point cluster grouping and road network geometry refinement

32

Plate 2 (a) Raw GPS traces of the University of Illinois displayed as points; (b) point clusters

generated by the progressive segmentation algorithm, different colors represent clusters

generated with different ; (c) point clusters generated by the progressive segmentation

algorithm, different colors represent different clusters; and (d) point cluster after grouping.

As a final step, we use the robust Lowess algorithm to generate centerlines from the

grouped point clusters. The inferred road map (black lines) is overlapped with the reference map

(gray lines) as shown in Figure 8. We then apply our geometry refinement method to process the

intersection of the interred road map and obtain the final result shown in Figure 9. The close-up

shows that the intersections of topologically connected centerlines are computed properly.

Figure 8 The generated road map (black) overlapped with the reference map (gray) of Chicago

Page 33: Automatic Extraction of Road Networks from GPS Traces · GPS traces preprocessing, point cloud progressive segmentation, point cluster grouping and road network geometry refinement

33

Figure 9 The refined road map (black) overlapped with the reference map (gray) of Chicago

We then repeat the experiment with the same parameters on dataset 2. Figure 10 shows the

raw GPS traces of San Francisco displayed as points. Figure 11 shows the final result of the

inferred road map. According to Figure 11, most of the straight line segments and the centerlines

in high density area (this indicates that the roads are frequently visited by vehicles) are

generated by our algorithm.

Page 34: Automatic Extraction of Road Networks from GPS Traces · GPS traces preprocessing, point cloud progressive segmentation, point cluster grouping and road network geometry refinement

34

Figure 10 Raw GPS traces of the San Francisco displayed as points.

Figure 11 The inferred road map of San Francisco.

We evaluate the representative algorithms such as Edelkamp and Schrödl (2003), Davies et

al. (2006), Cao and Krumm (2009) using the San Francisco dataset. The results are shown in

Page 35: Automatic Extraction of Road Networks from GPS Traces · GPS traces preprocessing, point cloud progressive segmentation, point cluster grouping and road network geometry refinement

35

Figure 12. Figure 12 (a) shows the roads generated by Edelkamp and Schrödl’s method. Due to

the noises of GPS traces, the intersections of the road map cannot be generated properly. Figure

12 (b) shows the road map generated by Davies’s method. Some roads are missing in the

inferred road map due to the uneven distribution of GPS traces. Their method also cannot deal

with the complex intersections. Figure 12 (c) shows the road map inferred by Cao and Krumm’s

method. The result shows that their method cannot handle the disparity problem of GPS traces.

Comparing with the result shown in Figure 11, the proposed method achieves the best

performance among these four methods.

Figure 12 Road networks generated by: (a) Edelkamp and Schrödl’s method; (b) Davies’

method and (c) Cao and Krumm’s method from San Francisco dataset.

Page 36: Automatic Extraction of Road Networks from GPS Traces · GPS traces preprocessing, point cloud progressive segmentation, point cluster grouping and road network geometry refinement

36

Thus far, we qualitatively evaluated the proposed map inference algorithm using the

reference map and other related algorithms. Next, we present a quantitative evaluation for the

geometric accuracy and the topological accuracy of our map inference algorithm. We use the

method proposed by Liu et al. (2012) to evaluate the geometric accuracy and the method

proposed by Biagioni and Eriksson (2012a) to measure the topological accuracy. In general,

they match the inferred road map with the reference map by different matching distance

thresholds, and use F-measure to quantitatively evaluate the inferred road map. Assume that tp

is the matched proportion of inferred map and reference map, ||Truth|| is the road length on

reference map, ||M|| is the road length on inferred road map. F-measure = (2 precision

recall)/(precision+recall), where recall=tp/||Truth||, precision=tp/||M||.

The details of the geometric evaluation method are described as follows:

1. Sample points on the roads of the reference map and inferred maps using equal

intervals (i.e., 1 m).

2. Compute the orientations of all the points by their connections (modulo 180°).

3. Match the sample points on the inferred map with the reference map by nearest

distance matching under the constraint of the orientation. The difference of orientations

of two points matched with each other is no more than 60°.

4. Use different distance thresholds to determine the best match proportion.

The details of the topological evaluation method are described as follows:

1. Sample points on the roads of the reference and inferred maps using equal intervals

Page 37: Automatic Extraction of Road Networks from GPS Traces · GPS traces preprocessing, point cloud progressive segmentation, point cluster grouping and road network geometry refinement

37

(i.e., 1 m).

2. For each intersection in the reference map, find its topologically connected points

within a maximum radius (i.e., 50 m).

3. Find the nearest point of the intersection in the inferred road map as a starting point and

repeat step 2 to generate a topologically connected point set.

4. Match the two sets of points generated in step 2 and step 3 to count the matched and

unmatched holes and marbles.

5. Repeat from step 2 to step 4 until all intersections in reference map are visited.

The reference map (gray lines) and inferred map (black lines) of the Chicago dataset are

overlapped, as shown in Figure 13. Figure 13 (a) presents the road map generated by Biagioni

and Eriksson’s algorithm (2012b). We manually select the roads of the reference map that are

actually visited by vehicles according to the raw GPS traces. Each road is represented by one

centerline, since their algorithm does not separate the centerlines with opposite directions for a

bidirectional road. Figure 13 (b) is the road map generated by our method. We generate two

centerlines for the opposite directions of a bidirectional road. The corresponding roads of the

reference map also have two centerlines.

Page 38: Automatic Extraction of Road Networks from GPS Traces · GPS traces preprocessing, point cloud progressive segmentation, point cluster grouping and road network geometry refinement

38

Figure 13 (a) Road map (black lines) constructed using Biagioni and Eriksson’s method and the

roads that actually visited by vehicles (gray lines); (b) road map (black lines) generated by our

proposed method and the roads that actually visited by vehicles (gray lines).

First, we evaluate the geometric accuracy. The total length of the roads in the reference

map in Figure 13 (a) is 32.89 km, and the total length of the roads inferred by Biagioni and

Eriksson’s method is 24.50 km. The matching results with different matching thresholds of

distance are shown in Table 1. We apply the same evaluation procedure to our inferred road map.

The results are shown in Table 2. The total length of the roads in the reference map in Figure 13

(b) is 42.15 km, and the total length of the inferred roads of our method is 35.53 km.

Bidirectional roads in Figure 13 (b) are represented by two centerlines. In Table 1 and Table 2,

the ‘length of well-matched roads’ is the total length of roads in the reference map (or the

inferred map) that are matched with roads in the inferred map (or the reference map), given the

matching threshold of distance.

Page 39: Automatic Extraction of Road Networks from GPS Traces · GPS traces preprocessing, point cloud progressive segmentation, point cluster grouping and road network geometry refinement

39

Table 1.Matching results of Biagioni and Eriksson’s method on the Chicago dataset

Matching Threshold

of Distance (m)

Length of

Well Matched Roads (km)

Recall Precision F-measure

5 15.79 0.48 0.645 0.55

10 20.22 0.615 0.826 0.705

15 22.71 0.690 0.927 0.791

20 23.57 0.717 0.962 0.821

50 24.01 0.730 0.980 0.837

Table 2.Matching results of our method on the Chicago dataset

Matching Threshold

of Distance (m)

Length of

Well Matched Roads (km)

Recall Precision F-measure

5 30.93 0.734 0.870 0.796

10 33.89 0.804 0.954 0.873

15 34.21 0.812 0.963 0.881

20 34.28 0.813 0.965 0.883

50 34.3 0.814 0.965 0.883

Figure 14 presents the plots of the recall, precision and F-measure of the two methods.

Figure 14 (a) shows Biagioni and Eriksson’s method (2012b). Their values increase as the

matching threshold increases from 5 m to 20 m and become constant after the threshold reaches

20 m. Figure 14 (b) demonstrates that the values of these three criteria do not change much with

Page 40: Automatic Extraction of Road Networks from GPS Traces · GPS traces preprocessing, point cloud progressive segmentation, point cluster grouping and road network geometry refinement

40

our method after reaching the matching threshold of 10 m. The values of the three criteria of our

method are larger than the values of Biagioni and Eriksson’s method when the matching

threshold is smaller than 20 m. These results indicate that the road map inferred by our method

has better accuracy in terms of the geometry. In addition, using Chicago dataset, Biagioni and

Eriksson (2012b) demonstrated that, in terms of geometry, their method outperformed the

typical existing map inference methods, i.e., clustering based methods (Edelkamp and Schrödl,

2003), trace merging based methods (Cao and Krumm, 2009) and KDE based methods (Davies

et al., 2006).

Figure 14 Recall, precision and F-measure for geometric evaluation of (a) Biagioni and

Eriksson’s method, and (b) our method for Chicago dataset.

Then, we evaluate the topological accuracy. Figure 15 presents the plots of the recall,

precision and F-measure of Biagioni and Eriksson’s method (Figure 15 (a)) and our method

(Figure 15 (b)). Compared to Figure 15 (b), the values of the three criteria in Figure 15 (a)

increase faster as the matching threshold increase. The fact indicates that the road map inferred

by our method has a better topological accuracy under a small matching distance threshold. The

precision of the inferred road map generated by our method is better than the map generated by

0.40.50.60.70.80.91

0 5 10 15 20 25 30 35 40 45 50 55

Matching distance threshold (m)

Recall Precision F-measure(a)

0.40.50.60.70.80.91

0 5 10 15 20 25 30 35 40 45 50 55

Matching distance threshold (m)

Recall Precision F-measure(b)

Page 41: Automatic Extraction of Road Networks from GPS Traces · GPS traces preprocessing, point cloud progressive segmentation, point cluster grouping and road network geometry refinement

41

Biagioni and Eriksson’s method; however, the recall is worse after matching distance threshold

is greater than 10 meters. The reason is that we generated the intersections according to the

HMM based map matching result. The intersection of two geometrically intersected roads

cannot be inferred if no vehicles transit from one road to the other via the intersection. The

number of intersections in the reference map of Figure 13 (b) is 117. We only generate 76 of

them according to the HMM based map matching result.

Figure 15 Recall, precision and F-measure for topological evaluation of (a) Biagioni and

Eriksson’s method, and (b) our method for Chicago dataset.

5 Conclusion

In this research, we propose a point segmentation and grouping algorithm for road map

inference from GPS traces. We assume that all road segments can be fitted by a set of line

segments. We then partition all points into clusters to represent the centerlines of roads in a

progressive manner. We use the robust Lowess algorithm to generate centerlines from

unorganized point clouds. We also employ HMM based map matching algorithm to construct

the topological relationship of road centerlines and filter road segments that generated by GPS

noises. Then we group road centerlines according to the topological relationship and spatial

0.5

0.60.7

0.8

0.9

1

0 5 10 15 20 25 30

Matching distance threshold (m)

Recall Precesion F-measure(a)

0.5

0.60.7

0.8

0.9

1

0 5 10 15 20 25 30

Matching distance threshold (m)

Recall Precesion F-measure(b)

Page 42: Automatic Extraction of Road Networks from GPS Traces · GPS traces preprocessing, point cloud progressive segmentation, point cluster grouping and road network geometry refinement

42

proximity to remove the redundancy. Finally, we perform a geometry refinement on the road

network to ensure that the geometric relationship of road centerlines is consistent with its

topological relationship. Our method performs better for the road network that consists of

straight line segments than the road network that includes winding roads. According to the

experimental and evaluation results, we conclude that our method has high performance in

terms of the geometric and topological accuracy.

References

Agamennoni, G., J. I. Nieto and E. M. Nebot. 2010. "Technical Report: Inference of Principal

Road Paths Using GPS Data". The University of Sydney, Australian Centre for Field

Robotics.

Agamennoni, G., J. I. Nieto and E. M. Nebot. 2011. "Robust Inference of Principal Road Paths

for Intelligent Transportation Systems". IEEE Transactions onIntelligent Transportation

Systems, 12 (1): 298-308.

Ahmed, M., B. T. Fasy, K. S. Hickmann, et al. 2013. "Path-Based Distance for Street Map

Comparison".arXiv preprintarXiv:1309.6131.

Ahmed, M., S. Karagiorgou, D. Pfose, et al. 2014. "A Comparison and Evaluation of Map

Construction Algorithms".arXiv preprint arXiv:1402.5138.

Biagioni, J. and J. Eriksson. 2012a. "Inferring Road Maps from Global Positioning System

Traces: Survey and Comparative Evaluation". Transportation Research Record: Journal of

the Transportation Research Board, 2291 (1): 61-71.

Page 43: Automatic Extraction of Road Networks from GPS Traces · GPS traces preprocessing, point cloud progressive segmentation, point cluster grouping and road network geometry refinement

43

Biagioni, J. and J. Eriksson. 2012b. "Map Inference in the Face of Noise and Disparity". In

Proceedings of the20th International Conference on Advances in Geographic Information

Systems, (New York, USA: ACM), pp. 79-88.

Cao, L. and J. Krumm. 2009. "From GPS Traces to a Routable Road Map". In Proceedings of

the17th International Conference on Advances in Geographic Information Systems, (New

York, USA: ACM), pp. 3-12.

Chen, C. and Y. Cheng. 2008. "Roads Digital Map Generation with Multi-track GPS Data". In

2008 International Workshop on Education Technology and Training & 2008 International

Workshop on Geoscience and Remote Sensing (IEEE Computer Society), pp. 508-511.

Chen, D., L. J. Guibas, J. Hershberger, et al. 2010. "Road Network Reconstruction for

Organizing Paths". In Proceedings of the twenty-first annual ACM-SIAM symposium on

Discrete Algorithms. Societyfor Industrial and Applied Mathematics, (Philadelphia, USA),

pp. 1309-1320.

Cleveland, W. S. 1979. "Robust Locally Weighted Regression and Smoothing Scatterplots".

Journal of the American Statistical Association, 74 (368): 829-836.

Davies, J. J., A. R. Beresford and A. Hopper. 2006. "Scalable, Distributed, Real-Time Map

Generation". IEEE Pervasive Computing, 5 (4): 47-54.

Doucette, P., P. Agouris and A. Stefanidis. 2004. "Automated Road Extraction from High

Resolution Multispectral Imagery". Photogrammetric Engineering & Remote Sensing, 70

(12): 1405-1416.

Page 44: Automatic Extraction of Road Networks from GPS Traces · GPS traces preprocessing, point cloud progressive segmentation, point cluster grouping and road network geometry refinement

44

Edelkamp, S. and S. Schrödl, 2003. "Route Planning and Map Inference with Global

Positioning Traces". In Computer Science in Perspective, (Springer, Berlin Heidelberg), pp.

128-151.

Ester, M., H. Kriegel, J. Sande, et al. 1996. "A Density-Based Algorithm for Discovering

Clusters in Large Spatial Databases with Noise". In KDD, pp. 226-231.

Fathi, A. and J. Krumm. 2010. "Detecting Road Intersections from GPS Traces". In Geographic

Information Science, (Springer, Berlin Heidelberg), pp. 56-69.

Guo, T., K. Iwamura and M. Koga. 2007. "Towards High Accuracy Road Maps Generation

from Massive GPS Traces Data".2007 IEEE Geoscience and Remote Sensing Symposium,

pp. 667-670.

Ioannou, Y., B. Taati, R. Harrap and M. Greenspan. 2012. "Difference of Normals as a

Multi-Scale Operator in Unorganized Point Clouds". In 2012 Second International

Conference on 3D Imaging, Modeling, Processing, Visuallization and Transmission

(3DIMPVT) (IEEE).

Jiang, B., S. Zhao and J. Yin. 2008. "Self-organized Natural Roads for Predicting Traffic Flow:

ASensitivity Study". Journal of Statistical Mechanics: Theory and Experiment, 2008 (07):

1-27.

Karagiorgou, S. and D. Pfoser, 2012, "On Vehicle Tracking Data-Based Road Network

Generation". InProceedings of the20th International Conference on Advances in

Geographic Information Systems, (New York, USA: ACM), pp. 89-98.

Page 45: Automatic Extraction of Road Networks from GPS Traces · GPS traces preprocessing, point cloud progressive segmentation, point cluster grouping and road network geometry refinement

45

Kégl, B., A. Krzyzak, T. Linder, et al. 2000. "Learning and Design of Principal Curves". IEEE

Transactions on Pattern Analysis and Machine Intelligence, 22 (3): 281-297.

Krumm, J., J. Letchner and E. Horvitz. 2007. "Map Matching with Travel Time Constraints". In

Society of Automotive Engineers (SAE) World Congress.

Krumm, J., N. Davies and C. Narayanaswami. 2008. "User-Generated Content". IEEE

Pervasive Computing, 7 (4): 10-11.

Kasemsuppakorn P. and H. A. Karimi 2013. "Pedestrian Network Extraction from Fused Aerial

Imagery (Orthoimages) and Laser Imagery (Lidar)". Photogrammetric Engineering &

Remote Sensing, 79 (4): 369-379.

Li, J., Q. Qin, J. Han, et al. 2015. "Mining Trajectory Data and Geotagged Data in Social Media

for Road Map Inference". Transactions in GIS, 19 (1): 1-18.

Liu, X., J. Biagioni, J. Eriksson, et al. 2012. "Mining Large-Scale, Sparse GPS Traces for Map

Inference: Comparision of Approaches". InProceedings of the18th ACM SIGKDD

international conference on Knowledge discovery and data mining, (New York, USA:

ACM), pp. 669-677.

Lou, Y., C. Zhang, Y. Zheng, et al. 2009. "Map-matching for Low-sampling-rate GPS

Trajectories". InProceedings of the17th ACMSIGSPATIAL International Conference on

Advancesin Geographic Information Systems, (New York, USA: ACM), pp. 352-361.

Page 46: Automatic Extraction of Road Networks from GPS Traces · GPS traces preprocessing, point cloud progressive segmentation, point cluster grouping and road network geometry refinement

46

Newson, P. and J. Krumm. 2009. "Hidden Markov Map Matching Through Noise and

Saprseness". In Proceedings of the 17th ACM SIGSPATIAL International Conference on

Advances in Geographic Information Systems, pp. 336-343.

Niehoefer, B., R. Burda, C. Wietfeld, et al. 2009. "GPS Community Map Generation for

Enhanced Routing Methods Based on Trace-Collection by Mobile Phones".The First

International Conference on Advances in Satellite and Space Communications2009.

SPACOMM 2009, pp. 156-161.

OpenStreetMap Foundation: Bulk GPS track data, 2013.

https://blog.openstreetmap.org/2013/04/12/bulk-gpx-track-data/

Qiu, J., R. Wang and X. Wang. 2014. "Inferring Road Maps from Sparsely-Sampled GPS

Traces".InAdvances in Artificial Intelligence, Springer, pp. 339-344.

Schrödl, S., K. Wagstaff, S. Rogers, et al. 2004. "Mining GPS Traces for Map Refinement".

Data Mining And Knowledge Discovery, 9: 59-87.

Shi, W., S. Shen and Y. Liu. 2009. Automatic Generation of Road Network Map from Massive

GPS Vehicle Trajectories. In Proceedings of the 12th International IEEE Conference on

Intelligent Transportation Systems (St. Louis, MO, USA), pp. 28-53.

Thomson, R. C. and R. Brooks. 2002. Exploiting Perceptual Grouping for Map Analysis,

Understanding and Generalization: The Case of Road and River Networks. In Graphics

Recognition Algorithms and Applications, (Springer, Berlin Heidelberg), pp. 148-157.

Page 47: Automatic Extraction of Road Networks from GPS Traces · GPS traces preprocessing, point cloud progressive segmentation, point cluster grouping and road network geometry refinement

47

Thiagarajan, A., L. Ravindranath, K. LaCurts, et al. 2009. "VTrack: Accurate, Energy-aware

Road Traffic Delay Estimation Using Mobile Phones". In Proceedings of the 7th ACM

Conference onEmbedded Networked Sensor Systems, (New York, USA: ACM), pp. 85-98.

Wang, J., X. Rui, X. Song, et al. 2015. "A Novel Approach for Generating Routable Road Maps

from Vehicle GPS Traces". International Journal of Geographical Information Science, 29:

69-91.

Wang, J., Z. Yu, W. Zhang, et al. 2014. "Robust Reconstruction of 2D Curves from Scattered

Noisy Point Data". Computer-Aided Design, 50: 27-40.

Wang, Y., X. Liu, H. Wei, et al. 2013. "CrowdAtlas: Self-Updating Maps for Cloud and

Personal Use". In Proceeding of the 11th annual international conference on Mobile

systems, applications, and services, ACM, pp. 27-40.

Youn J., J. S. Bethel, et al. 2008, "Extracting Urban Road Networks from High-resolution True

Orthoimage and Lidar". Photogrammetric Engineering & Remote Sensing, 74 (2): 227-237.