data innovation€¦ · rizal khaefi (junior data scientist) and mellyana frederika (partnership...

31
Data Innovation Metropolitan Sustainable Transportation

Upload: others

Post on 07-May-2020

2 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Data Innovation€¦ · Rizal Khaefi (Junior Data Scientist) and Mellyana Frederika (Partnership and Advocacy Lead), with the guidance of Sriganesh Lokanathan (Data Innovation & Policy

Data InnovationMetropolitan Sustainable Transportation

Draft as per January 26, 2020

Data InnovationMetropolitan Sustainable Transportation

Page 2: Data Innovation€¦ · Rizal Khaefi (Junior Data Scientist) and Mellyana Frederika (Partnership and Advocacy Lead), with the guidance of Sriganesh Lokanathan (Data Innovation & Policy

GIZ Data Lab brings together thinkers and practitioners to promote the effective, fair and responsible use of digital data for sustainable development. The Lab applies an agile and experimental way of working to explore new data related trends and questions, and develop forward-looking solutions in GIZ partner countries. External partnerships - both when it comes to innovating and implementing - are key to the work of the Lab.

Pulse Lab Jakarta is a joint data innovation facility of the United Nations (Global Pulse) and the Government of Indonesia (via the Ministry of National Development Planning, Bappenas). The Lab employs a mixed-method approach, through which it harnesses alternative data sources and advanced data analytics methods to obtain actionable insights and applies human-centered design to ground-truth insights from its data analysis and research, providing evidence to inform policy makers.

Grab is the leading super app in Southeast Asia, providing everyday services that matter most to consumers. Today, the Grab app has been downloaded onto over 185 million mobile devices, giving users access to 9 million drivers, merchants and agents. Grab offers the widest range of on-demand transport services in the region, in addition to food, package, grocery delivery services, mobile payments and financial services across 349 cities in eight countries.

This report was produced by Pulse Lab Jakarta, co-authored in particular by Muhammad Rizal Khaefi (Junior Data Scientist) and Mellyana Frederika (Partnership and Advocacy Lead), with the guidance of Sriganesh Lokanathan (Data Innovation & Policy Lead) and support of Narawan Mahasarakam.

Page 3: Data Innovation€¦ · Rizal Khaefi (Junior Data Scientist) and Mellyana Frederika (Partnership and Advocacy Lead), with the guidance of Sriganesh Lokanathan (Data Innovation & Policy

Table of Contents

Page

Introduction 3

1 Context 3

2 Bangkok Metropolitan Transportation 3

3 The Project 3

4 Objective and Scope of Work 4

5 Datasets 55.1 Ride-hailing Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55.2 Transportation Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55.3 Environment Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55.4 Demography Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5

Data for Mobility 8

Work Package 1 - Macroscopic Traffic Modelling 9

1 Summary 9

2 Data and Features 9

3 Methods 93.1 Gravity laws . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 93.2 Radiation laws . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10

4 Results 10

Work Package 2 - Road Speed Profiling 12

1 Summary 12

2 Data and Features 12

3 Methods 123.1 Preprocessing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 123.2 Speed Profiling Calculation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13

1

Page 4: Data Innovation€¦ · Rizal Khaefi (Junior Data Scientist) and Mellyana Frederika (Partnership and Advocacy Lead), with the guidance of Sriganesh Lokanathan (Data Innovation & Policy

4 Results 13

Work Package 3 - Traffic Congestion Nowcasting 15

1 Summary 15

2 Data and Features 15

3 Methods 153.1 Preprocessing and Feature extraction . . . . . . . . . . . . . . . . . . . . . . . . . . 153.2 Congestion level calculation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 153.3 Model development and validation . . . . . . . . . . . . . . . . . . . . . . . . . . . 153.4 Comparison with eBum model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16

4 Results 16

Work Package 4 - Quantifying Population Exposure to Air Pollution 20

1 Summary 20

2 Data and Features 20

3 Methods 203.1 Preprocessing and Feature extraction . . . . . . . . . . . . . . . . . . . . . . . . . . 203.2 Developing model to infers AQI from predictors . . . . . . . . . . . . . . . . . . . 203.3 AQI level prediction using developed model and predictors data . . . . . . . . . . . 213.4 AQI level inference where predictors data incomplete . . . . . . . . . . . . . . . . . 21

4 Results 21

Reflections and Further Works 26

1 The Potential of Alternative Data Sets 26

2 Partnership in Data Innovation 26

3 Conclusions and next steps 263.1 Macroscopic traffic flow modelling . . . . . . . . . . . . . . . . . . . . . . . . . . . 263.2 Road speed profiling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 273.3 Traffic Congestion Nowcasting . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 273.4 Inferring air pollution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27

4 Mainstreaming the use of alternate data sources 28

References 29

2

Page 5: Data Innovation€¦ · Rizal Khaefi (Junior Data Scientist) and Mellyana Frederika (Partnership and Advocacy Lead), with the guidance of Sriganesh Lokanathan (Data Innovation & Policy

4 Results 13

Work Package 3 - Traffic Congestion Nowcasting 15

1 Summary 15

2 Data and Features 15

3 Methods 153.1 Preprocessing and Feature extraction . . . . . . . . . . . . . . . . . . . . . . . . . . 153.2 Congestion level calculation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 153.3 Model development and validation . . . . . . . . . . . . . . . . . . . . . . . . . . . 153.4 Comparison with eBum model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16

4 Results 16

Work Package 4 - Quantifying Population Exposure to Air Pollution 20

1 Summary 20

2 Data and Features 20

3 Methods 203.1 Preprocessing and Feature extraction . . . . . . . . . . . . . . . . . . . . . . . . . . 203.2 Developing model to infers AQI from predictors . . . . . . . . . . . . . . . . . . . 203.3 AQI level prediction using developed model and predictors data . . . . . . . . . . . 213.4 AQI level inference where predictors data incomplete . . . . . . . . . . . . . . . . . 21

4 Results 21

Reflections and Further Works 26

1 The Potential of Alternative Data Sets 26

2 Partnership in Data Innovation 26

3 Conclusions and next steps 263.1 Macroscopic traffic flow modelling . . . . . . . . . . . . . . . . . . . . . . . . . . . 263.2 Road speed profiling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 273.3 Traffic Congestion Nowcasting . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 273.4 Inferring air pollution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27

4 Mainstreaming the use of alternate data sources 28

References 29

2

Introduction

1 CONTEXT

Urbanization changes how the city operates. By2050, it is expected that nearly 70% percent of theworld’s population will live in urban areas. Nearly90% the increment of the urban population by 2050is expected to be concentrated in Asia and Africa.In Southeast Asia, this phenomenon created megaurbans, such as Jabodetabek, Bangkok Metropoli-tan Region and Metro Manila. These cities arefacing many transboundary challenges from watersupply, wastewater to transport and environmentalprotection. Rapid urbanisation in this region hasled to a large share of urban growth involvingunplanned, unstructured expansion, with high ratesof car use and a substantial proportion of people relyon personal vehicles to commute. Without adequatetransport infrastructure, cities suffer the growingcongestion on the road and there is a significanteconomic cost to this.

Transportation planning has been using data-driven analysis and modelling analysis to supportpolicy decisions in this issue [1]. The data has tra-ditionally informed transport planning has primarilybeen provided by manual surveys of people andtheir travel behaviour, for estimating travel demand,by manual mapping, service level and landscapesurveys, for estimating transport supply, and bya mixture of manual and automated surveys ofmovements and transactions, for calibrating flows.However, data to develop and validate transportmodels has been limited by availability, frequency,or acquisition costs and time. Planners and policymakers are looking for ways to improve this.

2 BANGKOK METROPOLITAN TRANSPORTATION

In recent years, Southeast Asia cities such asBangkok are rapidly expanding. Between the years2000 and 2010, the capital of Thailand has grown

from 1,900 square kilometers to 2,100 making itthe fifth largest urban area in East Asia in 2010.Today, it is the second largest city in the region.However, this growth has not been aligned properlywith Bangkoks land use and transport planningstrategy, resulting in uncontrolled traffic growthand over reliance on private motorized vehicles.This unsustainable transport approach has increasedtraffic congestion, air pollution, massive investmentin road infrastructure, high energy demand andGHG emissions, affecting the lives of millions ofBangkok’s residents.

The Office of Transport and Traffic Policy andPlanning (OTP) develops Extended Bangkok UrbanModel (eBUM) as a transportation base model andusing it as a tool to analyze and forecast transportand traffic situations. Initiated in 2016, the surveyis done every five years with improvement and newdata sources being fed into the model. New alterna-tive data sources have potential to refine this model,validate assumptions and provide more responsive,near real time model.

3 THE PROJECT

The GIZ Data Lab is a platform that bringstogether thinkers and practitioners to promote theeffective, fair and responsible use of digital datafor sustainable development. The Lab applies anagile and experimental way working to explorenew trends and develop forward-looking solutionsin GIZ partner countries. Together with a networkof partners, the Lab promotes the use of digital datato advance development and to support the deliveryof innovative services by GIZ.

The GIZ Data Lab is a platform that bringstogether thinkers and practitioners to promote theeffective, fair and responsible use of digital datafor sustainable development. The Lab applies anagile and experimental way working to explore

3

Page 6: Data Innovation€¦ · Rizal Khaefi (Junior Data Scientist) and Mellyana Frederika (Partnership and Advocacy Lead), with the guidance of Sriganesh Lokanathan (Data Innovation & Policy

new trends and develop forward-looking solutionsin GIZ partner countries. Together with a networkof partners, the Lab promotes the use of digital datato advance development and to support the deliveryof innovative services by GIZ.

The Data Lab team up with Pulse Lab Jakarta,a joint initiative of the United Nations and theGovernment of Indonesia, to conduct a numberexperiment contributing to Sustainable BangkokMetropolitan Transport. Pulse Lab Jakarta, estab-lished to explore the potential use of big data forpublic policy and social good, aim to harness bigdata and artificial intelligence responsibly as publicgood. Its mission is to accelerate the discoveryand adoption of data innovation for sustainabledevelopment and humanitarian action.

This project is an experiment aimed at answeringthe question of how we can use non-traditional datasources to create more sustainable and inclusivetransportation systems. The idea is to combine dif-ferent data sources to improve the understandingof travel patterns for sustainable urban mobilityplanning.

4 OBJECTIVE AND SCOPE OF WORK

In recent years, the emergence of new tech-nologies has greatly impacted personal mobility invarious dimensions by modifying and/ or overcom-ing traditional travel barriers and constraints. Theextensive use of smartphones by individuals hasled innovators to develop app-based transportationservices that efficiently link passengers to driverswithin minutes. One of the most significant appli-cations of ICT in transportation is ride-hailing atech-driven adaptation of traditional street-hailing.Itincludes transportation network companies (TNC)offer methods of shared mobility that enable passen-gers to quickly book a ride directly with a vehiclesowner using smartphone applications [2].

Ride-hailing transportation service allows passen-gers to request a ride in a real-time via smartphoneapplication that links passengers to nearby drivers.This new form of on-demand transportation capi-talizes on innovations like GPS chips to developapp-based, on-demand transportation that quicklyand reliably connects riders and drivers. Both thepassenger and the driver can use Global Position

System (GPS) navigation during the trip, and theapplication guides the driver to shortcuts and less-congested roadways.

Ride-hailing applications are increasingly popularin Southeast Asian cities. Statista, a German onlineportal for statistics, stated that user penetrationgrowth in the region is 16,8% in 2019 and is ex-pected to hit 25,1% by 2023. In Bangkok alone, userpenetration is 6,8% in 2019 and is expected to hit11,7% by 2023 [3]. This growth leads to significantincrease in the data being generated from ride-hailing applications related to mobility patterns inthe region. GrabTaxi Holdings Pte. Ltd. (branded assimply Grab) is a Southeast Asia-based technologycompany that offers ride-hailing, ride-sharing andlogistics services through its app, covering Singa-pore, Cambodia, Indonesia, Malaysia, Philippines,Vietnam, Thailand, and Myanmar. Grab joined thiscollaboration through their partnership with PulseLab Jakarta in promoting mobility and transportaccessibility in a responsible and ethical manner.Grab as data holder provides the main alternativedata sets in this experimentation and the scope ofthe experimentation is as follows:

Based on this opportunity, the scope of the ex-perimentation is as follows:

1) Exploring the use of ride-hailing data to in-form macroscopic traffic flow modeling.

2) Exploring the use of ride-hailing data to de-velop road speed profile.

3) Exploring the use of ride-hailing data to now-cast location, time and patterns of traffic toinform the formulation of a congestion charge.

4) Proof of concept to use ride-sharing dataas a proxy for measuring population activitypatterns to calculate population-weighted ex-posure to air pollution on a city-wide scale.

Note: This is a different scope of work with a plan.The initial plan to conduct experiments to inferroad quality using ride-hailing data is changed todevelop road speed profiles as requested by GIZ.

4

Page 7: Data Innovation€¦ · Rizal Khaefi (Junior Data Scientist) and Mellyana Frederika (Partnership and Advocacy Lead), with the guidance of Sriganesh Lokanathan (Data Innovation & Policy

new trends and develop forward-looking solutionsin GIZ partner countries. Together with a networkof partners, the Lab promotes the use of digital datato advance development and to support the deliveryof innovative services by GIZ.

The Data Lab team up with Pulse Lab Jakarta,a joint initiative of the United Nations and theGovernment of Indonesia, to conduct a numberexperiment contributing to Sustainable BangkokMetropolitan Transport. Pulse Lab Jakarta, estab-lished to explore the potential use of big data forpublic policy and social good, aim to harness bigdata and artificial intelligence responsibly as publicgood. Its mission is to accelerate the discoveryand adoption of data innovation for sustainabledevelopment and humanitarian action.

This project is an experiment aimed at answeringthe question of how we can use non-traditional datasources to create more sustainable and inclusivetransportation systems. The idea is to combine dif-ferent data sources to improve the understandingof travel patterns for sustainable urban mobilityplanning.

4 OBJECTIVE AND SCOPE OF WORK

In recent years, the emergence of new tech-nologies has greatly impacted personal mobility invarious dimensions by modifying and/ or overcom-ing traditional travel barriers and constraints. Theextensive use of smartphones by individuals hasled innovators to develop app-based transportationservices that efficiently link passengers to driverswithin minutes. One of the most significant appli-cations of ICT in transportation is ride-hailing atech-driven adaptation of traditional street-hailing.Itincludes transportation network companies (TNC)offer methods of shared mobility that enable passen-gers to quickly book a ride directly with a vehiclesowner using smartphone applications [2].

Ride-hailing transportation service allows passen-gers to request a ride in a real-time via smartphoneapplication that links passengers to nearby drivers.This new form of on-demand transportation capi-talizes on innovations like GPS chips to developapp-based, on-demand transportation that quicklyand reliably connects riders and drivers. Both thepassenger and the driver can use Global Position

System (GPS) navigation during the trip, and theapplication guides the driver to shortcuts and less-congested roadways.

Ride-hailing applications are increasingly popularin Southeast Asian cities. Statista, a German onlineportal for statistics, stated that user penetrationgrowth in the region is 16,8% in 2019 and is ex-pected to hit 25,1% by 2023. In Bangkok alone, userpenetration is 6,8% in 2019 and is expected to hit11,7% by 2023 [3]. This growth leads to significantincrease in the data being generated from ride-hailing applications related to mobility patterns inthe region. GrabTaxi Holdings Pte. Ltd. (branded assimply Grab) is a Southeast Asia-based technologycompany that offers ride-hailing, ride-sharing andlogistics services through its app, covering Singa-pore, Cambodia, Indonesia, Malaysia, Philippines,Vietnam, Thailand, and Myanmar. Grab joined thiscollaboration through their partnership with PulseLab Jakarta in promoting mobility and transportaccessibility in a responsible and ethical manner.Grab as data holder provides the main alternativedata sets in this experimentation and the scope ofthe experimentation is as follows:

Based on this opportunity, the scope of the ex-perimentation is as follows:

1) Exploring the use of ride-hailing data to in-form macroscopic traffic flow modeling.

2) Exploring the use of ride-hailing data to de-velop road speed profile.

3) Exploring the use of ride-hailing data to now-cast location, time and patterns of traffic toinform the formulation of a congestion charge.

4) Proof of concept to use ride-sharing dataas a proxy for measuring population activitypatterns to calculate population-weighted ex-posure to air pollution on a city-wide scale.

Note: This is a different scope of work with a plan.The initial plan to conduct experiments to inferroad quality using ride-hailing data is changed todevelop road speed profiles as requested by GIZ.

4

5 DATASETS

5.1 Ride-hailing DataWe use anonymous ride-hailing data sets col-

lected from September 2018 to December 2018and from March 2019 to April 2019 to capturethe different patterns between common days anduncommon days. To represent uncommon days, weselect days during Songkran Festival, the Thai NewYears national holiday on the 13th April every yearand can be extended to 15th April or more. The sizeof this data set is over 4 billion measurements.

Ride-hailing data used in this experiment is tra-jectory data. It is collected through the driver’s GrabApplication and uploaded to the data managementcenter by Grab Application. The trajectory data isreported at a 30-s interval, including real-time infor-mation as described in Table I such as longitude andlatitude (including altitude), instantaneous speed,moving direction (360 degrees) and timestamp.Other ride-hailing data generated by accelerometerand gyroscope at high spatio-temporal resolutions.

5.2 Transportation DataThe Office of Transport and Traffic Policy Plan-

ning of the Government of Thailand (OTP) usesa trip-based transportation model called ExtendedBangkok Transport Model. The model has beenused to analyse and forecast transport and trafficsituations in Bangkok in the past 15 years.

TABLE I: Ride hailing metadata

No Column Name Description Example of value

1 driver id Anonymized driver id (64 characters)ec79db0575ad620621ede0d3d36740c918731a5f0b4990d4

961a9552f1cef6a42 timestamp Date and time (yyyy-mm-dd) 2019-01-01

3 latitude Geographic coordinate of east-west position on theEarth’s surface (degree) -6.25

4 longitude Geographic coordinate of north-south position on theEarth’s surface (degree) 108.25

5 altitude Height relative with sea level (m) 106 bearing Direction relative to the true north (degree) 1007 speed Moving speed (m/s) 208 gyroscope data Device orientation and angular velocity x,y,z 1,-1,2

9 accelerometerdata Device acceleration forces x,y,z 2,-1,3

From this model, OTP provided access to 2017congestion data and its estimation for 2027 and aroad map network including public transportationnetwork data. In addition, we received access tocongestion data and 2027 estimation from OTP.The congestion data is an integrated part of theextended Bangkok Urban Model. Complete lists ofthe transportation data used in this research areshown in Table II.

5.3 Environment DataPollution Control Department of Government of

Thailand (PCD) provided historical air pollutiondata collected from 12 official sensors in Bangkok.In addition, we used open data sources to collectother environment data as listed in Table III.

5.4 Demography DataWe have limited access to socio-demography and

land use data of Bangkok Metropolitan Area. How-ever, we found basic information on population, cityadministration, and important landmarks as proxy todetail datasets needed as local characteristics to ourmodel. Full lists demographic data used in this studyare shown in Table IV.

5

Page 8: Data Innovation€¦ · Rizal Khaefi (Junior Data Scientist) and Mellyana Frederika (Partnership and Advocacy Lead), with the guidance of Sriganesh Lokanathan (Data Innovation & Policy

TABLE II: Transportation metadata

No Data Details Source Year

1 Roadway andtransportation network

• Road length• Road type• Number of lanes

Bangkok OTP 2011

2 Public transportationnetwork

• Land based (BTS (Skytrain), MRT,Bus)

• River based (Chao PhrayaExpressway, Khlong Saen Saepboat express)

Bangkok OTP 2019

3 Historical congestiondata

Estimated congestion level developed byeBUM model Bangkok OTP 2017

4 Road construction event• Location and time of construction

work• Lane direction

Government official releaseand news 2018-2019

TABLE III: Environment metadata

No Data Details Source Year

1 Historical airpollution Air quality level on a daily basis. PCD 2018-2019

2Aerosol opticaldepth(MCD19A2)

Multi-Angle Implementation ofAtmospheric Correction (MAIAC)algorithm-based Level-2 gridded (L2G)AOD.

US National Aeronautics andSpace Agency (NASA) 2018-2019

3

Enhancedvegetationindices(MOD13A3)

Global MOD13A3 data are providedevery month at 1-kilometer spatialresolution as a gridded Level-3 productin the Sinusoidal projection

NASA 2019

4 Digital elevationmodel (SRTM)

Shuttle Radar Topography Mission(SRTM) provide complete digitalelevation data for global coverage

United States GeographicalSurvey Agency 2014

5 Air temperature Daily aggregate of air temperatures fromground sensors measurements

US National Oceanic andAtmospheric Administration

(NOAA)2018-2019

6

Page 9: Data Innovation€¦ · Rizal Khaefi (Junior Data Scientist) and Mellyana Frederika (Partnership and Advocacy Lead), with the guidance of Sriganesh Lokanathan (Data Innovation & Policy

TABLE II: Transportation metadata

No Data Details Source Year

1 Roadway andtransportation network

• Road length• Road type• Number of lanes

Bangkok OTP 2011

2 Public transportationnetwork

• Land based (BTS (Skytrain), MRT,Bus)

• River based (Chao PhrayaExpressway, Khlong Saen Saepboat express)

Bangkok OTP 2019

3 Historical congestiondata

Estimated congestion level developed byeBUM model Bangkok OTP 2017

4 Road construction event• Location and time of construction

work• Lane direction

Government official releaseand news 2018-2019

TABLE III: Environment metadata

No Data Details Source Year

1 Historical airpollution Air quality level on a daily basis. PCD 2018-2019

2Aerosol opticaldepth(MCD19A2)

Multi-Angle Implementation ofAtmospheric Correction (MAIAC)algorithm-based Level-2 gridded (L2G)AOD.

US National Aeronautics andSpace Agency (NASA) 2018-2019

3

Enhancedvegetationindices(MOD13A3)

Global MOD13A3 data are providedevery month at 1-kilometer spatialresolution as a gridded Level-3 productin the Sinusoidal projection

NASA 2019

4 Digital elevationmodel (SRTM)

Shuttle Radar Topography Mission(SRTM) provide complete digitalelevation data for global coverage

United States GeographicalSurvey Agency 2014

5 Air temperature Daily aggregate of air temperatures fromground sensors measurements

US National Oceanic andAtmospheric Administration

(NOAA)2018-2019

6

TABLE IV: Demography metadata

No Data Details Source Year1 Population Number of population at sub-district level Bangkok Office of Statistics 2015

2 Population Population estimation at 100m x 100mgrid Worldpop 2014

3CityAdministrationMap

Administration boundary down tosub-district level OTP 2011

4 Landmark• Type of landmark (commercial,

residential, office, etc)• Geo-location

OTP 2011

7

Page 10: Data Innovation€¦ · Rizal Khaefi (Junior Data Scientist) and Mellyana Frederika (Partnership and Advocacy Lead), with the guidance of Sriganesh Lokanathan (Data Innovation & Policy

Data for Mobility

The Office of Transport and Traffic Policy andPlanning (OTP) developed and used a trip basedtransportation model called Extended Bangkok Ur-ban Model (eBUM). This is the most completeand up-to-date transportation model developed inBangkok. It is a tool to analyse and forecast trans-port situations as a result of changes in transport net-work in the study area. The model is also used to testtraffic management measures proposed by respon-sible agencies. An eBUM consists of fundamentaldata which is updated through an expensive surveyand this is one important issue in using eBUM.Another challenge is in the model calibration andvalidation process.

We see opportunity to use this experiment toexplore ways that potentially could tackle somechallenges eBum is currently facing by asking thefollowing question: a) can ride-hailing data be usedto validate pattern of traffic flow; b) is it possible touse ride-hailing data to develop road speed profile,to compliment one develop through eBUM; c) isis plausible to harness ride-hailing data to predicttraffic congestion. A more ambitious question butnevertheless important is can ride-hailing data beused to quantify the population exposure to airpollution. There were precedents to track the pol-lution exposure using mobile-device-based mobilitypatterns to consider spatial and temporal aspects ofthe exposure estimates.

The next section explains the the research inthis following order: a) Macroscopic Traffic FlowModelling; b) Road Speed Profiling; c) Traffic Con-gestion Nowcasting; and d) Quantifying PopultionExposure to Air Pollution. This section describe themethod employ by each research and key results.

As initial attempt, the research shows potentialof using ride-hailing data to compliment exisingtransportation model used by OTP. However, thisresult has to be taken carefully because the differentdegree of potential and the extend in which the

experiment able to answer the research questions.Therefore, in conclusion section, we wrote ourreflections and recommendation for further works.

8

Page 11: Data Innovation€¦ · Rizal Khaefi (Junior Data Scientist) and Mellyana Frederika (Partnership and Advocacy Lead), with the guidance of Sriganesh Lokanathan (Data Innovation & Policy

Data for Mobility

The Office of Transport and Traffic Policy andPlanning (OTP) developed and used a trip basedtransportation model called Extended Bangkok Ur-ban Model (eBUM). This is the most completeand up-to-date transportation model developed inBangkok. It is a tool to analyse and forecast trans-port situations as a result of changes in transport net-work in the study area. The model is also used to testtraffic management measures proposed by respon-sible agencies. An eBUM consists of fundamentaldata which is updated through an expensive surveyand this is one important issue in using eBUM.Another challenge is in the model calibration andvalidation process.

We see opportunity to use this experiment toexplore ways that potentially could tackle somechallenges eBum is currently facing by asking thefollowing question: a) can ride-hailing data be usedto validate pattern of traffic flow; b) is it possible touse ride-hailing data to develop road speed profile,to compliment one develop through eBUM; c) isis plausible to harness ride-hailing data to predicttraffic congestion. A more ambitious question butnevertheless important is can ride-hailing data beused to quantify the population exposure to airpollution. There were precedents to track the pol-lution exposure using mobile-device-based mobilitypatterns to consider spatial and temporal aspects ofthe exposure estimates.

The next section explains the the research inthis following order: a) Macroscopic Traffic FlowModelling; b) Road Speed Profiling; c) Traffic Con-gestion Nowcasting; and d) Quantifying PopultionExposure to Air Pollution. This section describe themethod employ by each research and key results.

As initial attempt, the research shows potentialof using ride-hailing data to compliment exisingtransportation model used by OTP. However, thisresult has to be taken carefully because the differentdegree of potential and the extend in which the

experiment able to answer the research questions.Therefore, in conclusion section, we wrote ourreflections and recommendation for further works.

8

Macroscopic Traffic Flow Modelling

1 SUMMARY

Macroscopic traffic flows are one of the ba-sic insights needed for transportation planning andmodelling. It allows planners and modelers to im-portant properties related to traffic flow includinghow they are generated. Macroscopic traffic flowsare useful for short-term forecasting providing theability to understand and calculate, amongst others,average travel times, mean fuel consumption, etc.Traditional methods for developing macroscopictraffic flows depend on data collected through fieldsurveys and observations, which are time consumingto conduct, but also have limited spatio-temporalcoverage, and can be fairly subjective. Commercialproducts for getting those information, includingGoogle Maps and Waze are popular and widely usedbut considered costly for city wide analysis. A tripdistribution model is an alternative approach usingtransportation theory as a proxy to infer the trafficflows. The model promises lower cost, the needfor fewer data, and huge spatio-temporal coveragebut suffers from localization constraints, which needcity-specific calibration. This research explore thepossibility of using Grab data to assess the per-formance of four trip distribution models to proxytraffic flows in Bangkok. These four models include:i) a gravity model with exponential decay, ii) agravity model with power law distance decay, iii) aradiation model, and iv) a radiation-extended model.Results are calibrated with actual traffic activityinferred from Grab data to prove the feasibility.The radiation model shows better correlation score(ρ = 0.5) with inferred actual traffic flows ascompared to the others.

2 DATA AND FEATURES

1) Population count by districts popotp, fromBangkok’s Office of Transport Planning andWorldpop

2) Administrative boundary, from Bangkok’s Of-fice of Transport Planning

3) August,December 2018 and March-April2019 Trajectory data, from Grab

3 METHODS

The purpose of the trip distribution models is tosplit the total number of trips N in order to generatea trip table T of the estimated number of trips fromeach census area to every other. For simplicity, forthis preliminary work we restrict our analyses totrips from home to work, without also consideringthe return journey. N is equivalent to the numberof unique commuters. The trip distribution dependsboth on the characteristics of the census units andthe way they are spatially distributed, as well ason the level of constraints required by the model.Therefore, to fairly compare different trip distri-bution modeling approaches we have to considerseparately the law used to calculate the probabilityto observe a trip between two census units, calledtrip distribution law, and the trip distribution modelused to generate the trip allocations from this law.The four trip distribution models tested [4] are thefollowing:

3.1 Gravity lawsIn the simplest form of the gravity model based

approach, the probability of commuting betweentwo units i and j is proportional to the product of theorigin population mi and destination population mj ,and inversely proportional to the travel cost betweenthe two units:

pij = mimjf(dij), i = j

The travel cost between i and j is modeled withan exponential distance decay function,

f(dij) = e−βdij

and a power distance decay function,

9

Page 12: Data Innovation€¦ · Rizal Khaefi (Junior Data Scientist) and Mellyana Frederika (Partnership and Advocacy Lead), with the guidance of Sriganesh Lokanathan (Data Innovation & Policy

f(dij) = d−βij

Barthelemy et.al [5] found that the form of thedistance decay function can change according basedon the dataset. Therefore, both the exponential andthe power forms are considered in this study. Inboth cases, the importance of the distance in com-muting choices is adjusted with a parameter β withobserved data.

3.2 Radiation lawsIn the intervening opportunity approach, the prob-

ability of commuting between two units i and jis proportional to the origin population mi and tothe conditional probability that a commuter livingin unit i with population mi is attracted to unit jwith population mj , given that there are sij job op-portunities in between. The conditional probabilityP(1|mi,mj, sij) needs to be normalized to ensurethat all the trips end in the region of interest.

pij = miP(1|mi,mj, sij)∑nk=1 P(1|mi,mk, sik)

, i = j

Simini et al. [6] proposed diffusion model whereeach individual living in an unit i has a certainprobability of being absorbed by another unit jaccording to the spatial distribution of opportunities,called radiation model. The original radiation modelis free of parameters and, therefore, it does notrequire calibration. The conditional probability isexpressed as:

P(1|mi,mj, sij) =mimj

(mi + sij)(mi +mj + sij)

An extended radiation model has been proposedby Yang et al. [4]. In this extended version, the prob-ability P(1|mi,mj, sij) is derived under the survivalanalysis framework introducing a parameter α tocontrol the effect of the number of job opportunitiesbetween the source and the destination on the jobselection,

P(1|mi,mj , sij) =[(mi +mj + sij)

α − (mi + sij)α] (mα

i + 1)

[(mi + sij)α + 1] [(mi +mj + sij)α + 1]

After the description of the probabilistic laws, thenext step is to materialize the people commuting.

The purpose is to generate the commuting networkT by drawing at random N trips from the tripdistribution law pij In the absence of empirical com-muting data, we consider only the unconstrainedapproach to randomly sample the N trips from themultinomial distribution,

M(N, (pij)1≤i,j≤n

)

4 RESULTS

Table I and Figure 1 show the four trip distri-bution models along with the results extracted fromGrab data. Both Gravity Models show dominance ofthe bottom left and top right regions. The four high-est traffic flow districts based on Exponential Spa-tial are Sathon, Thon Buri, and Wang Thonglang.The Power Law variant’s four highest traffic flowdistricts are Sampanthawong, Sai Mai, and PomPrap districts. Pearson correlation is calculated foreach to measure the similarity with inferred actualtraffic flows. Both gravity models achieve relativelylow scores with gravity exponential model havingρ = 0.4 and gravity power law having ρ = 0.1,which indicates dissimilarity. The low scores areexplained as only a handful of top-10 districts onboth gravity model are found in the top-10 ofinferred actual traffic flows. Both gravity modelsalso failed to capture high traffic flows in Bangkok’scentral regions.

Different to the Gravity Model, the two Ra-diation Models show relatively even traffic flowdistributions in all regions. The dominant regionsare the following: Ratchathewi, Thon Buri, andWang Thonglang for Standard Radiation Modeland Wang Thonglang, Thon Buri, Khlong Toei forthe Radiation-Extended Model. Standard RadiationModel gives ρ = 0.5 correlation scores(which ishighest score compared with other models), andthe extended variant only achieved ρ = 0.2 score.Standard Radiation Model’s high scores suggest itis best suited to identify high traffic flow in centralBangkok and important traffic hubs.

This promising result should be treated cau-tiously. All models show large deviation with actualtraffic flow (adj.r2 = 0.25) inferred from Grab datawhich indicates noticeable numerical gap.

10

Page 13: Data Innovation€¦ · Rizal Khaefi (Junior Data Scientist) and Mellyana Frederika (Partnership and Advocacy Lead), with the guidance of Sriganesh Lokanathan (Data Innovation & Policy

f(dij) = d−βij

Barthelemy et.al [5] found that the form of thedistance decay function can change according basedon the dataset. Therefore, both the exponential andthe power forms are considered in this study. Inboth cases, the importance of the distance in com-muting choices is adjusted with a parameter β withobserved data.

3.2 Radiation lawsIn the intervening opportunity approach, the prob-

ability of commuting between two units i and jis proportional to the origin population mi and tothe conditional probability that a commuter livingin unit i with population mi is attracted to unit jwith population mj , given that there are sij job op-portunities in between. The conditional probabilityP(1|mi,mj, sij) needs to be normalized to ensurethat all the trips end in the region of interest.

pij = miP(1|mi,mj, sij)∑nk=1 P(1|mi,mk, sik)

, i = j

Simini et al. [6] proposed diffusion model whereeach individual living in an unit i has a certainprobability of being absorbed by another unit jaccording to the spatial distribution of opportunities,called radiation model. The original radiation modelis free of parameters and, therefore, it does notrequire calibration. The conditional probability isexpressed as:

P(1|mi,mj, sij) =mimj

(mi + sij)(mi +mj + sij)

An extended radiation model has been proposedby Yang et al. [4]. In this extended version, the prob-ability P(1|mi,mj, sij) is derived under the survivalanalysis framework introducing a parameter α tocontrol the effect of the number of job opportunitiesbetween the source and the destination on the jobselection,

P(1|mi,mj , sij) =[(mi +mj + sij)

α − (mi + sij)α] (mα

i + 1)

[(mi + sij)α + 1] [(mi +mj + sij)α + 1]

After the description of the probabilistic laws, thenext step is to materialize the people commuting.

The purpose is to generate the commuting networkT by drawing at random N trips from the tripdistribution law pij In the absence of empirical com-muting data, we consider only the unconstrainedapproach to randomly sample the N trips from themultinomial distribution,

M(N, (pij)1≤i,j≤n

)

4 RESULTS

Table I and Figure 1 show the four trip distri-bution models along with the results extracted fromGrab data. Both Gravity Models show dominance ofthe bottom left and top right regions. The four high-est traffic flow districts based on Exponential Spa-tial are Sathon, Thon Buri, and Wang Thonglang.The Power Law variant’s four highest traffic flowdistricts are Sampanthawong, Sai Mai, and PomPrap districts. Pearson correlation is calculated foreach to measure the similarity with inferred actualtraffic flows. Both gravity models achieve relativelylow scores with gravity exponential model havingρ = 0.4 and gravity power law having ρ = 0.1,which indicates dissimilarity. The low scores areexplained as only a handful of top-10 districts onboth gravity model are found in the top-10 ofinferred actual traffic flows. Both gravity modelsalso failed to capture high traffic flows in Bangkok’scentral regions.

Different to the Gravity Model, the two Ra-diation Models show relatively even traffic flowdistributions in all regions. The dominant regionsare the following: Ratchathewi, Thon Buri, andWang Thonglang for Standard Radiation Modeland Wang Thonglang, Thon Buri, Khlong Toei forthe Radiation-Extended Model. Standard RadiationModel gives ρ = 0.5 correlation scores(which ishighest score compared with other models), andthe extended variant only achieved ρ = 0.2 score.Standard Radiation Model’s high scores suggest itis best suited to identify high traffic flow in centralBangkok and important traffic hubs.

This promising result should be treated cau-tiously. All models show large deviation with actualtraffic flow (adj.r2 = 0.25) inferred from Grab datawhich indicates noticeable numerical gap.

10

TABLE I: Top-10 Highest Traffic Flows by Districts

No

Inferred from Grab Gravity - Exponential Gravity - Power Radiation Radiation - Extended

District % flows District % flows District % flows District % flows District % flows

1 Chatuchak 6.2 Sathon 3.2 Sampanthawong 3.3 Ratchathewi 3.7 Wang Thonglang 3.12 Ratchathewi 6.1 Thon Buri 3.2 Sai Mai 2.7 Thon Buri 3.7 Thon Buri 2.93 Pathum Wan 5.8 Wang Thonglang 3 Pom Prap S.P. 2.6 Wang Thonglang 3.7 Khlong Toei 2.84 Khlong Toei 5.6 Suan Luang 2.9 Lat Phrao 2.6 Sathon 3.6 Sathon 2.85 Huai Khwang 4.9 Khlong Toei 2.9 Don Mueang 2.5 Phaya Thai 3.2 Suan Luang 2.56 Vadhana 4.8 Taling Chan 2.8 Taling Chan 2.4 Din Daeng 3.1 Rat Burana 2.57 Bang Rak 3.9 Yan Nawa 2.7 Phra Nakhon 2.4 Pom Prap S.P. 3 Sai Mai 2.58 Din Daeng 3.6 Ratchathewi 2.6 Chom Thong 2.4 Vadhana 3 Saphan Sung 2.59 Bang Kapi 3.4 Phasi Charoen 2.6 Khan Na Yao 2.4 Khlong Toei 3 Yan Nawa 2.5

10 Suan Luang 3 Phra Nakhon 2.5 Chatuchak 2.3 Suan Luang 2.9 Ratchathewi 2.4

(a) Gravity Model - Exponential Decay (b) Gravity Model - Power Law Decay

(c) Radiation Model (d) Radiation Extended Model

Fig. 1: Comparison of generated traffic flows in Bangkok

11

Page 14: Data Innovation€¦ · Rizal Khaefi (Junior Data Scientist) and Mellyana Frederika (Partnership and Advocacy Lead), with the guidance of Sriganesh Lokanathan (Data Innovation & Policy

Road Speed Profiling

1 SUMMARY

Road-speed profiling refers to the process ofcomputing the expected, median, upper-bound andlower-bound (within a confidence interval) speedsof road segments in a city at a given time of theday. [7] Road speed profiling is used to measurepoint-to-point travel time estimation, build advancedtraffic information and management system thatcould eventually help ease congestion given theconstraints in term of building new infrastructure.Aggregating road speed is challenging task becausevehicle speeds are inherently stochastic. Tradition-ally, road-speed profile is calculated based on fieldsurvey and observation data, with limited spatio-temporal coverage. This research explores ways toimprove road-speed profile accuracy by developingroad-speed profile for 9 major roads in Bangkokin 15 minutes intervals in a day using one monthride-hailing data provided by Grab. The road profileshows promising result that reflect mobility patternbetween normal days and major event day.

2 DATA AND FEATURES

1) August, December 2018 and March-April2019 Trajectory Data

• Location• Timestamp• Speed

2) Digitized Road Network• Road link/segment• Road length• Road type

3 METHODS

The data used represents real speed measure-ments from smartphone sensors from the period ofSeptember 2018 to December 2018 and March toApril 2019. This data provide accurate and represen-tative picture of the actual driving conditions acrossthe Bangkok road network as shown in Figure 1.

A Trajectory of a moving grab driver is a se-quence of discrete points in geographical space, ar-ticulated as geo-locations with corresponding times-tamps and speed profile, i.e.,

Trajectory = < p1, s1, t1 >,< p2, s2, t2 >, ..., < pn, sn, tn >,

where each element < pi, si, ti > indicates adriver is at location p1 at timestamp t1 with speeds1 m/s. Furthermore, elements are sorted by times-tamps, i.e., tj < tk if 1 ≤ j < k.

The spatial reference for this research is the dig-itized road network provided by Bangkok’s Officeof Transport Planning. The road network consistsof road areas, a smaller segment called road link,and also additional information, such as length androad type. For every road link there will be lanes,which we classified as inbound and outbound. Sincethe trajectory data doesn’t have specific informationto indicate which lane is being utilized, we inferredlane directions are inferred using heading informa-tion and k-means clustering algorithms. Prior to thefocus area for this study, only primary and importantroads in Bangkok are chosen.

3.1 Preprocessing

GPS satellites broadcast their signals in spacewith a certain accuracy. This depends on additionalfactors, including satellite geometry, signal block-age, atmospheric conditions, and receiver designfeatures affect the quality of data. GPS-enabledsmartphones are typically accurate to within a 4.9m (16 ft.) radius under open sky [8]. However,their accuracy worsens near buildings, bridges, andtrees. The GPS data used for this research is mostlytaken on open roads with a few exceptions such asunderpass roads. Ten roads were selected: PhahonYothin, Ratchadaphisek, Sathon, Sukhumvit, Rama4, Phetchaburi, Ramkhamhaeng, Lat Phrao, and

12

Page 15: Data Innovation€¦ · Rizal Khaefi (Junior Data Scientist) and Mellyana Frederika (Partnership and Advocacy Lead), with the guidance of Sriganesh Lokanathan (Data Innovation & Policy

Road Speed Profiling

1 SUMMARY

Road-speed profiling refers to the process ofcomputing the expected, median, upper-bound andlower-bound (within a confidence interval) speedsof road segments in a city at a given time of theday. [7] Road speed profiling is used to measurepoint-to-point travel time estimation, build advancedtraffic information and management system thatcould eventually help ease congestion given theconstraints in term of building new infrastructure.Aggregating road speed is challenging task becausevehicle speeds are inherently stochastic. Tradition-ally, road-speed profile is calculated based on fieldsurvey and observation data, with limited spatio-temporal coverage. This research explores ways toimprove road-speed profile accuracy by developingroad-speed profile for 9 major roads in Bangkokin 15 minutes intervals in a day using one monthride-hailing data provided by Grab. The road profileshows promising result that reflect mobility patternbetween normal days and major event day.

2 DATA AND FEATURES

1) August, December 2018 and March-April2019 Trajectory Data

• Location• Timestamp• Speed

2) Digitized Road Network• Road link/segment• Road length• Road type

3 METHODS

The data used represents real speed measure-ments from smartphone sensors from the period ofSeptember 2018 to December 2018 and March toApril 2019. This data provide accurate and represen-tative picture of the actual driving conditions acrossthe Bangkok road network as shown in Figure 1.

A Trajectory of a moving grab driver is a se-quence of discrete points in geographical space, ar-ticulated as geo-locations with corresponding times-tamps and speed profile, i.e.,

Trajectory = < p1, s1, t1 >,< p2, s2, t2 >, ..., < pn, sn, tn >,

where each element < pi, si, ti > indicates adriver is at location p1 at timestamp t1 with speeds1 m/s. Furthermore, elements are sorted by times-tamps, i.e., tj < tk if 1 ≤ j < k.

The spatial reference for this research is the dig-itized road network provided by Bangkok’s Officeof Transport Planning. The road network consistsof road areas, a smaller segment called road link,and also additional information, such as length androad type. For every road link there will be lanes,which we classified as inbound and outbound. Sincethe trajectory data doesn’t have specific informationto indicate which lane is being utilized, we inferredlane directions are inferred using heading informa-tion and k-means clustering algorithms. Prior to thefocus area for this study, only primary and importantroads in Bangkok are chosen.

3.1 Preprocessing

GPS satellites broadcast their signals in spacewith a certain accuracy. This depends on additionalfactors, including satellite geometry, signal block-age, atmospheric conditions, and receiver designfeatures affect the quality of data. GPS-enabledsmartphones are typically accurate to within a 4.9m (16 ft.) radius under open sky [8]. However,their accuracy worsens near buildings, bridges, andtrees. The GPS data used for this research is mostlytaken on open roads with a few exceptions such asunderpass roads. Ten roads were selected: PhahonYothin, Ratchadaphisek, Sathon, Sukhumvit, Rama4, Phetchaburi, Ramkhamhaeng, Lat Phrao, and

12

(a) Sample of Grab’s Trajectory Data

(b) Road sample

Fig. 1: Visualization of Data used

Chaeng Watthana. In order to reduces the uncer-tainty, a 50m buffer was chosen. Few speed obser-vation outliers were found (e.g. some had speeds ofabout 1000 km/h), which are then removed usingthe Interquartile range (IQR) method.

3.2 Speed Profiling Calculation

The average speed on a road link can be calcu-lated as the moving average of probe speeds over aperiod p:

vi,p =1

p

i+p∑j=1

vj

where p is the duration f the time period p, i isthe start of the period, and vj is the average probespeed in time slot j. [9] Based on consultation wechoose 15 minutes time period in this study.

4 RESULTS

Figure 2 shows the speed graph during weekday,weekend and Songkran Festival at different lanedirection relative to city center (inbound: going to,outbound: leaving). Temporal patterns during week-end and weekday are similar, these are normal days.Moving out from city center is faster with highestaverage speed at 3 am to 4. The pattern changesduring Songkran Festival. The speed to move fromthe city center to outside is slower compare tonormal days, indicates more people leave the city,with highest speed at 5 am to 7 am. The differencebetween weekday and weekend is in lowest speedtime, from 7 am to 7 pm during weekday and 4pm to 7 pm during weekend. There is no significantvariation regarding spatial patterns, Rama 4 is wherethe speed is the lowest and the Ratchadaphisek isroad with highest speed profile.

13

Page 16: Data Innovation€¦ · Rizal Khaefi (Junior Data Scientist) and Mellyana Frederika (Partnership and Advocacy Lead), with the guidance of Sriganesh Lokanathan (Data Innovation & Policy

(a) Weekday Inbound (b) Weekday Outbound

(c) Saturday Inbound (d) Saturday Outbound

(e) Sunday Inbound (f) Sunday Outbound

(g) Songkran Inbound (h) Songkran Outbound

Fig. 2: Speed graph comparison during weekday, saturday, sunday, and songkran festival

14

Page 17: Data Innovation€¦ · Rizal Khaefi (Junior Data Scientist) and Mellyana Frederika (Partnership and Advocacy Lead), with the guidance of Sriganesh Lokanathan (Data Innovation & Policy

(a) Weekday Inbound (b) Weekday Outbound

(c) Saturday Inbound (d) Saturday Outbound

(e) Sunday Inbound (f) Sunday Outbound

(g) Songkran Inbound (h) Songkran Outbound

Fig. 2: Speed graph comparison during weekday, saturday, sunday, and songkran festival

14

Traffic Congestion Nowcasting

1 SUMMARY

Traffic congestion is on of the most importantproblem domains for transportation planning andmanagement. The causes can be complex and some-time random. The prediction of urban traffic con-gestion has emerged as one of the most importantresearch topics of advanced transportation system.In the field of traffic flow forecasting, a number ofmodels and methodologies have been put forwardfor the improvement of the existing model. Thisresearch aim to nowcast traffic congestion usingthe following predictors: historical congestion, road-speed profile, population count, district characteris-tics and time seasonality. The model able to cap-ture hourly seasonality and daily seasonality withlimitation during heavy congestion. In addition, thisresearch explore the possibility of using the result toimprove Extended Bangkok Urban Model (eBUM)by comparing the result of both models. It showsmixed results among different roads which indicatethe need to conduct further research that use digitaldata for both models.

2 DATA AND FEATURES

1) August, December 2018 and March-April2019 Trajectory Data

• Location• Timestamp• Speed

2) Digitized Road Network• Road link/segment• Road length• Road type

3) City Landmark (as proxy to land use)• Residential density• Commerce density• Office density• Industrial density• Tourism density

4) Bangkok Administrative Boundary• Districts area

5) Population• Population density by districts

3 METHODS

3.1 Preprocessing and Feature extraction

The analysis is conducted using processed datafrom road-speed profile describe in previous section.Density of population and city landmark calculatedby summing all units (e.x. population, point ofinterests, or office buildings) within district divideby the district’s area size.

density =

∑units

areadistrict

3.2 Congestion level calculation

In order to calculate congestion level, free flowspeed has to be determined. Free flow speed is con-sidered as the maximum measured average speedover the period:

vfree = maxj∈[t0,t]

vj

The congestion level is defined by ratio of theaverage speed during the most congested period pwith respect to the maximum speed:

cp =vp

vfree

3.3 Model development and validation

For a given data set with n observations, con-sisting of congestion level cp with district and roadcharacteristics m at time window period p, formu-lated by

D = (xijt, cp)(|D| = n, xijt ∈ Mi,Mj ,Et , cp ∈ )M =< pop, landuse, roadprofile, districtsarea >

15

Page 18: Data Innovation€¦ · Rizal Khaefi (Junior Data Scientist) and Mellyana Frederika (Partnership and Advocacy Lead), with the guidance of Sriganesh Lokanathan (Data Innovation & Policy

This is follow by development of a tree ensemblemodel [10] in the form of K additive functions,which describe relations between congestion levelwith district characteristics, time, and the externalfactor,

cp = φ(xi) =k∑

k=1

fk(xi), fk ∈ F

Here q represents the structure of each tree thatmaps an example to the corresponding leaf index.T is the number of leaves in the tree. Each fkcorresponds to an independent tree structure q andleaf weights w.

F = f(x) = wq(x)(q : Mi,Mj ,Et → T,w ∈ T )

To find the best model, κ additive function andΩ model complexity is calculated to minimize theobjective function,

L(φ) =∑ijt

l(cp, cp) +∑κ

Ω(fk)

Convex loss function l measures the differencebetween congestion level prediction cp and actualcongestion level cp.

In order to measure model performances,10-foldcross validation approach used for splitting the dataset into training and testing.

3.4 Comparison with eBum model

The last stage is to compare congestion levelinferred from nowcast model with congestion levelfrom eBum in 2017 as shown in Figure 3a. Un-fortunately, congestion level from eBum model isnot available in digital version and therefore thiscomparison is performed manually, with on-the-field human observation. The comparison is doneby using snapshot of congestion level from bothmodels, focusing on area shown in eBum modeland mirroring the spatial resolution of eBum modelsnapshot. The comparison is conducted by selectedBangkok’s residents who conducted on-the-fieldcongestion observation indicated by different colourcode of both model output, and assess the colorsimilarity range from 1 (totally different) to 10(identical).

This comparison is conducted in limited area.This area is selected based on spatial resolutionof eBum’s model. Six districts are selected, theyare Phaya Thai, Dusit, Bang Plat, Ratchathewi,Pathumwan, and Bangkok Noi. Selected roadare Phahon Yothin, Ratchadaphisek, Sathon,Sukhumvit, Rama 4, Phetchaburi, Ramkhamhaeng,Lat Phrao, and Ram Inthra.

4 RESULTS

A set of predictor features ranging from historicalcongestion, road profile, population count, districtcharacteristics and time seasonality is used to informthe development of nowcast congestion model. Thisresearch develops model to predict congestion levelduring week of 1st to 7th April 2019. Current bestmodel could nowcast congestion with rmse = 0.1,r = 0.7, and r2 = 0.5. In general, the model ableto capture hourly seasonality (eg. able to predictcongestion during peak hour) and daily seasonality(ex. able to predict less congestion during weekend)as shown in Figure 1. Figure 2 shows kernel densityestimation plot of actual and predicted congestion.It shows similar proportion for congestion below0.6 but not for the above levels which tends tounderestimate heavy congestion (0.6 ≤ cp ≤ 0.9)and overestimate the severe congestion (cp = 1).

Table Ia and Ib show similarity scores gatheredfrom respondents. At the district groups, most re-spondents indicated agreement with the idea thatPhaya Thai, Bang Plat, and Pathumwan have goodsimilarity in both model (Median = 5, 6.IQR =0, 1). Opinion seems to be divided in the restof the districts as equal number of respondents ex-pressed strong disagreement or disagreement whichindicates no consensus (IQR ≥ 2). Looking at theroad groups, consensus reached at the several roads(IQR = 0, 1), some of them have relatively highsimilarity scores Median = 6 like Ramkamhaeng,Phetchaburi, and Rama 4. On the other hand, someroads are perceived different Median = 4 likeRatchadaphisek and Lat Phrao.

16

Page 19: Data Innovation€¦ · Rizal Khaefi (Junior Data Scientist) and Mellyana Frederika (Partnership and Advocacy Lead), with the guidance of Sriganesh Lokanathan (Data Innovation & Policy

This is follow by development of a tree ensemblemodel [10] in the form of K additive functions,which describe relations between congestion levelwith district characteristics, time, and the externalfactor,

cp = φ(xi) =k∑

k=1

fk(xi), fk ∈ F

Here q represents the structure of each tree thatmaps an example to the corresponding leaf index.T is the number of leaves in the tree. Each fkcorresponds to an independent tree structure q andleaf weights w.

F = f(x) = wq(x)(q : Mi,Mj ,Et → T,w ∈ T )

To find the best model, κ additive function andΩ model complexity is calculated to minimize theobjective function,

L(φ) =∑ijt

l(cp, cp) +∑κ

Ω(fk)

Convex loss function l measures the differencebetween congestion level prediction cp and actualcongestion level cp.

In order to measure model performances,10-foldcross validation approach used for splitting the dataset into training and testing.

3.4 Comparison with eBum model

The last stage is to compare congestion levelinferred from nowcast model with congestion levelfrom eBum in 2017 as shown in Figure 3a. Un-fortunately, congestion level from eBum model isnot available in digital version and therefore thiscomparison is performed manually, with on-the-field human observation. The comparison is doneby using snapshot of congestion level from bothmodels, focusing on area shown in eBum modeland mirroring the spatial resolution of eBum modelsnapshot. The comparison is conducted by selectedBangkok’s residents who conducted on-the-fieldcongestion observation indicated by different colourcode of both model output, and assess the colorsimilarity range from 1 (totally different) to 10(identical).

This comparison is conducted in limited area.This area is selected based on spatial resolutionof eBum’s model. Six districts are selected, theyare Phaya Thai, Dusit, Bang Plat, Ratchathewi,Pathumwan, and Bangkok Noi. Selected roadare Phahon Yothin, Ratchadaphisek, Sathon,Sukhumvit, Rama 4, Phetchaburi, Ramkhamhaeng,Lat Phrao, and Ram Inthra.

4 RESULTS

A set of predictor features ranging from historicalcongestion, road profile, population count, districtcharacteristics and time seasonality is used to informthe development of nowcast congestion model. Thisresearch develops model to predict congestion levelduring week of 1st to 7th April 2019. Current bestmodel could nowcast congestion with rmse = 0.1,r = 0.7, and r2 = 0.5. In general, the model ableto capture hourly seasonality (eg. able to predictcongestion during peak hour) and daily seasonality(ex. able to predict less congestion during weekend)as shown in Figure 1. Figure 2 shows kernel densityestimation plot of actual and predicted congestion.It shows similar proportion for congestion below0.6 but not for the above levels which tends tounderestimate heavy congestion (0.6 ≤ cp ≤ 0.9)and overestimate the severe congestion (cp = 1).

Table Ia and Ib show similarity scores gatheredfrom respondents. At the district groups, most re-spondents indicated agreement with the idea thatPhaya Thai, Bang Plat, and Pathumwan have goodsimilarity in both model (Median = 5, 6.IQR =0, 1). Opinion seems to be divided in the restof the districts as equal number of respondents ex-pressed strong disagreement or disagreement whichindicates no consensus (IQR ≥ 2). Looking at theroad groups, consensus reached at the several roads(IQR = 0, 1), some of them have relatively highsimilarity scores Median = 6 like Ramkamhaeng,Phetchaburi, and Rama 4. On the other hand, someroads are perceived different Median = 4 likeRatchadaphisek and Lat Phrao.

16

(a) Hourly seasonalityseasonality.png

(b) Daily seasonality

Fig. 1: Congestion nowcasting results

17

Page 20: Data Innovation€¦ · Rizal Khaefi (Junior Data Scientist) and Mellyana Frederika (Partnership and Advocacy Lead), with the guidance of Sriganesh Lokanathan (Data Innovation & Policy

Fig. 2: Kernel density plot of Actual and Predicted Congestion

No District Name Similarity Score (1-10) Median IQR#1 #2 #3 #4 #5

1 Phaya Thai 5 6 6 10 6 6 02 Dusit 7 4 5 10 6 5 23 Bang Plat 6 5 6 10 7 6 14 Ratchathewi 3 6 7 1 4 3 35 Pathumwan 6 5 6 1 5 5 16 Bangkok Noi 4 4 6 1 6 4 2

(a) By Districts

No Road Name Similarity Score (1-10) Median IQR#1 #2 #3 #4 #5

1 Phahon Yothin 4 6 7 7 5 5 22 Ratchadaphisek 4 4 5 5 5 4 13 Sathon 4 3 6 5 6 4 24 Sukhumvit 7 4 5 5 7 5 25 Rama 4 8 6 7 6 7 6 16 Phetchaburi 7 7 6 6 6 6 17 Ramkamhaeng 2 6 6 6 6 6 08 Lat Phrao 4 3 5 5 6 4 19 Ram Inthra 2 6 6 6 5 5 1

(b) By Roads

TABLE I: Respondent scores

18

Page 21: Data Innovation€¦ · Rizal Khaefi (Junior Data Scientist) and Mellyana Frederika (Partnership and Advocacy Lead), with the guidance of Sriganesh Lokanathan (Data Innovation & Policy

Fig. 2: Kernel density plot of Actual and Predicted Congestion

No District Name Similarity Score (1-10) Median IQR#1 #2 #3 #4 #5

1 Phaya Thai 5 6 6 10 6 6 02 Dusit 7 4 5 10 6 5 23 Bang Plat 6 5 6 10 7 6 14 Ratchathewi 3 6 7 1 4 3 35 Pathumwan 6 5 6 1 5 5 16 Bangkok Noi 4 4 6 1 6 4 2

(a) By Districts

No Road Name Similarity Score (1-10) Median IQR#1 #2 #3 #4 #5

1 Phahon Yothin 4 6 7 7 5 5 22 Ratchadaphisek 4 4 5 5 5 4 13 Sathon 4 3 6 5 6 4 24 Sukhumvit 7 4 5 5 7 5 25 Rama 4 8 6 7 6 7 6 16 Phetchaburi 7 7 6 6 6 6 17 Ramkamhaeng 2 6 6 6 6 6 08 Lat Phrao 4 3 5 5 6 4 19 Ram Inthra 2 6 6 6 5 5 1

(b) By Roads

TABLE I: Respondent scores

18

(a) Ebum model

(b) Congestion model based Grab data

Fig. 3: Comparison of Congestion model

19

Page 22: Data Innovation€¦ · Rizal Khaefi (Junior Data Scientist) and Mellyana Frederika (Partnership and Advocacy Lead), with the guidance of Sriganesh Lokanathan (Data Innovation & Policy

Quantifying Population Exposure to Air Pollution

1 SUMMARY

Accurate estimates of human exposure to inhaledair pollutants are necessary for a realistic appraisalof the risks these pollutants pose and for the designand implementation of strategies to control and limitthose risks. This estimation, except in occupationalsetting, are usually based on measurements of pol-lutants concentrations in outside air, recorded withoutdoor-fixed site monitors. From public health per-spective, it is important to determine the populationexposure - the aggregate exposure for a specifiedgroup of people. This research aim to exploreways of harnessing four months observation datato quantifying population exposure to air pollution.Using Land Use Regression (LUR), this researchinfer daily air quality in Bangkok at 1kmx1km level.Bangkok Metropolitan has 13 official ground airquality sensors, although limited in number andcoverage, data from these sensors are sufficientfor preliminary validation. The best model showsr2 = 0.6 inference performance.

2 DATA AND FEATURES

1) Air Quality Level AQI2) Traffic Congestion (reference: Traffic Conges-

tion Nowcasting)3) Digitized Road Network4) Aerosol Optical Depth (AOD) at 047 and 055

micron aod047, aod0555) Open spaces

• Enhanced Vegetation Indeces evi• Normalized Difference Vegetation Inde-

ces ndvi

6) Digital Elevation Model dem7) Air temperature temp8) Population density pop

3 METHODS

3.1 Preprocessing and Feature extractionBased on Bangkok Metropolitan administrative

boundary, a reference grid cells of 1km x 1kmare created. For each predictors (congestion, mainroad, AOD, open spaces, etc), a spatial and tem-poral aggregation performed by calculate predictorobservation j as many n inside grid i at the day d,

predictorsi,d =1

n

n∑j=1

predictorsj

predictors =< congestion,mainroad, aod047,

aod055, evi, ndvi, dem, temp, pop >

As the first step, AQI levels data are investigatedto assess quality of its measurements. Correlationanalysis are conducted for each ground sensor sites.Ground sensor site with low correlation score (r ≤0.2) were removed from next iteration. Land useregression with 3 steps explained below performed[11; 12].

3.2 Developing model to infers AQI from predictorsModel to infers AQI level relationship with avail-

able predictors (referred as type 1) and additionaldata are developed by creating training data setwith n observations, consisting of AQI level AQIijand type 1 predictors at a grid cell i on a day d,formulated by

D = (xid, aqiid)(|D| = n, xid ∈ predictorstype1

, aqiid ∈ )predictorstype1 =< congestion,mainroad,

aod047, aod055, evi, ndvi, dem, temp, pop >

A tree ensemble model [10] developed in theform of K additive functions, which describe re-lations between AQI level with predictors,

aqiid = φ(predictorstype1id ) =

k∑k=1

fk(predictorstype1id ), fk ∈ F

20

Page 23: Data Innovation€¦ · Rizal Khaefi (Junior Data Scientist) and Mellyana Frederika (Partnership and Advocacy Lead), with the guidance of Sriganesh Lokanathan (Data Innovation & Policy

Quantifying Population Exposure to Air Pollution

1 SUMMARY

Accurate estimates of human exposure to inhaledair pollutants are necessary for a realistic appraisalof the risks these pollutants pose and for the designand implementation of strategies to control and limitthose risks. This estimation, except in occupationalsetting, are usually based on measurements of pol-lutants concentrations in outside air, recorded withoutdoor-fixed site monitors. From public health per-spective, it is important to determine the populationexposure - the aggregate exposure for a specifiedgroup of people. This research aim to exploreways of harnessing four months observation datato quantifying population exposure to air pollution.Using Land Use Regression (LUR), this researchinfer daily air quality in Bangkok at 1kmx1km level.Bangkok Metropolitan has 13 official ground airquality sensors, although limited in number andcoverage, data from these sensors are sufficientfor preliminary validation. The best model showsr2 = 0.6 inference performance.

2 DATA AND FEATURES

1) Air Quality Level AQI2) Traffic Congestion (reference: Traffic Conges-

tion Nowcasting)3) Digitized Road Network4) Aerosol Optical Depth (AOD) at 047 and 055

micron aod047, aod0555) Open spaces

• Enhanced Vegetation Indeces evi• Normalized Difference Vegetation Inde-

ces ndvi

6) Digital Elevation Model dem7) Air temperature temp8) Population density pop

3 METHODS

3.1 Preprocessing and Feature extractionBased on Bangkok Metropolitan administrative

boundary, a reference grid cells of 1km x 1kmare created. For each predictors (congestion, mainroad, AOD, open spaces, etc), a spatial and tem-poral aggregation performed by calculate predictorobservation j as many n inside grid i at the day d,

predictorsi,d =1

n

n∑j=1

predictorsj

predictors =< congestion,mainroad, aod047,

aod055, evi, ndvi, dem, temp, pop >

As the first step, AQI levels data are investigatedto assess quality of its measurements. Correlationanalysis are conducted for each ground sensor sites.Ground sensor site with low correlation score (r ≤0.2) were removed from next iteration. Land useregression with 3 steps explained below performed[11; 12].

3.2 Developing model to infers AQI from predictorsModel to infers AQI level relationship with avail-

able predictors (referred as type 1) and additionaldata are developed by creating training data setwith n observations, consisting of AQI level AQIijand type 1 predictors at a grid cell i on a day d,formulated by

D = (xid, aqiid)(|D| = n, xid ∈ predictorstype1

, aqiid ∈ )predictorstype1 =< congestion,mainroad,

aod047, aod055, evi, ndvi, dem, temp, pop >

A tree ensemble model [10] developed in theform of K additive functions, which describe re-lations between AQI level with predictors,

aqiid = φ(predictorstype1id ) =

k∑k=1

fk(predictorstype1id ), fk ∈ F

20

To find an optimal model, a combination ofhyperparameter tuning, feature selection, and a strat-ified 10-fold cross validation approach performed.

3.3 AQI level prediction using developed model andpredictors data

Missing AQI level aqi′ in grid cell i at day d withavailable predictors measurements (referred as type2) predicted by using developed model,

aqi′id = φ(predictorstype2id )

3.4 AQI level inference where predictors data in-complete

Missing AQI level in grid cell where predictorsdata not complete for specific day aqi′′ inferred bymeasuring association of grid cells AQI level valueswith AQI level located elsewhere and the associationwith available AQI values in neighboring grid cells.A universal Kriging Model developed with a smoothfunction of latitude and longitude and a randomintercept for each cell.

aqi′′id = (α+ ui) + (β + vi) aqi′d + smooth(long, lat) + εid

where aqid is the mean AQI level across bangkokon a day d. α, ui, βi and vi are the fixed-random in-tercepts and fixed-random slopes, respectively. Thesmooth long, lat is a spline fit to the latitude andlongitude.

To measure the goodness of fit, Leave-one-outcross-validation performed by dropped ’all observa-tions’ of a sensors site from the dataset to becometesting dataset for the stage-3 model. This processwas repeated for each 11 sensors site and r2 valueswere computed.

4 RESULTS

Bangkok Pollution Control Department providedthree months daily aggregate AQI data from officialsensors in Bangkok Metropolitan Area. Figure 3bshows ground sensors location along with dailymeasurements. As shown in Figure 2a there is onesuspect of outlier sensors, the X12T site. It scoreslowest correlation coefficient compared with othersand therefore excluded from the next iteration.

The next step is to divide Bangkok into a gridof 1km x 1km grid follow by development of the

tree ensemble model trained from ground truth andfeatures from satellite and land use related data.From the best model, the location where there isno ground sensors but complete predictors could beinferred. The last step then extrapolation process topredict AQI level where there are missing predic-tors. Figure 2b shows performance of our model, thebest model achieved 0.6 r2 scores and also it canbe seen from Figure 2c that the model can captureswell the overall density distribution of actual AQIbut may a bit overestimate it.

From the aforementioned processes, daily 1kmx 1km AQI level from whole Bangkok could beinferred as seen in Figure 3 and 4. It can beobserved there is difference between different timeseasonality, the highest aqi level are observed duringpeak season (aqi = 74.5), followed by normal(aqi = 54.9) and low season (aqi = 41.8).

Inferred AQI from our model are also able togives complete situational awareness for example bycalculating spatio and temporal aggregation by us-ing districts boundary to create districts AQI statis-tics. As shown in Table I there are different patterncould be captured during different seasons. For thepeak season the top-3 districts which have highestAQI level are Bangkok Noi (aqi = 89), NongKhaem (aqi = 89), and Bangkok Yai (aqi = 88).Different results are captured during normal season(Yan Nawa (aqi = 75), Khlong San (aqi = 74),Sathon (aqi = 71)) and low season (Yan Nawa(aqi = 62), Phra Nakhon (aqi = 61), Ratchathewi(aqi = 61)).

multirow

21

Page 24: Data Innovation€¦ · Rizal Khaefi (Junior Data Scientist) and Mellyana Frederika (Partnership and Advocacy Lead), with the guidance of Sriganesh Lokanathan (Data Innovation & Policy

(a) Air Quality Level in Bangkok during December 2018, March-April 2019

(b) Sensors Locations

Fig. 1: Air Quality Data from Bangkok’s Pollution Control Department

22

Page 25: Data Innovation€¦ · Rizal Khaefi (Junior Data Scientist) and Mellyana Frederika (Partnership and Advocacy Lead), with the guidance of Sriganesh Lokanathan (Data Innovation & Policy

(a) Air Quality Level in Bangkok during December 2018, March-April 2019

(b) Sensors Locations

Fig. 1: Air Quality Data from Bangkok’s Pollution Control Department

22

TABLE I: Districts AQI Statistics

December 2018 March 2019 April 2019District (Peak Season) (Normal Season) (Low Season)

µ(mean) σ(sd) µ(mean) σ(sd) µ(mean) σ(sd)

Bang Bon 80 4 46 4 31 4Bang Kapi 68 7 47 7 33 7Bang Khae 82 5 48 8 35 6Bang Khen 74 10 49 9 34 7

Bang Kho Laem 87 6 68 4 54 4Bang Khun Thian 83 9 52 8 39 9

Bang Na 71 6 49 6 29 7Bang Phlat 65 9 45 6 35 6Bang Rak 67 4 67 3 58 2Bang Sue 58 3 41 5 35 4

Bangkok Noi 89 8 66 6 53 10Bangkok Yai 88 5 63 6 56 10Bueng Kum 69 7 47 6 34 5Chatuchak 63 8 39 7 32 7

Chom Thong 75 5 44 5 29 6Din Daeng 83 6 63 6 50 8

Don Mueang 64 9 43 14 32 8Dusit 77 9 65 7 57 8

Huai Khwang 62 7 47 4 32 4Khan Na Yao 66 4 45 5 29 4

Khlong Sam Wa 74 12 56 8 41 7Khlong San 75 7 74 5 59 3Khlong Toei 84 11 70 3 57 4

Lak Si 67 8 47 7 35 6Lat Krabang 77 7 54 8 34 6

Lat Phrao 68 8 47 5 33 5Min Buri 74 8 51 7 37 7

Nong Chok 77 13 59 7 40 8Nong Khaem 89 5 46 7 31 7Pathum Wan 71 8 69 4 60 5

Phasi Charoen 69 6 41 8 31 7Phaya Thai 67 6 46 8 34 8

Phra Khanong 73 12 52 11 34 8Phra Nakhon 74 2 68 4 62 5

Pom Prap Sattru Phai 70 4 67 1 60 2Prawet 74 8 54 7 34 5

Rat Burana 78 7 53 5 40 5Ratchathewi 68 7 70 6 61 7

Sai Mai 72 10 52 7 33 6Samphanthawong 73 5 65 2 57 1

Saphan Sung 73 8 50 7 34 7Sathon 80 6 71 7 59 5

Suan Luang 69 10 52 7 34 6Taling Chan 71 7 44 9 28 9

Thawi Watthana 80 7 44 6 32 5Thonburi 85 7 66 9 57 6

Thung Khru 87 5 55 7 38 7Vadhana 83 9 64 9 51 10

Wang Thonglang 69 6 49 6 34 6Yan Nawa 87 12 75 3 62 6

23

Page 26: Data Innovation€¦ · Rizal Khaefi (Junior Data Scientist) and Mellyana Frederika (Partnership and Advocacy Lead), with the guidance of Sriganesh Lokanathan (Data Innovation & Policy

(a) Correlation matrix for AQI ground sensor sites (b) Actual vs Predicted AQI level (r2 = 0.6)

(c) Kernel Density plot of Actual vs Predicted AQI

Fig. 2: Preprocessing and Model Performance

24

Page 27: Data Innovation€¦ · Rizal Khaefi (Junior Data Scientist) and Mellyana Frederika (Partnership and Advocacy Lead), with the guidance of Sriganesh Lokanathan (Data Innovation & Policy

(a) Correlation matrix for AQI ground sensor sites (b) Actual vs Predicted AQI level (r2 = 0.6)

(c) Kernel Density plot of Actual vs Predicted AQI

Fig. 2: Preprocessing and Model Performance

24

(a) Peak Season (b) Normal Season

(c) Low Season

Fig. 3: Bangkok’s air quality analysis results

Fig. 4: Bangkok’s air quality analysis results

25

Page 28: Data Innovation€¦ · Rizal Khaefi (Junior Data Scientist) and Mellyana Frederika (Partnership and Advocacy Lead), with the guidance of Sriganesh Lokanathan (Data Innovation & Policy

Reflections and Further Works

1 THE POTENTIAL OF ALTERNATIVE DATA SETS

We explore the possibility of using ride-hailingdata sets as alternative data sets to improve our un-derstanding of travel patterns in Bangkok Metropoli-tan Region. The research shows the potential of ride-hailing data in inferring travel pattern, road-speedprofile and traffic congestion nowcasting. Basedon promising preliminary findings, we conductedcomparison of traffic nowcasting from our modelwith Extended Bangkok Urban Model (eBUM) withmix results. We use traffic congestion nowcasting tomeasure population daily exposure to air pollution,better result is shown during peak traffic situationcompare to normal traffic situation.

It is important to note that as proof-of-concept re-search, availability and access to different data setsis crucial for triangulation and means of verification.We used open data from sources such as OpenStreet Map and Worlpop when, in the beginningof the research, official statistics are not readilyavailable or accessible. Office of Transport andTraffic Policy (OTP) and Policy Control Departmentof Thailand Government granted access to a setof official statistics on transportation, administrationboundary and air quality information.

Immediate practical application will require fur-ther research to improve robustness of the model,in particular to include other data sources such astraffic counting, hourly air quality, land use, popula-tion and on-the-field observation. It is recommendedto conduct further study in selected area in thecity, based on government priority and primary dataavailability. Based on preliminary findings, it isrecommended to continue improve road-speed pro-filing and traffic nowcasting model to complimentExtended Bangkok Urban Model (eBUM).

2 PARTNERSHIP IN DATA INNOVATION

This research is a collaboration of GIZ DataLab as project owner, GIZ Thailand as problem

owner, Grab as data holder and Pulse Lab Jakartaas data analytics partner. The main beneficiary ofthis research is Government of Thailand and thisis in line with Prime Minister of Thailand messageto harness big data for decision making in 2018.The partnership managed to overcome the bureau-cracy challenges that come for a non-traditionalUN initiative to successfully partner with anothernon-traditional development sector initiative suchas Data Lab and an emerging decacron such asGrab. We need to address Government of Thailandconcern early in the implementation and able to dothat with support from GIZ Thailand. This requirespatience and trust from partners to ensure that thereare shared value emerges for all partners.

3 CONCLUSIONS AND NEXT STEPS

While this work is preliminary, it does howevershow that combining alternate data sources withnew techniques has the potential to augment therichness of insights that are available to devel-opmental domains and in particular transportationplanning and management.

3.1 Macroscopic traffic flow modelling

The analyses on leveraging ride hailing data inmacroscopic traffic flow modelling explored twoinnovations. Firstly it incorporated a complementarydata source that overcomes the timeliness issueof existing data used for traditional macroscopictraffic flow modelling which depends on infrequentsurveys and censuses. And since it is transactiongenerated data (i.e. generated as a by-product ofproviding some service, in this case ride hailing),it is relatively costless to generate when comparedto purpose built surveys and census that can beexpensive and cumbersome to administer. Secondlythe work explores the use of a radiation modeland compares it to the gravity model, which ismore commonly used in macroscopic traffic flow

26

Page 29: Data Innovation€¦ · Rizal Khaefi (Junior Data Scientist) and Mellyana Frederika (Partnership and Advocacy Lead), with the guidance of Sriganesh Lokanathan (Data Innovation & Policy

Reflections and Further Works

1 THE POTENTIAL OF ALTERNATIVE DATA SETS

We explore the possibility of using ride-hailingdata sets as alternative data sets to improve our un-derstanding of travel patterns in Bangkok Metropoli-tan Region. The research shows the potential of ride-hailing data in inferring travel pattern, road-speedprofile and traffic congestion nowcasting. Basedon promising preliminary findings, we conductedcomparison of traffic nowcasting from our modelwith Extended Bangkok Urban Model (eBUM) withmix results. We use traffic congestion nowcasting tomeasure population daily exposure to air pollution,better result is shown during peak traffic situationcompare to normal traffic situation.

It is important to note that as proof-of-concept re-search, availability and access to different data setsis crucial for triangulation and means of verification.We used open data from sources such as OpenStreet Map and Worlpop when, in the beginningof the research, official statistics are not readilyavailable or accessible. Office of Transport andTraffic Policy (OTP) and Policy Control Departmentof Thailand Government granted access to a setof official statistics on transportation, administrationboundary and air quality information.

Immediate practical application will require fur-ther research to improve robustness of the model,in particular to include other data sources such astraffic counting, hourly air quality, land use, popula-tion and on-the-field observation. It is recommendedto conduct further study in selected area in thecity, based on government priority and primary dataavailability. Based on preliminary findings, it isrecommended to continue improve road-speed pro-filing and traffic nowcasting model to complimentExtended Bangkok Urban Model (eBUM).

2 PARTNERSHIP IN DATA INNOVATION

This research is a collaboration of GIZ DataLab as project owner, GIZ Thailand as problem

owner, Grab as data holder and Pulse Lab Jakartaas data analytics partner. The main beneficiary ofthis research is Government of Thailand and thisis in line with Prime Minister of Thailand messageto harness big data for decision making in 2018.The partnership managed to overcome the bureau-cracy challenges that come for a non-traditionalUN initiative to successfully partner with anothernon-traditional development sector initiative suchas Data Lab and an emerging decacron such asGrab. We need to address Government of Thailandconcern early in the implementation and able to dothat with support from GIZ Thailand. This requirespatience and trust from partners to ensure that thereare shared value emerges for all partners.

3 CONCLUSIONS AND NEXT STEPS

While this work is preliminary, it does howevershow that combining alternate data sources withnew techniques has the potential to augment therichness of insights that are available to devel-opmental domains and in particular transportationplanning and management.

3.1 Macroscopic traffic flow modelling

The analyses on leveraging ride hailing data inmacroscopic traffic flow modelling explored twoinnovations. Firstly it incorporated a complementarydata source that overcomes the timeliness issueof existing data used for traditional macroscopictraffic flow modelling which depends on infrequentsurveys and censuses. And since it is transactiongenerated data (i.e. generated as a by-product ofproviding some service, in this case ride hailing),it is relatively costless to generate when comparedto purpose built surveys and census that can beexpensive and cumbersome to administer. Secondlythe work explores the use of a radiation modeland compares it to the gravity model, which ismore commonly used in macroscopic traffic flow

26

modelling. Even at this preliminary stage, the radi-ation model shows promise, performing better thanthe gravity model. These results must be treatedcautiously because in this preliminary work we findlarge deviations with actual traffic flows (inferredfrom the ride hailing data). But with additional workthese newer models that also incorporate new datasources, can potentially enhance the eBUM trafficmodelling currently utilized in Thailand. They couldalso potentially provide more accurate estimates ofactual traffic flow for lower cost, with fewer datapoints but at a higher frequency and at higher spatio-temporal coverage. Irrespective of which model isutilized, they are both susceptible to localizationerrors which would require city specific calibration.Taking this preliminary work forward would meanrefining the models further, integrating them into theexisting eBum model, and conducting systematicvalidation of the outputs with ground truth data,ideally those from OD/ commuting surveys.

3.2 Road speed profiling

The advantage of using ride-hailing data for trans-portation specialists is its ability to provide a highfrequency proxy estimate of traffic conditions andspecifically overall traffic speeds. Given that speedinformation is captured continuously it also allowstransportation specialists to understand the diurnalpatterns of traffic and speed for different situations(e.g. weekday, weekend, special events, etc.). Thisenables the development of profiles for differenttypes of activity that can each be updated in near-real time (subject to data access). Near real-timedata like this can then potentially facilitate quickerfeedback loops for transport agencies to trial andmodel effects of interventions (e.g. making roadsone way). It can also inform the development ofand testing of congestion charging policies.

Pneumatic tubes and induction loops which aremost commonly used to capture vehicular speedon roads can be costly and therefore not feasibleto deploy everywhere. It should be noted that ridehailing data doesnt directly give an estimate ofvehicle counts or vehicle classifications which acombination of induction loops and pneumatic tubescan give. But when coupled with ground truth dataand calibration, it would be possible to infer to someextent the volume of traffic on the roads from just

the speed information from ride hailing services. Ofcourse this comes with further caveats since ridehailing services usually cover only motorcycles andcars (in our preliminary work we only used datafrom cars), both of whose patterns of mobility andspeed vary not just between themselves but alsomore importantly with public transportation such asbuses which will have frequent stops. But as GPSon public transport also becomes more common,collectively with these different streams it wouldbe possible to more improve the accuracy of theinference of volume of traffic. The deployment ofCCTV can also improve the richness of the datathat can be gleaned for transportation managementand planning. Traditionally CCTV has not beenvery suitable for vehicle counts and vehicle clas-sifications, but as costs of cameras continuouslydecrease and the state of the art of machine learningalgorithms that can facilitate highly accurate vehic-ular counts and vehicular classification improves, itis possible that that deployment of these cheapertechnologies coupled with access to data from ridehailing data can provide cheap continuous data ofhigh frequency and high spatio-temporal coveragethan what was possible before.

3.3 Traffic Congestion Nowcasting

By leveraging the ability to have a continuousflow of speed information, albeit with the caveatsmentioned above, our preliminary research alsolooked at the possibility of developing a nowcastingtraffic congestion model and compared its to outputsfrom the eBUM model. The results have been mixedwith our model performing better on some roadsand less so on others. This was to be expectedgiven that we do not have rich ground truth datafor validation purposes and also only leveraging onetype of data (in this case vehicular speeds of carsfrom a ride hailing services). The model itself needsto be further improved so it is too soon to tell ifour preliminary work represents a viable alternativeand/or complement to the eBum model.

3.4 Inferring air pollution

Of the different use cases explored via this re-search, the last on inferring air quality at highspatio-temporal resolution directly tackles one of the

27

Page 30: Data Innovation€¦ · Rizal Khaefi (Junior Data Scientist) and Mellyana Frederika (Partnership and Advocacy Lead), with the guidance of Sriganesh Lokanathan (Data Innovation & Policy

most important environmental and public health is-sues affecting several countries in South and South-East Asia. In the context of Bangkok, which cur-rently only has 12 ground sensors, our preliminarywork explored the use of AI techniques on data frommultiple sources including, amongst others, satelliteimagery and traffic congestion estimates to infer AirQuality at higher spatio-temporal resolution. Ourworks taking account of the Bangkok air qualityconditions on various seasons, i.e. high, normal, andlow seasons. However, data from just 12 sensorsprovides too little in terms of validation. Futurework needs to have more ground truth validationdata, which will also better help calibrate the model.But if this can be improved on and scaled, then itcould provide mechanisms for a lower cost solutionto having high spatio-temporal coverage.

4 MAINSTREAMING THE USE OF ALTERNATEDATA SOURCES

More work needs to be done to be able tomainstream this preliminary work into practise. Thisinitial work is intended to inform policy enlight-enment activities, to show the state of the art andthe possibilities. The next significant step is to co-develop the design of rigorous research studies foreach of these preliminary uses cases explored in thiswork. The design would have to provide for betterground truth data for validation and importantly toimprove the preliminary work showcased here. Assuch having transportation specialists from Thailandinvolved both in the design as well as the implemen-tation of this second phase would also provide forthe cross-pollination of ideas and knowledge aroundthe specific uses and limitation of the new datasources and techniques. range of rigorous researchuse cases to both provide better validation of thispreliminary work but more importantly to improveit. This is all contingent also on having access tothese alternate data sources. Even whilst this maybe possible for the second phase of this work,sustainable longer term solutions to data accesswill need to be negotiated before this work can bemainstreamed.

28

Page 31: Data Innovation€¦ · Rizal Khaefi (Junior Data Scientist) and Mellyana Frederika (Partnership and Advocacy Lead), with the guidance of Sriganesh Lokanathan (Data Innovation & Policy

most important environmental and public health is-sues affecting several countries in South and South-East Asia. In the context of Bangkok, which cur-rently only has 12 ground sensors, our preliminarywork explored the use of AI techniques on data frommultiple sources including, amongst others, satelliteimagery and traffic congestion estimates to infer AirQuality at higher spatio-temporal resolution. Ourworks taking account of the Bangkok air qualityconditions on various seasons, i.e. high, normal, andlow seasons. However, data from just 12 sensorsprovides too little in terms of validation. Futurework needs to have more ground truth validationdata, which will also better help calibrate the model.But if this can be improved on and scaled, then itcould provide mechanisms for a lower cost solutionto having high spatio-temporal coverage.

4 MAINSTREAMING THE USE OF ALTERNATEDATA SOURCES

More work needs to be done to be able tomainstream this preliminary work into practise. Thisinitial work is intended to inform policy enlight-enment activities, to show the state of the art andthe possibilities. The next significant step is to co-develop the design of rigorous research studies foreach of these preliminary uses cases explored in thiswork. The design would have to provide for betterground truth data for validation and importantly toimprove the preliminary work showcased here. Assuch having transportation specialists from Thailandinvolved both in the design as well as the implemen-tation of this second phase would also provide forthe cross-pollination of ideas and knowledge aroundthe specific uses and limitation of the new datasources and techniques. range of rigorous researchuse cases to both provide better validation of thispreliminary work but more importantly to improveit. This is all contingent also on having access tothese alternate data sources. Even whilst this maybe possible for the second phase of this work,sustainable longer term solutions to data accesswill need to be negotiated before this work can bemainstreamed.

28

References

[1] D. Milne and D. Watling, “Big data and understanding change in the context of planning transportsystems,” Journal of Transport Geography, vol. 76, pp. 235 – 244, 2019.

[2] R. Etminani-Ghasrodashti and S. Hamidi, “Individuals demand for ride-hailing services: Investigatingthe combined effects of attitudinal factors, land use, and travel attributes on demand for app-basedtaxis in tehran, iran,” Sustainability, vol. 11, no. 20, 2019.

[3] “Ride-hailing - Asia, Statista market forecast,” https://www.statista.com/outlook/368/101/ride-hailing/asia, accessed: 2019-12-15.

[4] Y. Yang, M. C. Gonzlez, A. Maritan, and A.-L. Barabsi, “Limits of predictability in commuting flowsin the absence of data for calibration.” Nature, vol. 4, no. 5662, 2014.

[5] M. Barthlemy, “Spatial networks,” Physics Reports, vol. 499, no. 1, pp. 1 – 101, 2011.[6] F. Simini, M. C. Gonzalez, A. Maritan, and A.-L. Barabasi, “A universal model for mobility and

migration patterns,” Nature, vol. 484, no. 7392, pp. 96–100, Feb. 2012.[7] A. Sunderrajan, J. Varadarajan, and K. Lye, “Road speed profiling for upfront travel time estimation,”

in 2018 IEEE International Conference on Data Mining Workshops (ICDMW). Los Alamitos, CA,USA: IEEE Computer Society, nov 2018, pp. 648–654.

[8] F. van Diggelen and P. Enge, “The worlds first gps mooc and worldwide laboratory using smartphones,”in Proceedings of the 28th International Technical Meeting of the Satellite Division of The Instituteof Navigation, 2015, pp. 361–369.

[9] P. Christidis and N. I. Rivas, “Measuring road congestion,” Institute for Prospective TechnologicalStudies (IPTS), European Commission Joint Research Centre, 2012.

[10] T. Chen and C. Guestrin, “Xgboost: A scalable tree boosting system,” in Proceedings of the 22NdACM SIGKDD International Conference on Knowledge Discovery and Data Mining, ser. KDD ’16.New York, NY, USA: ACM, 2016, pp. 785–794.

[11] I. Kloog, P. Koutrakis, B. A. Coull, H. J. Lee, and J. Schwartz, “Assessing temporally andspatially resolved pm2.5 exposures for epidemiological studies using satellite aerosol optical depthmeasurements,” Atmospheric Environment, vol. 45, no. 35, pp. 6267 – 6275, 2011.

[12] I. Kloog, F. Nordio, B. A. Coull, and J. Schwartz, “Incorporating local land use regression and satelliteaerosol optical depth in a hybrid model of spatiotemporal pm2.5 exposures in the mid-atlantic states,”Environmental Science & Technology, vol. 46, no. 21, pp. 11 913–11 921, 2012.

29