Transforming Sensing Data into Smart Data for
Smart Sustainable Cities
Koji Zettsu
National Institute of Information and Communications Technology (NICT), Japan
BDA2019
December 19, 2019
About NICT
Japan’s sole public research institute specializing in the field of ICT
• Established in 1896
• 1,000+ employees
• 11 institutes and centers at 12 branches in Japan
Research areas: remote sensing, cyber security, space weather, universal communication, integrated testbed, network systems
Source: Keidanren, SDGs https://www.keidanrensdgs-world.com/
Rapid Change of Urban Environment
Population concentration in urban areas
• 1.3 million people moving into cities each week
• 68% of the world’s population is expected to be living in cities by 2050
• 90% of this urban population growth is set to occur in Africa and Asia
Complication of social problems
• Energy, transportation, disaster response, social security, air pollution, garbage treatment, etc.
[World Urbanization Report, UN, 2011]
Source: Society 5.0, Cabinet Office of Japan, https://www8.cao.go.jp/cstp/english/society5_0/index.html Source: Keidanren, SDGs https://www.keidanrensdgs-world.com/
Towards Smart Sustainable City
[Cristina Bueti: Shaping Smart Sustainable Cities in Latin America, ITU Green Standards Week, 2016]
Data-driven Solutions based on Open Government Data
• Data.SF (San Francisco)
Provides more than 200 datasets on environment, traffic, healthcare, security, economy, etc., used by citizens, local industries and NPOs to develop more than 60 applications.
• OpenSense (Switzerland)
Collects air pollution data in the city of Zurich via public transportation and crowdsensing, for use in environmental management and healthcare.
• Data.gov.sg (Singapore)
Provides an API for accessing dynamic data on city weather, air pollution, energy, traffic, etc.
Society 5.0 (Cabinet Office of Japan)
Super Smart Society by high degree of convergence between cyberspace and physical space through IoT
Source: Society 5.0, Cabinet Office of Japan, https://www8.cao.go.jp/cstp/english/society5_0/index.html
New Values of Mobility in Society 5.0
Source: Society 5.0, Cabinet Office of Japan, https://www8.cao.go.jp/cstp/english/society5_0/transportation_e.html
AI analysis of big data spanning diverse types of information including sensor data from automobiles, real-time information on the weather, traffic, accommodations, and food and drink, and personal history
Case Study: Traffic Problems Caused by Unusual Weather
Internal causes (signal trouble, etc.)
External causes (injury accident, etc.)
Disaster causes (heavy rain, etc.)
Number of transportation accidents in Japan (chart; year labels: 1988, 2014)
Source: Whitepaper, Ministry of Land, Infrastructure, Transport and Tourism (2015) http://www.mlit.go.jp/hakusyo/mlit/h27/hakusho/h28/data/html/ns009040.html
Discovering Traffic Obstacles from Traffic Data
Realtime discovery of traffic obstacles from changes in probe data
• Collect probe car data on a nationwide level at five-minute intervals
• Detect a big change in traffic → traffic obstacle
[Figure: normal traffic vs. alternative traffic, plotted by latitude and longitude]
Ex.1) Rush of evacuating cars after an earthquake (Kumamoto, 2016)
Ex.2) Road traffic suppressed by heavy rain (Ehime, 2018)
Courtesy: Masao Kuwahara (Tohoku Univ., DOMINGO)
NICT M2M Data Center
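The idea of flagging a traffic obstacle when probe data changes sharply can be sketched as a per-cell deviation test. This is only a minimal illustration, not the method used in the project: the z-score test, the threshold of 3.0, the function name `detect_obstacles` and the sample coordinates and speeds are all assumptions made for the example.

```python
from statistics import mean, stdev

def detect_obstacles(history, current, z_threshold=3.0):
    """Return mesh cells whose latest speed deviates strongly from normal.

    history: {cell: [mean speeds (km/h) in past five-minute intervals]}
    current: {cell: mean speed (km/h) in the latest five-minute interval}
    The z-score rule and threshold are illustrative assumptions.
    """
    flagged = []
    for cell, speed in current.items():
        past = history.get(cell, [])
        if len(past) < 2:
            continue  # not enough data to define "normal" traffic
        mu, sigma = mean(past), stdev(past)
        if sigma == 0:
            continue  # no variation observed, z-score undefined
        z = (speed - mu) / sigma
        if abs(z) >= z_threshold:
            flagged.append((cell, round(z, 1)))
    return flagged

# Hypothetical cells (lat, lon) with made-up speeds.
history = {
    (33.84, 132.77): [42.0, 40.0, 41.0, 43.0, 39.0],
    (33.85, 132.77): [55.0, 54.0, 56.0, 55.0, 57.0],
}
current = {(33.84, 132.77): 8.0, (33.85, 132.77): 55.5}
obstacles = detect_obstacles(history, current)
print(obstacles)  # only the first cell is flagged
```

A production system would more likely compare against time-of-day and day-of-week baselines rather than a flat history.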
Discovering Driving Risks from Drive Recorder Data
Courtesy: Masashi Toyoda (Univ. of Tokyo)
Drive recorder data collection
• 257 cars, 2700 drivers
Driving characteristics
Road characteristics
Camera image analysis
(3.5 years archive)
Near-miss location map (Tokyo)
Realtime Distribution of Event Data Streams
Courtesy: Jin Nakazawa (Keio Univ.)
Realtime monitoring of public car sensor data
Make Mobility Smart and Sustainable
Informative
• High quality transport information to meet diverse needs
Interactive
• Enhanced traveler experience with smarter interactivity
Assistive
• Towards a safe and secure roadway environment
Smart Mobility
Quote (partly): Smart Mobility 2030, Singapore government
Adaptive Environmental events
Example of Safer Route Discovery
Discovery of association rules between traffic and environmental events
• Inputs: traffic congestion data JOIN precipitation radar data (on latitude, longitude and time)
• Example rule: rainfall: 15-20 mm/h ⇒ speed: <10 km/h, congestion_length: 300-600 m
  - support=0.14, confidence=0.55, lift=1.37
  - # transactions=75, density=59.2
Realtime prediction of mobility risks
Dynamic search for safer route
• Shortest route
• Safer route (25%-lower risk)
• Risk-free route
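The support, confidence and lift values quoted for the rainfall rule follow the standard association-rule definitions. The sketch below computes them over a small set of transactions; the item labels and the ten toy transactions are invented for illustration and do not reproduce the slide's dataset or its numbers.

```python
def rule_metrics(transactions, antecedent, consequent):
    """Standard support, confidence and lift of antecedent => consequent."""
    n = len(transactions)
    n_a = sum(1 for t in transactions if antecedent <= t)
    n_c = sum(1 for t in transactions if consequent <= t)
    n_ac = sum(1 for t in transactions if (antecedent | consequent) <= t)
    support = n_ac / n
    confidence = n_ac / n_a if n_a else 0.0
    lift = confidence / (n_c / n) if n_c else 0.0
    return support, confidence, lift

# Toy transactions: one per (mesh cell, time slot) after joining
# rainfall and congestion events. Labels are hypothetical.
RAIN, SLOW, CONG = "rain:15-20", "speed:<10", "cong:300-600"
transactions = (
    [{RAIN, SLOW, CONG}] * 2
    + [{RAIN, "speed:10-30"}] * 2
    + [{"rain:0-5", SLOW, CONG}] * 2
    + [{"rain:0-5", "speed:10-30"}] * 4
)
support, confidence, lift = rule_metrics(transactions, {RAIN}, {SLOW, CONG})
print(support, confidence, round(lift, 2))
```

A lift above 1, as in the slide's rule (1.37), means the congestion pattern occurs more often under that rainfall level than it does overall.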
VEENA: An All-Weather Road Congestion Prediction Model
• Predict congestion on a road for a given weather condition (i.e., low to heavy rainfall)
• Discover sets of neighboring roads where large congestion may happen
Input: precipitation data and traffic congestion data, combined into transactions
1. Association Rule Mining
• Fast algorithm for Weighted Frequent Itemset (WFI) discovery
Insufficient data for many road segments leads to inaccurate predictions
2. Spatial High Utility Itemset Mining
• Finding a traffic congestion occurrence area as a set of neighboring road segments whose total congestion length (utility) exceeds a given threshold.
Output: predicted congestion areas (road segment groups)
[Kiran, R. U., Zettsu, et al.: Discovering Spatial High Utility Itemsets in Spatiotemporal Databases, SSDBM 2019]
VEENA Example
Congested roads in Kobe, Japan at Typhoon Nangka (17-July-2015): actual vs. predicted
• Database: 39,873 data points, 2,412 items/sensors
• Accuracy: 81%, Precision: 79%
Spatial High Utility Itemset Mining (SHUIM)
Item := road segment (a, b, c, … )
• Internal utility: congestion occurrence
• External utility: congestion length
High-utility itemset
• Utility of itemset >= minUtil
Spatial high utility pattern
• Distance between all items <= maxDist
Example
• Utility of item a at T1: U(a,T1) = 100*2 = 200
• Utility of itemset ab at T1: U(ab,T1) = 200+150 = 350
• Utility of itemset ab in DB: U(ab) = 350+300 = 650
• Utility of itemset cd in DB: U(cd) = 900+600 = 1500
• Distance of itemset cd: D(cd) = 5
• If minUtil=1000, then cd is a high utility pattern
• If maxDist=5, then cd is a spatial high utility itemset
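The utility and distance definitions in this example can be checked with a few lines of code. The slide's transaction table is not available in the text, so the database, external utilities and segment coordinates below are hypothetical values chosen only so that the totals match the slide (U(ab)=650, U(cd)=1500, D(cd)=5). Treating 100 as the external utility and 2 as the internal utility of item a is likewise an assumption.

```python
import math

# Hypothetical toy data, picked to reproduce the slide's totals.
EXTERNAL = {"a": 100, "b": 50, "c": 150, "d": 150}  # congestion length
DB = {  # transaction -> {road segment: congestion occurrences}
    "T1": {"a": 2, "b": 3},
    "T2": {"a": 1, "b": 4},
    "T3": {"c": 3, "d": 3},
    "T4": {"c": 2, "d": 2},
}
POSITION = {"a": (10, 0), "b": (10, 12), "c": (0, 0), "d": (3, 4)}

def utility_in_transaction(itemset, tid):
    """Sum of external * internal utility, or 0 if an item is absent."""
    counts = DB[tid]
    if not all(item in counts for item in itemset):
        return 0
    return sum(EXTERNAL[item] * counts[item] for item in itemset)

def utility(itemset):
    """Utility of the itemset over the whole database."""
    return sum(utility_in_transaction(itemset, tid) for tid in DB)

def max_distance(itemset):
    """Largest pairwise Euclidean distance between the items."""
    items = sorted(itemset)
    return max(
        (math.dist(POSITION[p], POSITION[q])
         for i, p in enumerate(items) for q in items[i + 1:]),
        default=0.0,
    )

def is_spatial_high_utility(itemset, min_util, max_dist):
    return utility(itemset) >= min_util and max_distance(itemset) <= max_dist

print(utility_in_transaction({"a", "b"}, "T1"))  # 350
print(utility({"a", "b"}), utility({"c", "d"}))  # 650 1500
print(is_spatial_high_utility({"c", "d"}, 1000, 5))  # True
```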
Spatial High Utility Itemset Mining (SHUIM)
SHUIM algorithm
• Performs depth-first search to discover desired itemsets
• Needs only a single scan on the data
Experimental results
• Comparison: extended EFIM (naïve) vs SHUIM
• C++ program, 1.5 GHz CPU/4GB RAM machine
• Congestion database (Typhoon Nangka, Kobe, 17-July-2015): 39,873 data points, 2,412 items/sensors
• Naïve algorithm runs out of memory at low minUtil values
• SHUIM algorithm finds desired itemsets effectively even at low minUtil values
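To illustrate the depth-first search over itemsets, here is a deliberately simplified sketch. It is not the SHUIM algorithm from the cited paper: it prunes only on the distance constraint (any superset of a too-distant itemset is also too distant), it recomputes utilities by rescanning the database instead of using a single scan, and it uses no utility upper bounds. The function name and the toy data are hypothetical.

```python
import math

def mine_spatial_high_utility(db, external, position, min_util, max_dist):
    """Enumerate itemsets depth-first, pruning on pairwise distance."""
    items = sorted(external)
    found = {}

    def utility(itemset):
        total = 0
        for counts in db.values():
            if all(item in counts for item in itemset):
                total += sum(external[i] * counts[i] for i in itemset)
        return total

    def dfs(prefix, start):
        for k in range(start, len(items)):
            item = items[k]
            # Distance pruning: skip the whole branch if the new item
            # is too far from any item already in the prefix.
            if any(math.dist(position[item], position[p]) > max_dist
                   for p in prefix):
                continue
            candidate = prefix + [item]
            u = utility(candidate)
            if u >= min_util:
                found[tuple(candidate)] = u
            # Utility is not anti-monotone, so keep extending even
            # when the candidate itself is below min_util.
            dfs(candidate, k + 1)

    dfs([], 0)
    return found

# Hypothetical toy data (not the slide's table).
external = {"a": 100, "b": 50, "c": 150, "d": 150}
db = {
    "T1": {"a": 2, "b": 3},
    "T2": {"a": 1, "b": 4},
    "T3": {"c": 3, "d": 3},
    "T4": {"c": 2, "d": 2},
}
position = {"a": (10, 0), "b": (10, 12), "c": (0, 0), "d": (3, 4)}
result = mine_spatial_high_utility(db, external, position, 1000, 5)
print(result)  # {('c', 'd'): 1500}
```

Rescanning the database for every candidate is what makes this version impractical at scale, which is the cost the single-scan design avoids.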
Predicting Variety of Mobility Risks for Unusual Weather
• Unusual traffic risks by heavy snow: probe car data + snowfall data
• Near-miss risks by heavy rain: drive recorder data + rainfall data
Developing Risk-Adaptive Drive Navigation Applications
Risk map visualization
Alert rule setting
Vehicle running simulation
Drive navigation setting
Transparent area
Navigation application User interface design
Route search
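A route search that trades distance against predicted risk can be sketched as a shortest-path search with risk-weighted edge costs. This is an illustrative assumption about how such a search could work, not the platform's Route search API: the cost formula `length * (1 + risk_weight * risk)`, the function name `safest_route` and the four-node graph are all made up for the example.

```python
import heapq

def safest_route(graph, start, goal, risk_weight=1.0):
    """Dijkstra search with cost = length * (1 + risk_weight * risk).

    graph: {node: [(neighbor, length_km, risk in [0, 1]), ...]}
    With risk_weight=0 this reduces to the plain shortest route.
    """
    queue = [(0.0, start, [start])]
    best = {}
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node == goal:
            return cost, path
        if node in best and best[node] <= cost:
            continue  # already reached this node at lower cost
        best[node] = cost
        for nxt, length, risk in graph.get(node, []):
            step = length * (1.0 + risk_weight * risk)
            heapq.heappush(queue, (cost + step, nxt, path + [nxt]))
    return None  # goal not reachable

# Hypothetical road graph: A-B-D is short but risky, A-C-D is longer
# but has no predicted risk.
graph = {
    "A": [("B", 1.0, 0.9), ("C", 1.5, 0.0)],
    "B": [("D", 1.0, 0.9)],
    "C": [("D", 1.5, 0.0)],
}
shortest = safest_route(graph, "A", "D", risk_weight=0.0)
safer = safest_route(graph, "A", "D", risk_weight=2.0)
print(shortest[1])  # ['A', 'B', 'D']
print(safer[1])     # ['A', 'C', 'D']
```

Raising `risk_weight` moves the result from the shortest route towards a risk-free one, which mirrors the shortest / safer / risk-free options shown for safer route discovery.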
Design for Safe and Smart Navigation
Prototyping car navigation application using NICT Cross-Data Platform APIs for mobility risk prediction on unusual weather
• 2019/2/23-24 in Tokyo
• 20 participants (IT/ITS engineers, researchers, students)
[Best Prize Winner] Driving support on heavy snow roads with risk information selected from drive recorders, traffic cameras, SNS and dynamic maps
Courtesy: Zenrin Data Com, Samurai Startup Island
A Framework for Transforming IoT Data to Smart Data
Data sources: Weather, Atmosphere, Traffic, Health, SNS, etc.
Collection → Association → Prediction → Navigation, with Feedback
APIs
Smart Services
NICT xData Platform
Sensing data
• Meteorological observation data
• Environmental monitoring data
• Road traffic data
• Vehicle sensor data (floating car data, etc.)
• Wearable sensor data (environment, physical condition, activity, etc.)
• SNS data (Twitter), etc.
• Citizen sensing data
Stages: Collection → Association → Prediction → Navigation, with Feedback
APIs: Data Loader API, Association Mining API, Associative Prediction API, Map Creation API, Route search API, Alert API
Intermediate data: event data (common format), associative dataset, prediction results
Applications: smart sustainable mobility, smart environmental healthcare
Infrastructure: NICT Integrated Testbed (DB servers×8, Storage servers×2, Analysis servers×10, Cluster servers×36)
Event Data Warehouse (EvWH)
Distributed data warehouse system for extracting, joining and mining common format of “event” data from heterogeneous sensing data sources
11 domains, 15.8 billion records / 23.6 TB (as of October 2019)
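As a rough illustration of what a common "event" format and a join across heterogeneous sources might look like, the sketch below buckets events by time slot and grid cell and keeps the buckets where more than one source reported something. The `Event` fields, the five-minute slot, the 0.01-degree cell size and the sample records are assumptions for this example and do not describe the actual EvWH schema.

```python
import math
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime

@dataclass(frozen=True)
class Event:
    """Hypothetical common event record (not the real EvWH schema)."""
    source: str    # e.g. "radar", "traffic", "sns"
    kind: str      # e.g. "rainfall", "congestion"
    time: datetime
    lat: float
    lon: float
    value: float

def bucket(event, minutes=5, cell_deg=0.01):
    """Map an event to a (time slot, grid row, grid column) key."""
    minute = event.time.minute - event.time.minute % minutes
    slot = event.time.replace(minute=minute, second=0, microsecond=0)
    return (slot,
            math.floor(event.lat / cell_deg),
            math.floor(event.lon / cell_deg))

def join_events(events, minutes=5, cell_deg=0.01):
    """Group events by bucket; keep buckets seen by 2+ sources."""
    groups = defaultdict(list)
    for e in events:
        groups[bucket(e, minutes, cell_deg)].append(e)
    return {key: group for key, group in groups.items()
            if len({e.source for e in group}) > 1}

# Made-up sample events.
events = [
    Event("radar", "rainfall", datetime(2015, 7, 17, 10, 2), 34.691, 135.183, 18.0),
    Event("traffic", "congestion", datetime(2015, 7, 17, 10, 4), 34.694, 135.186, 450.0),
    Event("sns", "disaster_tweet", datetime(2015, 7, 17, 10, 3), 34.75, 135.25, 1.0),
]
joined = join_events(events)
for key, group in joined.items():
    print(key[0], sorted(e.kind for e in group))
```

Each joined bucket can then serve as one transaction for association mining.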
Complex Event Analysis on EvWH
Discover and predict a co-occurrence pattern of multi-source events
• E.g.) Weather event x traffic event x SNS event
Application to Smart City Dashboard
• Environment-aware situation monitoring: relative traffic risk by extraordinary weather, air quality hazard caused by heavy traffic, etc.
Traffic Risk Prediction based on Sensing Data Fusion
Predict moving patterns of traffic obstacles in different time horizons.
• Explore impacts of external factors such as rainfall amounts and Twitter posts
Data-level fusion strategy
• Consider various and unlimited sensing data types for predictive modelling
Deep-learning-based approach
• Predict future traffic risk over multi-scale geographical areas and multiple time horizons by utilizing associations of complex events.
Raster Image Representation of Complex Event
Converting events (event factors) of different sensing data into spatiotemporal multi-layered raster images
Visual exploration of latent associations among heterogeneous events in a scalable manner
[Figure: precipitation layer, congestion layer and SNS layer over a geographical mesh form a raster image; successive images form a time series of raster images]
[Dao, M. S. and Zettsu, K.: Complex Event Analysis of Urban Environmental Data based on Deep CNN of Spatiotemporal Raster Images, IEEE BigData 2018]
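The conversion of events into multi-layered raster images can be pictured as filling a 4-D grid indexed by time frame, layer, mesh row and mesh column. The sketch below uses plain nested lists; the tuple layout of the input events, the summing of values that fall in the same cell, and the sample values are assumptions for illustration.

```python
def rasterize(events, layers, n_frames, n_rows, n_cols):
    """Build frames[frame][layer][row][col] from event tuples.

    events: iterable of (layer_name, frame, row, col, value).
    Values falling in the same cell are summed (an assumption).
    """
    index = {name: i for i, name in enumerate(layers)}
    frames = [[[[0.0] * n_cols for _ in range(n_rows)]
               for _ in layers]
              for _ in range(n_frames)]
    for layer, frame, row, col, value in events:
        frames[frame][index[layer]][row][col] += value
    return frames

LAYERS = ["precipitation", "congestion", "sns"]
# Made-up events on a 2 x 3 mesh over two time frames.
events = [
    ("precipitation", 0, 1, 2, 18.0),  # mm/h
    ("congestion", 0, 1, 2, 450.0),    # congestion length in m
    ("sns", 1, 0, 0, 1.0),             # one tweet
    ("sns", 1, 0, 0, 1.0),             # another tweet, same cell
]
frames = rasterize(events, LAYERS, n_frames=2, n_rows=2, n_cols=3)
print(frames[0][0][1][2], frames[0][1][1][2], frames[1][2][0][0])
```

Each frame then plays the role of one multi-channel image, with the layers as channels.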
Complex Event Prediction by 3D-CNN
Input: past k periods of data
Output: next m periods of predictions
[Minh-Son Dao, et al.: Multi-time-horizon Traffic Risk Prediction using Spatio-Temporal Urban Sensing Data Fusion, IEEE BigData 2019]
Example of Relative Traffic Risk Prediction
Predict traffic risk events on extraordinary weather (heavy rain, etc.)
• Datasets: rainfall data, traffic congestion data, tweets on disasters in Kobe, Japan, May - October, 2014 and 2015
Experimental settings
• 30 min/frame
• Past frames (k) = 6 (3 hrs)
• Pred. frames (m) = 3 (1.5 hrs)
• Batch size: 1
• Learning rate: 3e-5 to 5e-5
• Decay per each iteration: 1e-5 to 2e-5
Prediction performance (vs. historical average)
• MSE: 2207 (2624)
• RMSE: 42.48 (46.07)
• MAE: 4.88 (5.60)
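With k = 6 past frames and m = 3 predicted frames, training samples can be cut from the frame sequence with a sliding window. The sketch below shows one plausible way to do this; the stride of one frame and the use of integers as stand-ins for raster frames are assumptions.

```python
def make_windows(frames, k=6, m=3):
    """Return (past k frames, next m frames) pairs with stride 1."""
    samples = []
    for start in range(len(frames) - k - m + 1):
        past = frames[start:start + k]
        future = frames[start + k:start + k + m]
        samples.append((past, future))
    return samples

# One day at 30 min/frame gives 48 frames; integers stand in for
# the multi-layer raster images.
frames = list(range(48))
samples = make_windows(frames, k=6, m=3)
print(len(samples))  # 40
print(samples[0])    # ([0, 1, 2, 3, 4, 5], [6, 7, 8])
```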
Smart Environmental Healthcare Service
Case Study: Environmental Healthcare
Source: Effects on Public Health - Air Pollution, a Preventable Risk, GRID-Arendal (2014)
Air Quality Monitoring
World Air Quality Index (https://waqi.info/)
• Collects monitoring data from 12,000 stations in 1,000 major cities in 100 countries
IoT Sensing of Air Quality
Automotive sensing
• Mapping air pollution in Oakland (Google, Aclima): two Google Street View cars covered 14,000 miles and collected millions of data points on black carbon, NO and NO2 between May 2015 and May 2016.
• Atmospheric sensing in Tokyo (NICT, Greenblue): two cars cover two routes (10 km each) in Tokyo and collect O3, PM2.5, PM10, NO2, temperature, humidity and drive camera data every weekday from February to April, 2019 (and 2020).
• Environmental Sensor Box (Greenblue)
Personal exposure tracking
• OMRON environment sensor
• Personal PM2.5 sensor (NICT, Nagoya Univ.)
Air Quality Prediction with Multi-source Data
Input: regional observation data, local observation data, personal exposure data
xData Platform (Associative Prediction API)
Output: city air quality, personal air quality
Deep Learning for Environment-related Events
CRNN for predictive modeling of spatiotemporal association between environment-related events
[Figure: CRNN architecture. Components at each time step t1 … tN: output of CNNs, Linear1, LSTM, Linear2, Softmax, output]
• Spatial association modeling by Convolutional Neural Network (CNN)
• Temporal association modeling by Long Short-Term Memory (LSTM)
Short-term AQI prediction by CRNN
Trans-border air pollution
Experimental Result
Datasets
• Local data from 60 observation stations in Fukuoka, Japan: Atmospheric Environmental Regional Observation System (AEROS)
  - SO2, NOx, NO, NO2, CO, Ox, NMHC, CH4, THC, SPM, PM2.5, wind direction, wind speed, temperature, humidity
• Regional data from 33 coastal cities in China: Ministry of Environmental Protection (MEP)
  - Air quality index (AQI), PM2.5, PM10, SO2, NO2, O3, CO, temperature (TEMP)
• 14,401 hours of data from January 2015 to July 2017, divided into datasets for training (60%), validation (20%) and testing (20%)

AQI rank            Label size   E.g.) PM2.5 condition
Rank1 (Good)        7597         PM2.5 < 15 μg/m3
Rank2 (Moderate)    5875         15 μg/m3 < PM2.5 < 35 μg/m3
Rank3 (Unhealthy)   929          PM2.5 > 35 μg/m3

Prediction performance
• Predict N = 1 to 24 hours ahead, using the past L = 24 hours with delay D = 36 hours
• F-measure of the CRNN model compared with LSTM-only and linear regression prediction model (LRPF)
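The labeling and data split described for this experiment can be sketched as follows. The PM2.5 thresholds come from the rank table, but the table does not say which rank the exact values 15 and 35 belong to, so assigning them to Rank2 here is an assumption, as is rounding the split sizes down and giving the remainder to the test set.

```python
def pm25_rank(pm25):
    """Map a PM2.5 value (ug/m3) to AQI rank 1, 2 or 3.

    Boundary values 15 and 35 are assigned to rank 2 (assumption).
    """
    if pm25 < 15:
        return 1  # Good
    if pm25 <= 35:
        return 2  # Moderate
    return 3      # Unhealthy

def split_sizes(n, train=0.6, valid=0.2):
    """Sizes of train/validation/test sets; test takes the remainder."""
    n_train = int(n * train)
    n_valid = int(n * valid)
    return n_train, n_valid, n - n_train - n_valid

print([pm25_rank(v) for v in (8.0, 22.5, 41.0)])  # [1, 2, 3]
print(split_sizes(14401))  # (8640, 2880, 2881)
```

The label sizes in the table (7597 + 5875 + 929) add up to the 14,401 hourly records being split.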
Transfer Learning for Personal Air Quality Prediction
[Figure: decoder transfer-learning architecture]
• Pre-training: CRNN with encoder and decoder (encoder feature map Hn, pre-training output)
• Transferring: encoder transfer layers and decoder transfer layers, trained on personal exposure data by matching feature maps, giving the transfer prediction Yn
• Losses: Wasserstein distance loss, auto-encoder loss, transfer loss
Experimental Result for Personal AQI Prediction
[Figure: AQI prediction error (MAE, axis 0 to 50) on Route1 to Route5 for IDW, IDW-LR, TTL-CNN, TTL-CRNN, DTL-CNN and DTL-CRNN]
• DTL-CRNN: Decoder transfer-learning (CRNN pre-training model)
Baselines
• IDW: Inverse distance weighting (IDW) for interpolation
• IDW-LR: Linear regression and the IDW
• TTL-CNN: Typical transfer-learning layer (CNN pre-training model)
• TTL-CRNN: Typical transfer-learning layer (CRNN pre-training model)
• DTL-CNN: Decoder transfer-learning (CNN pre-training model)
Training data collection by crowdsensing
• (ex-)marathon course for Tokyo Olympic Games 2020
• 5 routes (5 km/route): Route 1 to Route 5
• 9:00-11:00 am on every weekday in March - April, 2019
• NO2, PM2.5, O3 + temperature, humidity
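For reference, the IDW baseline estimates a value at an unobserved point as a distance-weighted average of nearby observations. The sketch below is the textbook form of inverse distance weighting; the power parameter of 2, planar coordinates and the sample station values are assumptions, and the slides do not state which settings the experiment used.

```python
import math

def idw(stations, query, power=2.0):
    """Inverse distance weighting interpolation.

    stations: list of ((x, y), value); query: (x, y).
    Weights are 1 / distance**power (power=2 is an assumption).
    """
    numerator = 0.0
    denominator = 0.0
    for position, value in stations:
        d = math.dist(position, query)
        if d == 0:
            return value  # query coincides with a station
        weight = 1.0 / d ** power
        numerator += weight * value
        denominator += weight
    return numerator / denominator

# Made-up PM2.5 readings at two fixed stations.
stations = [((0.0, 0.0), 10.0), ((2.0, 0.0), 30.0)]
print(idw(stations, (1.0, 0.0)))  # 20.0, the point is equidistant
print(idw(stations, (0.0, 0.0)))  # 10.0, exactly at the first station
```

Because IDW uses only fixed-station values and distance, it cannot capture street-level variation along a route, which is the gap the transfer-learning models aim to close.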
Smart Environmental Healthcare Service
Smart Environmental Healthcare Datathon
Prototyping smart services with air quality health risk prediction
• Map navigation for “good air” route
• Reward point = [air quality] x [activity amount]
• Feedback of user atmosphere measurements
Citizen participation: crowdsensing & ideathon events in Fukuoka and Tokyo (March-April, 2018/2019) http://datathon.jp/
Smart City Deployment
International Collaboration
A reusable, sharable, and transferable smart data platform for collaborative development of data-driven smart city
• Heterogeneous big data; participatory sensing by users
• Processed data required for correlation analysis
• Cryptography
• NICT xData Platform: correlation analysis, prediction
• Customize prediction models and prediction results
• API mashup: prediction result distribution, information portal, applications
Conclusions
Data-driven solutions towards Smart Sustainable Cities
Transforming IoT data to Smart Data
• Data analytics platform for collection, association, prediction, navigation, feedback with multi-source, multi-domain sensing data
Human in the center of Smart Data utilization
• Collective awareness, citizen participation, crowdsensing