using huge amounts of road sensor data for official statistics · cleaning the data recursive...
TRANSCRIPT
Marco Puts, Piet DaasMartijn Tennekes, Chris de Blois
Statistics Netherlands
Session 202.6.2016
Using huge amounts of road sensor data for
official statistics
Information value of Big Data
Secondary data
Data of ‘others’- Administrative sources- Big Data
Sample Survey
Questionairs
CBS
2
3
Source (bits)
Stat
istic
s(bi
ts)
Sample Survey
Admin Big Data
How much source data do you need?
Sample Issues Source issuesConversion issues
Source issuesConversion issuesSelectivity issuesData issues
Information value of Big Data
Road sensors
Road sensor data• Passing vehicle counts for each minute
(24/7) at about 60.000 sensors in the Netherlands
• Types of sensors:• Induction loop• Camera • Bluetooth
• Length categories (e.g. small, medium, long vehicles)
• Large volume: approx. 230 mln records/day
4
Dutch highways
5
6
Dutch highways
The Signal and the Data
7
Manually
Automatic
Q Q
Proces parameters
Data Signal
Noise
Quality of the Data
88
9
X1 X2 X3 X4
Y1 Y2 Y3 Y4
observation
state
𝑥𝑥𝑘𝑘 = 𝑥𝑥𝑘𝑘−1 + 𝜀𝜀𝑝𝑝𝑦𝑦𝑘𝑘 = 𝑃𝑃𝑃𝑃𝑃𝑃𝑃𝑃𝑃𝑃(𝑥𝑥𝑘𝑘)
Cleaning the DataRecursive Bayesian Estimation
Cleaning the DataRecursive Bayesian Estimation
X1 X2 X3 X4
Y1 Y2 Y3 Y4
observation
state
Update:𝑃𝑃(𝑥𝑥𝑘𝑘|𝑦𝑦𝑘𝑘) ∝ 𝑃𝑃(𝑥𝑥𝑘𝑘|𝑦𝑦1..𝑘𝑘−1) 𝑃𝑃(𝑦𝑦𝑘𝑘|𝑥𝑥𝑘𝑘) 10
11
X1 X2 X3 X4
Y1 Y2 Y3 Y4
observation
state
Prediction: 𝑃𝑃(𝑥𝑥𝑘𝑘+1| 𝑦𝑦1..𝑘𝑘) = �−∞
∞𝑃𝑃(𝑥𝑥𝑘𝑘|𝑦𝑦1..𝑘𝑘)𝑃𝑃(𝑥𝑥𝑘𝑘+1|𝑥𝑥𝑘𝑘)𝑑𝑑𝑥𝑥𝑘𝑘
Cleaning the DataRecursive Bayesian Estimation
12
X1 X2 X3 X4
Y1 Y2 Y3 Y4
observation
state
Missing Data
Cleaning the DataRecursive Bayesian Estimation
Results of the filter
13
Monitoring Quality
• Number of minutes for which data is available varies per day per sensor
• Filter fills in blocks of missing values. For large blocks, the estimation of missing values is less accurate.
• Minimal deviation between original non-missing values and resulting signal.
• Smoothness of resulting signal
14
Resulting Indicators
Number of Measurements𝑀𝑀
Block indicatorFor each block:
Difference between data and signal
Smoothness of the signal
15
𝑁𝑁(𝑁𝑁 + 1)2
𝐷𝐷 =∑𝑘𝑘∈𝑀𝑀 𝑥𝑥𝑘𝑘∑𝑘𝑘∈𝑀𝑀 𝑦𝑦𝑘𝑘
− 1
𝑆𝑆 = 1𝐾𝐾�𝑘𝑘=1
𝐾𝐾𝑦𝑦𝑘𝑘 − 𝑦𝑦𝑘𝑘−1 2
𝑦𝑦𝑘𝑘 + 𝑦𝑦𝑘𝑘−1 2
Precision/Accuracy
The filter does not introduce extra errors:
Precision: 3.6% Accuracy:+0.13%
16
Card material
17
Calibration of the road sensors
Main route
Road sensors
Road segments (=weights)
3000 2000 2500
18
• Check and (if necessary) correct traffic flow direction• Projection of road sensors on roads• Group sensors with unique location• Remove outliers
Quality of locations of sensors
Raw sensor metadata Edited sensor metadata
19
Metadata synchronization
No road sensors?
Where is the road?
20
Data journalism and(almost) real time statistics
Respond tocurrent events
Within two
days!
21