Marco Puts, Piet Daas, Martijn Tennekes, Chris de Blois · Statistics Netherlands · Session 20, 2.6.2016 · Using huge amounts of road sensor data for official statistics


Page 1: Using huge amounts of road sensor data for official statistics · Cleaning the Data Recursive Bayesian Estimation. Results of the filter. 13. Monitoring Quality • Number of minutes

Using huge amounts of road sensor data for official statistics

Marco Puts, Piet Daas, Martijn Tennekes, Chris de Blois

Statistics Netherlands

Session 20, 2.6.2016

Page 2: Information value of Big Data

Secondary data: data of 'others'
- Administrative sources
- Big Data

Sample survey

Questionnaires

CBS

Page 3: Information value of Big Data

How much source data do you need?

[Figure: axes labelled Source (bits) and Statistics (bits); categories Sample Survey, Admin, Big Data]

Sample issues · Source issues · Conversion issues

Source issues · Conversion issues · Selectivity issues · Data issues

Page 4: Road sensors

Road sensor data

• Passing vehicle counts for each minute (24/7) at about 60,000 sensors in the Netherlands

• Types of sensors: induction loop, camera, Bluetooth

• Length categories (e.g. small, medium, long vehicles)

• Large volume: approx. 230 million records/day
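As a rough sanity check on the volume quoted above, the two figures can be combined. This is only a back-of-the-envelope sketch: the slide does not say how a "record" is defined, so the explanation in the comment (separate records per length category or lane) is an assumption.

```python
# Figures quoted on the slide.
SENSORS = 60_000                # approximate number of road sensors
RECORDS_PER_DAY = 230_000_000   # approximate daily record volume

MINUTES_PER_DAY = 24 * 60       # one count per minute, 24/7

# Sensor-minutes per day if every sensor reported every minute.
sensor_minutes = SENSORS * MINUTES_PER_DAY

# Implied average number of records per sensor-minute. Assumption: the
# surplus over 1 comes from separate records per length category and/or lane.
records_per_sensor_minute = RECORDS_PER_DAY / sensor_minutes

print(sensor_minutes)
print(round(records_per_sensor_minute, 2))
```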

Page 5: Dutch highways

Page 6: Dutch highways

Page 7: The Signal and the Data

[Diagram labels: Manually, Automatic, Q, Q, Process parameters, Data, Signal, Noise]

Page 8: Quality of the Data

Page 9: Cleaning the Data: Recursive Bayesian Estimation

[Diagram: chain of states X1 to X4, each with an observation Y1 to Y4]

State: $x_k = x_{k-1} + \varepsilon_p$

Observation: $y_k \sim \mathrm{Poisson}(x_k)$
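To make the state-space model on this slide concrete, here is a minimal simulation sketch. It assumes the state follows a random walk and the observed counts are Poisson distributed around the state. The Gaussian form of the process noise, its standard deviation `sigma`, the starting level `x0`, and the clipping at zero are all illustrative assumptions, not taken from the slides.

```python
import math
import random

def sample_poisson(rate, rng):
    """Draw one Poisson variate with Knuth's method (adequate for small rates)."""
    threshold = math.exp(-rate)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count

def simulate(n_minutes, x0=20.0, sigma=1.0, seed=0):
    """Simulate x_k = x_{k-1} + noise and y_k ~ Poisson(x_k) per minute."""
    rng = random.Random(seed)
    states, counts = [], []
    x = x0
    for _ in range(n_minutes):
        # Random-walk step; Gaussian noise and clipping at zero are assumptions.
        x = max(x + rng.gauss(0.0, sigma), 0.0)
        states.append(x)
        counts.append(sample_poisson(x, rng))
    return states, counts

states, counts = simulate(60)
print(counts[:10])
```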

Page 10: Cleaning the Data: Recursive Bayesian Estimation

[Diagram: chain of states X1 to X4, each with an observation Y1 to Y4]

Update: $P(x_k \mid y_{1..k}) \propto P(x_k \mid y_{1..k-1})\, P(y_k \mid x_k)$

Page 11: Cleaning the Data: Recursive Bayesian Estimation

[Diagram: chain of states X1 to X4, each with an observation Y1 to Y4]

Prediction: $P(x_{k+1} \mid y_{1..k}) = \int_{-\infty}^{\infty} P(x_k \mid y_{1..k})\, P(x_{k+1} \mid x_k)\, dx_k$

Page 12: Cleaning the Data: Recursive Bayesian Estimation

Missing Data

[Diagram: chain of states X1 to X4, each with an observation Y1 to Y4]
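The prediction and update steps, and the handling of missing minutes, can be sketched with a simple grid approximation. This is not the authors' implementation: the grid, the Gaussian random-walk kernel with standard deviation `sigma`, the flat prior, and the Poisson likelihood are assumptions chosen to keep the example short. A missing minute (`None`) gets a prediction step only, so the estimate is carried forward while its uncertainty grows.

```python
import math

def poisson_pmf(count, rate):
    """P(y = count | x = rate), computed in log space for stability."""
    if rate <= 0.0:
        return 1.0 if count == 0 else 0.0
    return math.exp(count * math.log(rate) - rate - math.lgamma(count + 1))

def normalise(weights):
    total = sum(weights)
    return [w / total for w in weights]

def predict(belief, grid, sigma):
    """Prediction: integrate the belief against a Gaussian random-walk kernel
    (the kernel is truncated at the grid edges)."""
    predicted = []
    for x_next in grid:
        mass = sum(b * math.exp(-0.5 * ((x_next - x) / sigma) ** 2)
                   for b, x in zip(belief, grid))
        predicted.append(mass)
    return normalise(predicted)

def update(belief, grid, count):
    """Update: multiply the predicted belief by the Poisson likelihood."""
    return normalise([b * poisson_pmf(count, x) for b, x in zip(belief, grid)])

def run_filter(counts, grid, sigma=1.0):
    belief = normalise([1.0] * len(grid))   # flat prior (assumption)
    estimates = []
    for count in counts:
        belief = predict(belief, grid, sigma)
        if count is not None:               # missing minute: skip the update
            belief = update(belief, grid, count)
        estimates.append(sum(b * x for b, x in zip(belief, grid)))
    return estimates

grid = [0.5 * i for i in range(1, 121)]     # candidate intensities 0.5 .. 60
counts = [18, 22, None, None, 19, 21, 20]   # made-up counts with a gap
signal = run_filter(counts, grid)
print([round(s, 1) for s in signal])
```

A grid filter is used here because it mirrors the two formulas directly; a particle filter or an analytic approximation would be alternatives for production-scale data.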

Page 13: Results of the filter

Page 14: Monitoring Quality

• Number of minutes for which data is available varies per day per sensor

• Filter fills in blocks of missing values. For large blocks, the estimation of missing values is less accurate.

• Minimal deviation between original non-missing values and resulting signal.

• Smoothness of resulting signal

Page 15: Resulting Indicators

Number of measurements: $M$

Block indicator, for each block: $\frac{N(N+1)}{2}$

Difference between data and signal: $D = \frac{\sum_{k \in M} x_k}{\sum_{k \in M} y_k} - 1$

Smoothness of the signal: $S = \frac{1}{K} \sum_{k=1}^{K} \frac{(y_k - y_{k-1})^2}{(y_k + y_{k-1})^2}$
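The indicators on this slide can be sketched as small functions. Several points are assumptions, because the slide does not spell them out: `N` is read as the length of a block of consecutive missing minutes, the per-block values are summed into one number, the squares in the smoothness formula apply to both numerator and denominator, and the series passed to `smoothness` is strictly positive. The argument names `x` and `y` follow the slide's symbols; which of the two is the data and which is the signal is not stated there.

```python
def missing_blocks(values):
    """Lengths N of each run of consecutive missing (None) minutes."""
    lengths, run = [], 0
    for v in values:
        if v is None:
            run += 1
        elif run:
            lengths.append(run)
            run = 0
    if run:
        lengths.append(run)
    return lengths

def block_indicator(values):
    """Sum of N(N+1)/2 over all blocks (summing over blocks is an assumption)."""
    return sum(n * (n + 1) // 2 for n in missing_blocks(values))

def difference(x, y):
    """D = sum(x_k) / sum(y_k) - 1 over minutes where both values exist."""
    pairs = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    return sum(a for a, _ in pairs) / sum(b for _, b in pairs) - 1.0

def smoothness(y):
    """S = (1/K) * sum over k=1..K of (y_k - y_{k-1})^2 / (y_k + y_{k-1})^2,
    for a series y_0 .. y_K with strictly positive values."""
    terms = [((b - a) ** 2) / ((b + a) ** 2) for a, b in zip(y, y[1:])]
    return sum(terms) / len(terms)

data = [10, None, None, 12, None, 11]   # made-up minute counts with gaps
print(missing_blocks(data), block_indicator(data))
```

The quadratic block term penalises one long gap more heavily than several short gaps of the same total length, which matches the remark that large blocks are estimated less accurately.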

Page 16: Precision/Accuracy

The filter does not introduce extra errors:

Precision: 3.6%

Accuracy: +0.13%

Page 17: Map material

Page 18: Calibration of the road sensors

Main route

Road sensors

Road segments (= weights): 3000, 2000, 2500
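One way to read "road segments (= weights)" is as a weighted average, where each sensor along the main route counts in proportion to the segment it represents. The sketch below follows that reading, which is an assumption: the three weights are the figures from the slide, while the vehicle counts and the `weighted_mean` helper are hypothetical and only illustrate the arithmetic.

```python
segment_weights = [3000, 2000, 2500]   # figures from the slide, read as weights
sensor_counts = [1200, 900, 1000]      # hypothetical vehicle counts per sensor

def weighted_mean(values, weights):
    """Average of values, each weighted by its road segment."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)

route_level = weighted_mean(sensor_counts, segment_weights)
print(round(route_level, 1))
```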

Page 19: Quality of locations of sensors

• Check and (if necessary) correct traffic flow direction

• Projection of road sensors on roads

• Group sensors with unique location

• Remove outliers

[Figure panels: Raw sensor metadata, Edited sensor metadata]

Page 20: Metadata synchronization

No road sensors?

Where is the road?

Page 21: Data journalism and (almost) real-time statistics

Respond to current events

Within two days!