Pilot SC4 BDE Workshop Brussels 14 Sept. 2017

BDE Workshop Brussels 14.09.2017

Objective of the Pilot SC4

A scalable, fault-tolerant and flexible platform based on open source frameworks that can process unbounded data sets.

Microservice Architecture

Message Broker

Apache Kafka is a high-throughput distributed durable messaging system

Apache Kafka

Kafka Cluster

Apache Kafka

Stream and Batch Processor

Apache Flink is an open source platform for distributed stream and batch data processing.

Apache Flink

Flink Cluster

Apache Flink

Storage and Indexing

PostGIS is a spatial database that stores the road network data.
Elasticsearch is a distributed, open source document database built on top of Apache Lucene. It stores the result of the workflow.

Elasticsearch Cluster

Pilot Architecture

BDE Components

The FCD Pipeline

Pilot Cluster

Minimum requirements for fault tolerance and scalability:
● Cluster of 3 nodes (Docker Swarm)
● 4 CPU cores per node
● 1 (Flink) worker per node
● 1 (Flink) slot per CPU core
Max parallelism = 12
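
The maximum parallelism follows from the cluster sizing listed above. A minimal sketch of that arithmetic, assuming every slot is available to a single job; the function and parameter names are illustrative and not taken from the pilot code:

```python
def total_slots(nodes: int, workers_per_node: int, slots_per_worker: int) -> int:
    """Return the number of task slots in the cluster.

    Assumption: the job may use every slot, so this number is also the
    upper bound on the parallelism of any single operator.
    """
    return nodes * workers_per_node * slots_per_worker


# Slide values: 3 nodes, 1 worker per node, 1 slot per CPU core with
# 4 cores per node, which gives 4 slots per worker.
max_parallelism = total_slots(nodes=3, workers_per_node=1, slots_per_worker=4)
print(max_parallelism)
```

Adding a fourth node of the same size would raise the bound to 16 under the same assumption.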

Parallelization: map-match subtasks

1. source()
2. mapMatch()
3. keyBy()/window()/apply()
4. sink()
The subtasks can be distributed in slots with different parallelism (e.g. from 1 to 12).

Parallelization: map-match subtasks

A slot can process all the subtasks in a pipeline

Parallelization: input and output data

Input record fields: device_id, timestamp, lat, lon, speed, orientation, transit

The mapMatch subtask preserves time order, so the next task, keyBy(road_seg)/window(15 min)/apply(average_speed), returns the correct result within each time window for each road segment.

Output record fields: road_seg_id, start_date, num_vehicles, avg_speed
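
The keyBy/window/apply step can be illustrated outside Flink. The sketch below is plain Python, not the Flink API: it groups already map-matched records by road segment and 15-minute tumbling window and emits the output fields named on this slide. It assumes that map-matching attaches a road_seg_id to each record, that num_vehicles counts distinct device_id values, and that avg_speed is the mean over all records in the window; none of these details is confirmed by the slides.

```python
from collections import defaultdict
from datetime import datetime, timedelta
from typing import Iterable, NamedTuple

WINDOW = timedelta(minutes=15)  # window length named on the slide


class Matched(NamedTuple):
    """A record after map-matching (road_seg_id is assumed to be added there)."""
    device_id: str
    timestamp: datetime
    speed: float
    road_seg_id: int


class SegmentStats(NamedTuple):
    road_seg_id: int
    start_date: datetime
    num_vehicles: int
    avg_speed: float


def window_start(ts: datetime) -> datetime:
    """Floor a timestamp to the start of its tumbling window."""
    epoch = datetime(1970, 1, 1, tzinfo=ts.tzinfo)
    return ts - ((ts - epoch) % WINDOW)


def average_speed(records: Iterable[Matched]) -> list[SegmentStats]:
    """Group by (road segment, window) and aggregate each group."""
    groups = defaultdict(list)
    for r in records:
        groups[(r.road_seg_id, window_start(r.timestamp))].append(r)
    result = []
    for (seg, start), rs in sorted(groups.items()):
        vehicles = {r.device_id for r in rs}       # assumed: distinct devices
        mean = sum(r.speed for r in rs) / len(rs)  # assumed: mean over records
        result.append(SegmentStats(seg, start, len(vehicles), mean))
    return result
```

In a streaming engine the window is emitted once its time range has passed, which is why the order of records coming out of mapMatch matters; this batch-style sketch sidesteps that by seeing all records at once.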

SC4 Pilot Pipeline

Data Upload

Producer and Consumer

Visualization

Pilot SC4 can process real-time floating car data (FCD), map-match it, and classify each road segment according to its traffic level.
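
The slides do not say how a traffic level is derived. One common approach, shown here purely as an illustration, compares the average speed on a segment with its free-flow speed. The level names, the thresholds, and the free_flow_speed parameter are all hypothetical and not taken from the pilot.

```python
def traffic_level(avg_speed: float, free_flow_speed: float) -> str:
    """Classify a road segment by the ratio of average to free-flow speed.

    The thresholds (0.75 and 0.40) and the three level names are
    hypothetical values chosen for illustration only.
    """
    if free_flow_speed <= 0:
        raise ValueError("free_flow_speed must be positive")
    ratio = avg_speed / free_flow_speed
    if ratio >= 0.75:
        return "free"
    if ratio >= 0.40:
        return "slow"
    return "congested"
```

A visualization layer could then map each level to a colour for the corresponding road segment.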

Short-term traffic forecast

Algorithm: Feedforward ANN

Short-term traffic forecast

Algorithm: Feedforward ANN
Hyperparameters (spatial and temporal correlation):
Input layer units: (Dd*24*60*Cr)/Tw
● Dd = number of days (e.g. working days, 5 days)
● Tw = time window (e.g. 30 minutes)
● Cr = connected road segments (e.g. 3)
-> 720 input units
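
The input layer size can be checked by evaluating the formula with the example values from the slide. This is a minimal sketch; the function name and the check that the window divides the period evenly are additions for illustration.

```python
def input_units(days: int, window_minutes: int, connected_segments: int) -> int:
    """Evaluate (Dd * 24 * 60 * Cr) / Tw from the slide.

    days               -> Dd, number of days of history
    window_minutes     -> Tw, time window length in minutes
    connected_segments -> Cr, number of connected road segments
    """
    total_minutes = days * 24 * 60
    windows, remainder = divmod(total_minutes, window_minutes)
    if remainder:
        # Added check: a partial window has no defined input unit.
        raise ValueError("window_minutes must divide the period evenly")
    return windows * connected_segments


# Slide example: 5 working days, 30-minute windows, 3 connected segments.
print(input_units(days=5, window_minutes=30, connected_segments=3))  # 720
```

Each input unit corresponds to one time window on one connected segment, so halving the window to 15 minutes would double the input layer to 1440 units.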

SANSA-Stack: Big Data + Machine Learning + Semantic Technologies

SANSA-Stack, part of the BDE project, together with RDF data sets based on semantic technologies, such as LinkedGeoData, will enable more use cases related to SC4.

Thanks

BDE project website: https://www.big-data-europe.eu/
Code repository: https://github.com/big-data-europe
Contact: [email protected]