BIG DATA EUROPE: CONCEPT, PLATFORM, AND PILOTS BDE SC5 Workshop, Brussels 11 October 2016



Talk outline

• The BigDataEurope action
• The Big Data Integrator platform
• Pilots across all seven H2020 challenges
• Upcoming BDE Activities

18-oct.-16 www.big-data-europe.eu

BigDataEurope Action


Big Data Europe (CSA: 2015-17)

• Show societal value of Big Data
  o Across all societal challenges addressed by Horizon 2020
• Lower barrier for using big data technologies
  o Effort and resources to convert tools and workflows
  o Skills and expertise
• Help establish data value chains
  o Across languages, organizations, and domains

Consortium

NCSR DEMOKRITOS

Stakeholder Engagement

• Present action, showcase deployments
• Raise awareness about BDE results, what they mean for stakeholders
• Collect requirements to drive further development

[Timeline: M6, M12, M18, M24, M30]

Data Value Chain Evolution

[Figure: evolution of the data value chain in three stages (Stage 1, Stage 2, Stage 3). Value chain steps: Extraction, Curation, Quality, Linking, Integration, Publication, Visualization, Analysis. Domains: Health, Transport, Security, Food, Societies, Climate, Energy. Sources: Data Repositories, Linked Open Data Cloud.]

Big Data Integrator


Architecture

• Big Data Integrator (BDI):
  o The prototype developed by BDE
• Main points of the architecture
  o Dockerization
  o Support layer, including integrated UI
  o Semantification layer

Big Data Integrator

• Plug-and-play BD Platform
• Cloud-deployment ready
• Domain independent, Customisable
• Bundles Open Source solutions
• First Version Released!

Docker containers

• Docker offers lightweight virtualization
  o Docker containers can be shared to be provisioned on different Linux variations and versions
• Identical base system not required
• All BDI components: Docker containers

BDI components

• Processing and storage components
  o Re-used existing docker containers where available
  o Dockerized by BDE otherwise
  o Ensured all can be provisioned through Docker Swarm
• Components by BDE:
  o Support Layer
  o Semantic Layer

Support Layer

• BDE defines uniform UI stylesheets
  o Web UIs from BDE dockers (including for third party components) follow these BDE stylesheets
• BDE-developed tools:
  o Starting containers and dependencies
  o Monitoring execution
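
Starting containers together with their dependencies amounts to ordering services so that each one comes up after the services it relies on. The following is a minimal sketch of that idea only, not the BDE tooling itself; the service names and the dependency map are hypothetical.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical services mapped to the services they depend on.
DEPENDS_ON = {
    "hdfs-namenode": set(),
    "hdfs-datanode": {"hdfs-namenode"},
    "spark-master": {"hdfs-namenode"},
    "spark-worker": {"spark-master", "hdfs-datanode"},
}


def startup_order(depends_on):
    """Return an order in which every service follows its dependencies."""
    return list(TopologicalSorter(depends_on).static_order())


order = startup_order(DEPENDS_ON)
print(order)
```

Any order returned this way starts the namenode first and the Spark worker last; the relative order of independent services is not fixed.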

Semantic data lake

• Minimal ingestion pre-processing
• Semantic layer maintains metadata
• Add meaning when retrieving/processing

[Figure: Data Lake: scalable unstructured data store. Semantic layer: relationship definitions and metadata (JSON-LD, CSVW, R2RML, XML2RDF).]
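
The idea of storing data untouched and adding meaning only at retrieval time can be sketched as follows. This is an illustration under stated assumptions, not BDI code: the CSV content, the mapping structure, and the example.org IRIs are all hypothetical, and the mapping is only loosely in the spirit of CSVW-style metadata.

```python
import csv
import io

# Raw data is kept exactly as ingested (hypothetical content).
RAW_CSV = "station,temp_c\nS1,12.5\nS2,9.0\n"

# Metadata held by the semantic layer (hypothetical vocabulary IRIs).
MAPPING = {
    "subject_column": "station",
    "subject_prefix": "http://example.org/station/",
    "predicates": {"temp_c": "http://example.org/vocab/temperatureCelsius"},
}


def read_as_triples(raw_csv, mapping):
    """Apply the mapping at read time, yielding (subject, predicate, value)."""
    for row in csv.DictReader(io.StringIO(raw_csv)):
        subject = mapping["subject_prefix"] + row[mapping["subject_column"]]
        for column, predicate in mapping["predicates"].items():
            yield (subject, predicate, row[column])


triples = list(read_as_triples(RAW_CSV, MAPPING))
for triple in triples:
    print(triple)
```

Changing the mapping changes how the same stored bytes are interpreted, without re-ingesting anything.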

BDE Docker Containers

• Data serving: HDFS, Cassandra, 4store, PostGIS, Strabon, Elasticsearch, Hive, Semagrow
• Processing: Spark, Flink, SANSA
• Stream ingestion middleware: Flume, Kafka

Semantic layer tools

• BDE tooling for Semantic Data Lake:
  o Swagger: Semantics of RESTful APIs
  o Semantic Analytics Stack (SANSA): Distributed data processing for large-scale RDF data
  o Semagrow: SPARQL perspective over Big Data stores

BigDataEurope Pilots


SC1: Pharmacology research

Life Sciences & Health

• Extensive toolset developed by OPF and others
• Query a large number of datasets, some large
• Existing elaborate ingestion and homogenization by the OpenPHACTS Foundation

SC2: Viticulture resources


Food and Agriculture

• AgInfra is a major infrastructure for agriculture researchers, serving cross-linked bibliography, data, and processing services

• Pilot automates publication ingestion and thematic classification

SC3: Predictive maintenance

Energy

• Wind turbine monitoring applies computational models to sensor data streams

• Models are re-parameterized weekly using that week’s data from multiple turbines
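
Weekly re-parameterization can be pictured as pooling one week of readings from all turbines and refitting the model on the pooled data. The slides do not say which models the pilot uses, so the sketch below substitutes a plain least-squares line as a stand-in; the turbine names and readings are hypothetical.

```python
def fit_line(points):
    """Ordinary least-squares fit of y = a * x + b over (x, y) pairs."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    a = sxy / sxx
    return a, mean_y - a * mean_x


def reparameterize(week_readings):
    """Pool one week's readings from all turbines and refit the model."""
    pooled = [p for readings in week_readings.values() for p in readings]
    return fit_line(pooled)


# Hypothetical (input, output) sensor readings per turbine for one week.
week = {
    "turbine-1": [(4.0, 9.0), (6.0, 13.0)],
    "turbine-2": [(5.0, 11.0), (8.0, 17.0)],
}
slope, intercept = reparameterize(week)
print(slope, intercept)
```

Running the same function on each new week's data yields fresh parameters that the monitoring step can then apply to the incoming stream.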

SC4: Traffic conditions estimation

Transport

• Estimation of real-time traffic conditions in Thessaloniki
• Combines:
  o Traffic modelling from historical data
  o Current measurements from a taxi fleet of 1200 vehicles

SC5: Climate modelling

Climate

• Discovering and re-using previously computed derivatives
  o Lineage annotation: datasets and model parameters used to compute derivative datasets
  o Finding appropriate past runs avoids repeating weeks-long modelling runs
• Preparing modelling experiments
  o Slicing, transforming, combining datasets into new datasets
  o Submission to and retrieval from modelling infrastructure
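
Re-using a past run depends on being able to look it up by its lineage, that is, by the input datasets and model parameters that produced it. A minimal sketch of such a lookup follows, assuming an in-memory index in place of the pilot's actual metadata store; the dataset names, parameter names, and run identifiers are hypothetical.

```python
import hashlib
import json


def lineage_key(datasets, parameters):
    """Deterministic key for the inputs that produced a derivative dataset."""
    payload = json.dumps(
        {"datasets": sorted(datasets), "parameters": parameters},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


class LineageIndex:
    """In-memory stand-in for a lineage metadata store."""

    def __init__(self):
        self._runs = {}

    def record(self, datasets, parameters, result_id):
        self._runs[lineage_key(datasets, parameters)] = result_id

    def find(self, datasets, parameters):
        """Return the id of a matching past run, or None if there is none."""
        return self._runs.get(lineage_key(datasets, parameters))


index = LineageIndex()
index.record(["era-1990", "topo-v2"], {"resolution_km": 9}, "run-0001")

# Same inputs in a different order match the recorded run.
hit = index.find(["topo-v2", "era-1990"], {"resolution_km": 9})
# Different parameters do not, so a new modelling run would be needed.
miss = index.find(["topo-v2", "era-1990"], {"resolution_km": 3})
print(hit, miss)
```

This sketch only finds exact matches; deciding whether a run with slightly different inputs is still "appropriate" is a separate question it does not address.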

SC5 Pilot: Points Demonstrated

Climate

• Existing infrastructure and stable, reliable software for parallel computation of models
• BDI is deployed as an external infrastructure for preparing and managing datasets
• BDI offers:
  o Hive for managing data in a way that can be retrieved and manipulated, rather than file blocks
  o Cassandra stores structured and textual metadata for searching headers and lineage

SC6: Municipality budgets


Social Sciences

• Ingestion of budget and budget execution data

• Multiple municipalities in varied formats and data models

• Homogenized data made available for analysis and comparison
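
Homogenization here means mapping each municipality's own field names and value formats onto one common model so that figures can be compared. The sketch below illustrates that step only; the municipality names, source field names, and the common model are hypothetical and not taken from the pilot.

```python
# Hypothetical per-municipality field mappings onto one common model.
FIELD_MAPS = {
    "city-a": {"code": "category", "approved": "budgeted", "spent": "executed"},
    "city-b": {"kategorie": "category", "plan": "budgeted", "ist": "executed"},
}


def homogenize(source, record):
    """Rename source-specific fields and coerce amounts to float."""
    mapped = {FIELD_MAPS[source][key]: value for key, value in record.items()}
    return {
        "municipality": source,
        "category": mapped["category"],
        "budgeted": float(mapped["budgeted"]),
        "executed": float(mapped["executed"]),
    }


rows = [
    homogenize("city-a", {"code": "education", "approved": "100", "spent": "80"}),
    homogenize("city-b", {"kategorie": "education", "plan": 200, "ist": 150}),
]

# Once homogenized, budget execution can be compared across municipalities.
execution_rates = {r["municipality"]: r["executed"] / r["budgeted"] for r in rows}
print(execution_rates)
```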

SC7: Change detection & verification


Secure Societies

• Events are extracted from text published by news agencies and on social networking sites

• Events are geo-located and relevant changes are detected by comparing current and previous satellite images
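
Comparing a current and a previous image of the same location can be reduced, in its simplest form, to counting pixels whose values differ by more than a threshold. The slides do not describe the pilot's actual change-detection method, so this is a deliberately simplified stand-in; the pixel values and both thresholds are hypothetical.

```python
def changed_fraction(previous, current, threshold):
    """Fraction of pixels whose absolute difference exceeds the threshold."""
    changed = total = 0
    for prev_row, cur_row in zip(previous, current):
        for p, c in zip(prev_row, cur_row):
            total += 1
            if abs(c - p) > threshold:
                changed += 1
    return changed / total


# Hypothetical 2x3 grayscale patches around a geo-located event.
previous = [[10, 10, 10], [10, 10, 10]]
current = [[10, 90, 95], [12, 10, 80]]

fraction = changed_fraction(previous, current, threshold=30)
is_change = fraction >= 0.4  # hypothetical decision threshold
print(fraction, is_change)
```

The sketch assumes both patches are already aligned and the same size; real satellite imagery needs more preparation before pixels can be compared this way.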

UPCOMING BDE ACTIVITIES

Integrating Big Data, Software & Communities for Addressing Europe’s Societal Challenges

2nd round of Societal Workshops

Domain      Date                 Location  Notes
Transport   22 September 2016    Brussels  Collocated with Big Data for Transport, Tisa workshop
Food&Agri   30 September 2016    Brussels  Collocated with DG AGRI WP2018-20 stakeholder consultation
Energy      4 October 2016       Brussels  Collocated with EC H2020 Info Day on “Smart Grids and Storage”
Climate     11 October 2016 (1)  Brussels  Collocated with Melodies Project Event – Exploiting Open Data
Health      19 October 2016      Brussels  Standalone Workshop
Security    18 October 2016      Brussels  Standalone Workshop
Societies   5 December 2016      Cologne   Collocated with EDDI16 - 8th Annual European DDI User Conference

Other Activities

• Hands-on BDE pilots workshop
  o Apache Big Data Europe, Seville, 14-16 Nov
  o Enable BD technology practitioners to try out BDI & components
  o To fine-tune technical BDI requirements
• Various SC-focussed and general hangouts, follow!
  o Apache Flink & BDE (20 Oct) – Free Webinar

Questions & Contacts

WEB: www.big-data-europe.eu
EMAIL: [email protected]
BIG DATA INTEGRATOR: www.github.com/big-data-europe

PROJECT COORDINATION
Prof. Sören Auer, auer © cs.uni-bonn · de (Fraunhofer IAIS)
Dr. Simon Scerri, scerri © cs.uni-bonn · de (Fraunhofer IAIS)

EIS Department/Group, Fraunhofer IAIS & CS Department Uni-Bonn, Bonn, Germany

Fraunhofer IAIS: Leads Fraunhofer Big Data Alliance

#BigDataEurope