big data analytics - fakultät iv elektrotechnik und ... · big data analytics chances and...

29
Big Data Analytics Chances and Challenges Volker Markl | DIMA | BDOD Big Data Chances and Challenges Volker Markl Professor and Chair Database Systems and Information Management (DIMA), Technische Universität Berlin www.dima.tu-berlin.de

Upload: lydung

Post on 06-May-2018

224 views

Category:

Documents


6 download

TRANSCRIPT

Page 1: Big Data Analytics - Fakultät IV Elektrotechnik und ... · Big Data Analytics Chances and Challenges Volker Markl | DIMA | BDOD Big Data Chances and Challenges Volker Markl Professor

Big Data AnalyticsChances and Challenges

Volker Markl | DIMA | BDOD Big Data Chances and Challenges

Volker Markl

Professor and ChairDatabase Systems and Information Management (DIMA), Technische Universität Berlin

www.dima.tu-berlin.de

Page 2: Big Data Analytics - Fakultät IV Elektrotechnik und ... · Big Data Analytics Chances and Challenges Volker Markl | DIMA | BDOD Big Data Chances and Challenges Volker Markl Professor

More and more data is available to science and business!

Big Data Analytics | Volker Markl | BDOD Big Data – Chances and Challenges

Slide 2

Drivers:Cloud ComputingInternet of ServicesInternet of ThingsCyberphysical Systems

Underlying Trends:ConnectivityCollaborationComputer generated data

video streams

web archives

sensor data

audio streams

RFID data

simulation data

Page 3: Big Data Analytics - Fakultät IV Elektrotechnik und ... · Big Data Analytics Chances and Challenges Volker Markl | DIMA | BDOD Big Data Chances and Challenges Volker Markl Professor

Data are becoming increasingly complex!

Big Data Analytics | Volker Markl | BDOD Big Data – Chances and Challenges

Slide 3

Size (volume)Freshness (velocity)Format/Media Type (variability)Uncertainty/Quality (veracity)etc.

Data

Page 4: Big Data Analytics - Fakultät IV Elektrotechnik und ... · Big Data Analytics Chances and Challenges Volker Markl | DIMA | BDOD Big Data Chances and Challenges Volker Markl Professor

Data and analyses are becoming increasingly complex!

Big Data Analytics | Volker Markl | BDOD Big Data – Chances and Challenges

Slide 4

Size (volume)Freshness (velocity)Format/Media Type (variability)Uncertainty/Quality (veracity)etc.

Data

Selection/Grouping (map/reduce)Relational Operators (Join/Correlation)Extraction & Integration, (map/reduce or dataflow systems)Data Mining (R, S+, Matlab )Predictive Models (R, S+, Matlab )etc.

Analysis

Page 5: Big Data Analytics - Fakultät IV Elektrotechnik und ... · Big Data Analytics Chances and Challenges Volker Markl | DIMA | BDOD Big Data Chances and Challenges Volker Markl Professor

Data-driven applications …

… will revolutionize decision making in business and the sciences!

… have great economic potential!Big Data Analytics | Volker Markl | BDOD Big Data – Chances and Challenges

Slide 5

lifecycle management

home automation health

water management

market research traffic management

energy management

information marketplaces

Page 6: Big Data Analytics - Fakultät IV Elektrotechnik und ... · Big Data Analytics Chances and Challenges Volker Markl | DIMA | BDOD Big Data Chances and Challenges Volker Markl Professor

Big Data Analytics | Volker Markl | BDOD Big Data – Chances and Challenges

Slide 6

General consensus seems to be that people (would like to) have lots of data, which they would like to analyze. How do we go about that?

Page 7: Big Data Analytics - Fakultät IV Elektrotechnik und ... · Big Data Analytics Chances and Challenges Volker Markl | DIMA | BDOD Big Data Chances and Challenges Volker Markl Professor

Running in Circles?

Big Data Analytics | Volker Markl | BDOD Big Data – Chances and Challenges

Slide 7

SQL

NoMapReduceSQL--

NoSQL

Page 8: Big Data Analytics - Fakultät IV Elektrotechnik und ... · Big Data Analytics Chances and Challenges Volker Markl | DIMA | BDOD Big Data Chances and Challenges Volker Markl Professor

Big Data Analytics | Volker Markl | BDOD Big Data – Chances and Challenges

Slide 8

scripting wrongplatform?

XQuery?SQL--

columnstore++

a queryplan

scalable parallel sort

What is Wrong with this Picture?

Page 9: Big Data Analytics - Fakultät IV Elektrotechnik und ... · Big Data Analytics Chances and Challenges Volker Markl | DIMA | BDOD Big Data Chances and Challenges Volker Markl Professor

Big Data Technologies Worldwide

■ MapReduce und Tools: Hadoop, Cloudera Impala (USA), InfoSphereBigInsights (IBM)

■ Data Mining und Informationsextraktion: SystemT (IBM),

■ Statistical Packages: R (Universität Auckland), Matlab (Mathworks;USA), SAS, SPSS

■ DBMS: Aster (Teradata), Greenplum (EMC), DB2 (IBM), SQLServer (Microsoft)Oracle Exadata (Oracle)

■ Business Analytics/Intelligence: SAS Analytics (SAS), Vertica (HP), SPSS IBM)

■ Script Sprachen: Pig, Hive, JAQL

■ Graph Databases: Neo4j (USA), AllegroGraph (Franz Inc.; Giraph (Apache)

■ Visualization: Tableau (Tableau Software; USA),

■ Suche / Indexing: Lucene (Apache), Solr (Apache)

■ Other: UIMA (IBM)

Big Data Analytics | Volker Markl | BDOD Big Data – Chances and Challenges

Slide 9

Page 10: Big Data Analytics - Fakultät IV Elektrotechnik und ... · Big Data Analytics Chances and Challenges Volker Markl | DIMA | BDOD Big Data Chances and Challenges Volker Markl Professor

Big Data Analytics | Volker Markl | BDOD Big Data – Chances and Challenges

Slide 10

System Type Stakeholder

ParStream DBMS (in-memory) ParStream

SAP Hana DBMS(in-memory) SAP

Hyper DBMS(in-memory) TU München

BigMemory DBMS(in-memory) Terracotta Inc./Software AG

Scalaris Storage (parallel) Konrad-Zuse-Zentrum fürInformationstechnik

RapidMiner Data mining toolkit Rapid-I

ImageMaster Enterprise Content Management T-Systems

SalesIntelligence Market research Implisense

Predictive-Analytics-Suite Business Analytics Blue Yonder

Stratosphere Scalable Data Analytics System TU Berlin

Big Data Technologies

Page 11: Big Data Analytics - Fakultät IV Elektrotechnik und ... · Big Data Analytics Chances and Challenges Volker Markl | DIMA | BDOD Big Data Chances and Challenges Volker Markl Professor

3 Big Trends in Big Data

■ Solving complex data analysis problems□ Predictive analytics□ Infinite data streams□ Distributed data□ Dealing with uncertainty□ Bringing the human in the loop: Visual analytics

■ Producing results in near-real time□ First results fast□ Low latency□ Exploiting new Hardware

(manycore, OpenGL/CUDA, FPGA, heterogeneous processors, NUMA, remote memory)

■ Hiding complexity□ Declarative data anaylsis programs► aggregation► relational algebra► iterative algorithms► distributed state

□ Fault tolerance (pessimistic/optimistic)

Big Data Analytics | Volker Markl | BDOD Big Data – Chances and Challenges

Slide 11

Page 12: Big Data Analytics - Fakultät IV Elektrotechnik und ... · Big Data Analytics Chances and Challenges Volker Markl | DIMA | BDOD Big Data Chances and Challenges Volker Markl Professor

Valu

e in

crease

Another “V“ - Value:Big Data in the Cloud - the Information Economy

A major new trend in information processing will be the trading of original and enriched data, effectively creating an information economy.

Cloud-Computing Stack The Market situation

MIA, Datamarket, …

MIA, Azure IMR, etc …

Salesforce.com,Office 2010 WebApps

Microsoft Azure,Google App Engine

Amazon Elastic Compute Cloud

Data as a Service

SaaS

PaaS

Information as a Service

IaaS

End users

System administrators

Application Developers

Analysts

Corporations

„When hardware became commoditized, software was valuable.Now that software is being commoditized, data is valuable.“ (TIM O‘REILLY )

„The important question isn’t who owns the data. Ultimately, we all do. A better question is, who owns the means of analysis?“ (A. CROLL, MASHABLE, 2011)

Slide 12

Big Data Analytics | Volker Markl | BDOD Big Data – Chances and Challenges

Page 13: Big Data Analytics - Fakultät IV Elektrotechnik und ... · Big Data Analytics Chances and Challenges Volker Markl | DIMA | BDOD Big Data Chances and Challenges Volker Markl Professor

Marketplace

Tru

st

Infrastructure as a Service

e.g., Social Media Monitoring

e.g., Social Media Monitoring

e.g., Media Publisher Services

e.g., Media Publisher Services

e.g., SEOe.g., SEO

Distributed Data StorageDistributed Data Storage

Massively Parallel InfrastructureMassively Parallel Infrastructure

IndexIndex

UsersDataproviders Algorithms

Data &Aggregation

RevenueSharing

Technology Licensing QueriesAnalytical

results

German Web

http://www.mia-marktplatz.dehttp://www.dopa-project.eu

Slide 13

Big Data Analytics | Volker Markl | BDOD Big Data – Chances and Challenges

Information Marketplaces: Enabling SMEs to capitalize on Big Data

Page 14: Big Data Analytics - Fakultät IV Elektrotechnik und ... · Big Data Analytics Chances and Challenges Volker Markl | DIMA | BDOD Big Data Chances and Challenges Volker Markl Professor

Data analysis program

Big Data Analytics | Volker Markl | BDOD Big Data – Chances and Challenges

Slide 14

Programcompiler

Dataflow

Runtime operatorsHash- and sort-based out-of-core

operator implementations,memory management

Stratosphere optimizerAutomatic selection of parallelization, shipping and local strategies, operator

order and placementExecution plan

Parallel execution engineTask scheduling, network data transfers,

resource allocation, checkpointing

Job graph Execution graph

Big Data looks Tiny from the Stratosphere

Stratosphere is a European declarative, massively-pa rallel Big Data Analytics System, funded by DFG and EIT, available open-source with Apache License

http://www. stratosphere.eu

Page 15: Big Data Analytics - Fakultät IV Elektrotechnik und ... · Big Data Analytics Chances and Challenges Volker Markl | DIMA | BDOD Big Data Chances and Challenges Volker Markl Professor

Germany is on the Map!

Big Data Analytics | Volker Markl | BDOD Big Data – Chances and Challenges

Slide 15

“Scalable Machine Learning for Big Data” Tutorial by Microsoft at the

IEEE International Conference on Data Engeneering2013

Page 16: Big Data Analytics - Fakultät IV Elektrotechnik und ... · Big Data Analytics Chances and Challenges Volker Markl | DIMA | BDOD Big Data Chances and Challenges Volker Markl Professor

5 Mistakes Data Scientists Make When Putting Big Data To Work

Selecting a platform that scales

□ Performance (runtime, first-results fast)□ People

Selecting the right data

□ Information Quality□ Timeliness

Choosing the right model□ Assumptions about the data

Reproducability

□ Numerical stability

Trusting your results

□ Confidence and statistical significance□ Lineage

Big Data Analytics | Volker Markl | BDOD Big Data – Chances and Challenges

Slide 16

Page 17: Big Data Analytics - Fakultät IV Elektrotechnik und ... · Big Data Analytics Chances and Challenges Volker Markl | DIMA | BDOD Big Data Chances and Challenges Volker Markl Professor

Legal Dimension

SocialDimension

EconomicDimension

Technology Dimension

ApplicationDimension

Business ModelsBenchmarking

Impact of Open SourceDeploymentPricing

Scalable Data ProcessingSignal Processing

Statistics/MLLinguistics

HCI/Visualization

„Big Data“ is Interdisciplinary

Big Data Analytics | Volker Markl | BDOD Big Data – Chances and Challenges

Slide 17

CopyrightPrivacyLiability

User BehaviourSocietal ImpactCollaboration

Digital Preservation Decision MakingVerticals

Page 18: Big Data Analytics - Fakultät IV Elektrotechnik und ... · Big Data Analytics Chances and Challenges Volker Markl | DIMA | BDOD Big Data Chances and Challenges Volker Markl Professor

Call to Action

Big Data Analytics | Volker Markl | BDOD Big Data – Chances and Challenges

Seite 18

■ Educate Data Scientists to create the required tale nt□ T-shaped students□ Information „literacy“□ Data Analytics Curriculum

■ Research Big Data Analytics Technologies□ Data management (uncertainty, query processing under near real-time constraints,

information extraction)□ Programming models□ Machine learning and statistical methods□ Systems architectures□ Information Visualization

■ Innovate to maintain competetiveness□ Demonstrate flagship use-cases to raise awareness□ Promote startups in the area of data analytics□ Transfer technologies to enterprises, in particular SMEs□ Determine legal frameworks and business models

We need to ensure a European technological leadership role in „Big Data“

Page 19: Big Data Analytics - Fakultät IV Elektrotechnik und ... · Big Data Analytics Chances and Challenges Volker Markl | DIMA | BDOD Big Data Chances and Challenges Volker Markl Professor

Big Data Analytics | Volker Markl | BDOD Big Data – Chances and Challenges

Seite 19

for Big Data Analytics

commandments

Page 20: Big Data Analytics - Fakultät IV Elektrotechnik und ... · Big Data Analytics Chances and Challenges Volker Markl | DIMA | BDOD Big Data Chances and Challenges Volker Markl Professor

1: Thou shall use declarative languages

Big Data Analytics | Volker Markl | BDOD Big Data – Chances and Challenges

Seite 20

All-pairs shortest paths using recursive doubling in Stratosphere’s Scala front-end

Page 21: Big Data Analytics - Fakultät IV Elektrotechnik und ... · Big Data Analytics Chances and Challenges Volker Markl | DIMA | BDOD Big Data Chances and Challenges Volker Markl Professor

Beyond MapReduce

3: Thou shall use rich primitives

Map

Reduce

Cross

Match

CoGroup

2: Thou shall acceptexternal (dynamic) sources

Big Data Analytics | Volker Markl | BDOD Big Data – Chances and Challenges

Seite 21

“In situ” data - no load

Page 22: Big Data Analytics - Fakultät IV Elektrotechnik und ... · Big Data Analytics Chances and Challenges Volker Markl | DIMA | BDOD Big Data Chances and Challenges Volker Markl Professor

4: Thou shall deeplyembed UDFs

Big Data Analytics | Volker Markl | BDOD Big Data – Chances and Challenges

Seite 22

Concise and flexible

5: Thou shall optimize

Page 23: Big Data Analytics - Fakultät IV Elektrotechnik und ... · Big Data Analytics Chances and Challenges Volker Markl | DIMA | BDOD Big Data Chances and Challenges Volker Markl Professor

6: Thou shall iterate

Big Data Analytics | Volker Markl | BDOD Big Data – Chances and Challenges

Seite 23

Needed for most interesting analysis cases

Pregel as a Stratosphere query plan with comparable performance.

Page 24: Big Data Analytics - Fakultät IV Elektrotechnik und ... · Big Data Analytics Chances and Challenges Volker Markl | DIMA | BDOD Big Data Chances and Challenges Volker Markl Professor

7: Thou shall use a scalable and efficient execution engine

Big Data Analytics | Volker Markl | BDOD Big Data – Chances and Challenges

Seite 24

Pipeline and data parallelism, flexible checkpointing, optimized network data transfers

Page 25: Big Data Analytics - Fakultät IV Elektrotechnik und ... · Big Data Analytics Chances and Challenges Volker Markl | DIMA | BDOD Big Data Chances and Challenges Volker Markl Professor

An Overview of Stratosphere

Big Data Analytics | Volker Markl | BDOD Big Data – Chances and Challenges

Seite 25

Page 26: Big Data Analytics - Fakultät IV Elektrotechnik und ... · Big Data Analytics Chances and Challenges Volker Markl | DIMA | BDOD Big Data Chances and Challenges Volker Markl Professor

Meteor program

Big Data Analytics | Volker Markl | BDOD Big Data – Chances and Challenges

Seite 26

Meteor compiler

Pact program

RuntimeHash- and sort-based out-of-core

operator implementations, memory management

Stratosphere optimizerPicks data shipping and local

strategies, operator orderExecution plan

Nephele execution engineTask scheduling, network data transfers, resource allocation, checkpointing

Job graph Execution graph

Page 27: Big Data Analytics - Fakultät IV Elektrotechnik und ... · Big Data Analytics Chances and Challenges Volker Markl | DIMA | BDOD Big Data Chances and Challenges Volker Markl Professor

Stratosphere bulk iterations

Big Data Analytics | Volker Markl | BDOD Big Data – Chances and Challenges

Seite 27

commandments

Currently making the optimizer iteration-aware and exposing PACT-level abstraction

Page 28: Big Data Analytics - Fakultät IV Elektrotechnik und ... · Big Data Analytics Chances and Challenges Volker Markl | DIMA | BDOD Big Data Chances and Challenges Volker Markl Professor

PACT4S: Embedding of PACTs in Scala

Big Data Analytics | Volker Markl | BDOD Big Data – Chances and Challenges

Seite 28

Master thesis developed language and Scala compiler plug-in, ongoing Master thesis will integrate in query optimizer, still in (very) prototypical mode

Page 29: Big Data Analytics - Fakultät IV Elektrotechnik und ... · Big Data Analytics Chances and Challenges Volker Markl | DIMA | BDOD Big Data Chances and Challenges Volker Markl Professor

Ingredients of a Big Data Project

Big Data Analytics | Volker Markl | BDOD Big Data – Chances and Challenges

Seite 29

database systems,scalable platforms

machine learning/statistics/optimization

real-worldanalysis problem

visualizationNLP

signal processing…

technology(data preparation,

scalable processing)

mathematicalanalysis methods

application