Transcript
Page 1: Data Science and Big Data - Fachhochschule Salzburg...Big Data Definition 20.11.2014 6 “Big Data” is a term encompassing the use of techniques to capture, process, analyse and

Data Science and Big Data: Research landscape and impact on the mobility domain

Martin Köhler

Dynamic Transportation Systems

Mobility Department

Austrian Institute of Technology

Salzburg Data Science Symposium – 20.11.2014

Page 2:

Data-intensive science

Enormous data archives are at hand

Various data sources

Often available in real-time

Investigating huge data volumes

and driving research and industry

Science is moving increasingly from hypothesis-driven to data-driven discoveries

Correlation vs. Causality

Page 3:

Science is changing

A thousand years ago

Science was empirical, describing natural phenomena

e.g. Ptolemy’s universe of concentric spheres

Last few hundred years

Theoretical branch using generalizations

e.g. Newtonian/Einsteinian gravity

Last few decades

A computational branch simulating complex phenomena

e.g. Cosmic structure formation

Today

Data-intensive science, synthesizing theory, experiment and computation with statistics

e.g. Matter/energy content of the universe

► New way of thinking required!

Data-Intensive Science: The Fourth Paradigm, Alex Szalay, Dept of Physics and Astronomy, The Johns Hopkins University

Page 4:

More data versus rocket science

“In this paper, we evaluate the performance of different learning methods on a prototypical natural language disambiguation task, confusion set disambiguation, when trained on orders of magnitude more labeled data than has previously been used.”

Banko & Brill, Scaling to Very Very Large Corpora for Natural Language Disambiguation, ACL '01 Proceedings of the 39th Annual Meeting on Association for Computational Linguistics, 2001

“Some simple math given a mountain of data can get you 80% of the way.”

James Shanahan, Berkeley

Page 5:

August E. Evrard, PhD. Cyberscience: Computational Science and the Rise of the Fourth Paradigm, 2010

Page 6:

Big Data Definition

“Big Data” is a term encompassing the use of techniques to capture, process, analyse and visualize potentially large datasets in a reasonable timeframe not accessible to standard IT technologies. By extension, the platform, tools and software used for this purpose are collectively called “Big Data technologies”.

NESSI White Paper, December 2012

Four characteristics:

• Volume: The amount of generated data has increased enormously in recent years

• Velocity: Analysing more data in shorter time frames

• Variety: Huge diversity of data formats (Arbitrary -> Relational -> Free text)

• Value: Extracting value (knowledge)

Hardware and software technologies for managing and analyzing huge amounts of data

Or simply said

IF DATA IS PART OF THE PROBLEM

Page 7:

Big Data Dimensions

Legal dimension

Social dimension

Economic dimension

Technological dimension

Application dimension

Copyright

Privacy

User behaviour

collaboration

Social implications

Business models

Benchmarking

Pricing

Scalable data processing

Signal processing

Statistics

Linguistics

HCI/Visualization

Electronic archiving

Decision support

Industry solutions


Page 8:

Big Data Technology Stack

[Figure: four-layer technology stack with cross-cutting concerns]

Big Data Utilization: Domain Expertise, Asking the right question, Reporting & Dashboards, Alerting & Recommendations, Business Intelligence, Text Analysis and Search

Big Data Analytics: Data Science, Transform question to algorithm, Machine Learning, Analysis, Integration, Query, Performance, Transform, Warehousing

Big Data Platforms: Hadoop Ecosystem, Data Ingestion and Processing, Efficiency, Trust, Workload, Governance, Tools, Platform, Programming, Parallel

Big Data Management: Data Centers, Scalable Data Storage, IaaS, Cloud, Virtualization, Network, Compute, Storage, DBMS, NoSQL

Cross-cutting: Management, Security, Privacy, Governance, Data Value

Page 9:

Big Data Management

Technologies for efficient management of large amounts of data

• Storage and management of data

• Provisioning and management of the infrastructure

Cloud resources

Internal data centers

Storage

Page 10:

Big Data Platforms

Technologies for massively parallel execution of analytics on top of huge data amounts

• Provisioning of parallel and scalable execution systems

• Real-time computation of sensor data

Massively parallel programming: Programming models for data-intensive applications (e.g. MapReduce)

High-level query languages: Scripting languages and abstract representations of low-level data-intensive query languages

Streaming: Real-time processing of (sensor) data which has to be reduced for storage

Ad-hoc queries: Real-time access to large data amounts (Query optimization: SQL vs. MapReduce)

Google Pregel, Apache Drill
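The MapReduce programming model named on this slide can be illustrated with a small single-process sketch. It is not Hadoop code: the function names `map_phase`, `shuffle`, `reduce_phase` and `map_reduce` are hypothetical, and a real framework would distribute each phase across many machines.

```python
from collections import defaultdict
from itertools import chain


def map_phase(record):
    """Map: turn one input record into (key, value) pairs."""
    return [(word.lower(), 1) for word in record.split()]


def shuffle(pairs):
    """Shuffle: group all values that share the same key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine the values of one key into a single result."""
    return key, sum(values)


def map_reduce(records):
    """Run map, shuffle and reduce in sequence (in one process here)."""
    mapped = chain.from_iterable(map_phase(r) for r in records)
    grouped = shuffle(mapped)
    return dict(reduce_phase(k, v) for k, v in grouped.items())


counts = map_reduce(["big data", "Big Data analytics", "data science"])
print(counts)
```

Because each call to `map_phase` sees only one record and each call to `reduce_phase` sees only one key, both phases can run in parallel, which is what makes the model suitable for data-intensive applications.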

Page 11:

Big Data Analytics

Technologies for gaining information from large data amounts on the basis of analytical approaches

• Recognize new models

• Pattern matching

• Pattern recognition

Page 12:

Big Data Utilization

Technologies for extracting knowledge and gaining value

• Strengthen the market position

• Simple utilization of huge data amounts

Business Intelligence: Data-driven provisioning of efficient indicators (reporting, key performance indicator, audit, …)

Knowledge Management: Management and representation of knowledge (Ontologies, LinkedData, knowledge management systems)

Decision Support: Support the decision process; includes data management, modelling, innovative and interactive user interface

Visualization: Interactive visualization of complex information and networks with multiple abstractions (Visual Analytics)

Page 13:

Traditional versus Data-intensive Approach

Traditional Approach

• Apply schema on write

• Heavily dependent on IT

• Determine list of questions, Design solution, Collect structured data, Ask questions from list, Detect additional questions

• Single Query Engine: SQL

Hadoop Approach

• Apply schema on read

• Support range of access patterns to data stored in HDFS: polymorphic access

• Iterate over structure, Transform and analyze

• Right Engine, Right Job: Batch, Interactive, Real-time, In-memory
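The contrast between "schema on write" and "schema on read" can be sketched as follows. This is a minimal illustration under stated assumptions: the record fields (`vehicle`, `speed_kmh`, `lane`) and all function names are hypothetical, and a plain Python list stands in for both the relational table and the raw storage layer.

```python
import json

# Raw records as they arrive; the field names are illustrative assumptions.
RAW_EVENTS = [
    '{"vehicle": "A1", "speed_kmh": 42.5, "ts": "2014-11-20T08:00:00"}',
    '{"vehicle": "B7", "speed_kmh": "n/a"}',
    '{"vehicle": "C3", "speed_kmh": 61, "lane": 2}',
]


def validate_on_write(line):
    """Schema on write: force a record into a fixed schema before storing it."""
    rec = json.loads(line)
    if not isinstance(rec.get("vehicle"), str):
        raise ValueError("vehicle must be a string")
    speed = rec.get("speed_kmh")
    if isinstance(speed, bool) or not isinstance(speed, (int, float)):
        raise ValueError("speed_kmh must be numeric")
    # Fields outside the schema (ts, lane) are dropped at load time.
    return {"vehicle": rec["vehicle"], "speed_kmh": float(speed)}


def load_schema_on_write(lines):
    """Store only records that fit the schema; keep track of rejects."""
    table, rejected = [], []
    for line in lines:
        try:
            table.append(validate_on_write(line))
        except ValueError:
            rejected.append(line)
    return table, rejected


def read_speeds(raw_lines):
    """Schema on read: keep raw data, apply a speed schema at query time."""
    for line in raw_lines:
        rec = json.loads(line)
        try:
            yield rec["vehicle"], float(rec["speed_kmh"])
        except (KeyError, TypeError, ValueError):
            continue  # this record does not fit this particular question


def read_lanes(raw_lines):
    """A later question can apply a different schema to the same raw data."""
    for line in raw_lines:
        rec = json.loads(line)
        if "lane" in rec:
            yield rec["vehicle"], rec["lane"]


table, rejected = load_schema_on_write(RAW_EVENTS)
speeds = list(read_speeds(RAW_EVENTS))
lanes = list(read_lanes(RAW_EVENTS))
```

In the schema-on-write path the `lane` field is discarded when the data is loaded, so a later question about lanes cannot be answered. In the schema-on-read path the raw records stay available, at the price of handling malformed records in every query.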

Page 14:

Technical and Scientific Challenges

Visual Analytics: Combine the strengths of human and electronic data processing

Big Data Analytics: Techniques making use of the complete data set, instead of sampling

Real-time analytics, stream processing: Expect real-time or near real-time responses from the systems

Content Validation: Validating the vast amount of information in content networks, Trust

[Figure: layered architecture. Use Cases and Enterprise Services (Scientific Data, Life Sciences, Business Reporting); Parallel Stream Processing, MapReduce Extensions; Distributed Storage (IaaS, NoSQL); Datacenters]

Page 15:

Future Trends in Big Data

Key aspects in European research (Horizon 2020)

Big Data

Current state

Natural Language Processing

Multi-lingual systems

Real-time cross-stream processing

European data portals

Data availability

Scalable analytics

Page 16:

Data Science Application Domains

Earth

Page 17:

Potentials for Smart Cities - Urban Computing

“Urban computing connects urban sensing, data management, data analytics, and service providing into a recurrent process for an unobtrusive and continuous improvement of people’s lives, city operation systems, and the environment.”

Zheng, Y., et al. Urban Computing: Concepts, Methodologies, and Applications, 2014.

Why bother?

Air pollution

Congestion

Noise pollution

Accidents

Smart City

Urban Computing for Urban Planning

Urban Computing for Transportation Systems

Urban Computing for the Environment

Urban Computing for Urban Energy Consumption

Urban Computing for Social Applications

Urban Computing for Economy

Urban Computing for Public Safety and Security

Page 18:

Data-driven Analytics enabling Smart Mobility Solutions

Real-time integration and analytics of heterogeneous data sources

Massively parallel execution of generic data analytic workflows

Application-specific visualizations


Page 19:

Data-driven Analytics – Mobility Applications

Crowd Dynamics

Events, Airports, Stations

Dynamic Route Planning

Transport Logistics

Data Acquisition

Floating Car Data, Mobile Phone Data,…

Multi-modal Traffic Flow Modeling

Multi-modal Transport Networks

• Multi-modal data collection

• Data analysis

• Optimization

• Multi-modal traffic simulation and prediction

Page 20:

Data-driven Analytics – inferring land use

Infer current, actual land use data from mobile phone and WiFi data

Temporal activity patterns

Spatial clusters

Correlations

Cooperation AIT and MIT

“Inferring land use from mobile phone activity”, Jameson Toole, Michael Ulm, Dietmar Bauer, Marta Gonzalez

“Best Paper Award” at UrbComp 2012
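The idea of relating temporal activity patterns to land use through correlations can be illustrated with a toy sketch. It is not the method of the cited paper: the reference profiles, the four time bins and the function names are invented for illustration, and a cell is simply assigned the label whose reference profile correlates best with its observed activity.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    std_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    std_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (std_x * std_y)


# Hypothetical reference activity per time bin:
# night, morning, afternoon, evening.
REFERENCE_PROFILES = {
    "residential": [5, 2, 6, 9],
    "commercial": [1, 8, 9, 3],
}


def infer_land_use(activity, reference=REFERENCE_PROFILES):
    """Return the best-matching label and all correlation scores."""
    scores = {label: pearson(activity, ref) for label, ref in reference.items()}
    best = max(scores, key=scores.get)
    return best, scores


# Mobile phone activity counts of one grid cell (made-up numbers).
label, scores = infer_land_use([40, 300, 320, 90])
print(label)
```

Pearson correlation is insensitive to scale, so a busy cell and a quiet cell with the same daily rhythm receive the same label. A realistic pipeline would use many more time bins and a trained classifier.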

Page 21:

FLEET – Real-time traffic measurement and information system

Real time traffic information based on GPS reports of probe vehicles

Accurate short and medium term travel time prediction

Hot spot identification and estimation of traffic queue lengths
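How GPS reports from probe vehicles can be turned into travel times and hot spots may be sketched as below. This is an assumption-laden illustration, not the FLEET implementation: the segment table, the 50 % free-flow threshold and all names are hypothetical. Speeds on a segment are averaged with the harmonic mean, which corresponds to averaging travel times over a fixed distance.

```python
from collections import defaultdict
from statistics import harmonic_mean

# Hypothetical road segments with length and free-flow speed.
SEGMENTS = {
    "S1": {"length_km": 1.0, "free_flow_kmh": 50.0},
    "S2": {"length_km": 2.0, "free_flow_kmh": 80.0},
}


def estimate_traffic(reports, segments, hot_spot_ratio=0.5):
    """Aggregate (segment_id, speed_kmh) reports into per-segment estimates."""
    speeds = defaultdict(list)
    for segment_id, speed_kmh in reports:
        # Ignore reports on unknown segments and non-positive speeds.
        if segment_id in segments and speed_kmh > 0:
            speeds[segment_id].append(speed_kmh)

    result = {}
    for segment_id, values in speeds.items():
        mean_speed = harmonic_mean(values)
        info = segments[segment_id]
        result[segment_id] = {
            "travel_time_min": 60.0 * info["length_km"] / mean_speed,
            # Flag a hot spot when traffic is much slower than free flow.
            "hot_spot": mean_speed < hot_spot_ratio * info["free_flow_kmh"],
        }
    return result


reports = [("S1", 20.0), ("S1", 20.0), ("S2", 60.0), ("S2", 120.0), ("S9", 30.0)]
estimates = estimate_traffic(reports, SEGMENTS)
```

Short and medium term prediction would add a time dimension, for example by combining the current estimate with historical values for the same time of day.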

Page 22:

Big Data in Logistics

Page 23:

M. Koehler, University of Vienna; SKG 2012, Beijing, China

EU Project VPH-Share

• FP7 Integrated Project within Virtual Physiological Human Initiative

• Duration: March 2011 – February 2015

• Cost: 14.5 M€; Funding: 10.7 M€; 20 Partners

• Coordinator: The University of Sheffield, United Kingdom

• Goal: Contribute to the VPH vision of a systematic framework for understanding physiological processes in the human body in terms of anatomical structure and biophysical mechanisms across multiple length and time scales.

Page 24:

[Figure: Patient Centred Computational Workflows on a Cloud Platform (Public / Private) and HPC Infrastructure (DEISA / PRACE)]

Workflow elements: Select Workflow, Infer missing items, Run simulation, Retrieve Existing Data, Return Results & Support

Workflow data: Patient Data, Workflow Inputs, Workflow Outputs

Further labels: Decision Support, Users, VPH Outreach, Patient Avatar, Application, Infostructure, Personalised Model, Knowledge Discovery, Data Inference, Compute Services, Storage Services, Knowledge Management, Data Services: Patient/Population, euHeart, @neurIST, VPH OP, ViroLab

Partners: CYFRONET, PL; Sheffield Teaching Hospitals, UK; ATOS Origin, ES; Kings College London, UK; Universitat Pompeu Fabra, ES; Empirica, DE; SCS SRL, IT; NHS IC, UK; INRIA, FR; IOR, IT; Open Univ., UK; Philips Elec., NL; TU Eindhoven, NL; Univ. Auckland, NZ; Uv Amsterdam, NL; UCL, UK; Univ. Vienna, AT; AATRM, ES; FCRB, ES

Project No: 269978

Coordinator: University of Sheffield, UK

Page 25:

EU Project TRIDEC

Visit the project: http://bigdataaustria.wordpress.com

Page 26:

Code of practice for big data projects

Support and orientation for the implementation of big data projects

Process model

Maturity model

Reference architecture

Page 27:

Open Data – a key driver for data science

European Data Innovator Award 2014 goes to Johann Mittheisz, former CIO of the City of Vienna & the Open Government Team of Vienna

Page 28:

Global market

IDC expects the global market to grow from USD 9.8 billion in 2012 to USD 32.4 billion in 2017

Yearly growth rate: 27%

Austrian market 2013: ~ EUR 23 million
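The yearly growth rate on this slide can be checked from the two market figures, assuming it is meant as a compound annual growth rate over the five years from 2012 to 2017. The function name `cagr` is only illustrative.

```python
def cagr(start_value, end_value, years):
    """Compound annual growth rate between two values."""
    return (end_value / start_value) ** (1.0 / years) - 1.0


# Figures from the slide, in billion USD.
rate = cagr(9.8, 32.4, 2017 - 2012)
print(f"{rate:.1%}")
```

The result is about 27 %, which matches the rate quoted on the slide.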

Page 29:

Data Scientists

“We will soon have a huge skills shortage for data-related jobs.”

Neelie Kroes (ICT 2013, Nov. 7, Vilnius)

“Data Scientist: The Sexiest Job of the 21st Century” http://hbr.org/2012/10/data-scientist-the-sexiest-job-of-the-21st-century/ar/1

Page 30:

Data scientists

Page 31:

Steps towards a data-driven economy

“Data is a commodity – competence is the key”

Page 32:

Added Value

Market Leadership

Location attractiveness

Enhance competences

Visibility

Objectives

Competence

Enable data access

Legislation

Provide infrastructure

Current status

Focus, create and provide competences

Secure competences for the long-term

Establish holistic institution

Establish (international) legal certainty

Establish general framework for data markets

Incentives for Open Data

Enhance funding for SMEs

Steps

Page 33:

Conclusion

Emerging research field utilizing big data for various application domains

• Data-intensive computing

• Machine learning

Data-intensive science and big data have a huge potential to drive the development of novel applications

• by integrating diverse large-scale data sources

• by analyzing data sources in real-time

• by visualizing results meaningfully

Data-driven analytics is a key enabler for

• providing more information to stakeholders in shorter time

• supporting better decisions

Page 34:

AIT Austrian Institute of Technology your ingenious partner

Martin Köhler

Mobility Department

Dynamic Transportation Systems

AIT Austrian Institute of Technology GmbH

Giefinggasse 2 | 1210 Vienna | Austria

T +43(0) 50550-6054 | M +43(0) 664 815 79 60 | F +43(0) 50550-6439

[email protected] | http://www.ait.ac.at

