big data and fast data – big and fast combined, is it possible?

44
2013 © Trivadis BASEL BERN LAUSANNE ZÜRICH DÜSSELDORF FRANKFURT A.M. FREIBURG I.BR. HAMBURG MÜNCHEN STUTTGART WIEN WELCOME Big Data and Fast Data Big and Fast Combined, is it Possible? Guido Schmutz UKOUG Tech 2013 2.12.2013 2.12.2013 Big Data and Fast Data – Big and Fast Combined, is it Possible? 1

Upload: guido-schmutz

Post on 26-Jan-2015

119 views

Category:

Technology


0 download

DESCRIPTION

 

TRANSCRIPT

Page 1: Big Data and Fast Data – Big and Fast Combined, is it Possible?

2013 © Trivadis

BASEL BERN LAUSANNE ZÜRICH DÜSSELDORF FRANKFURT A.M. FREIBURG I.BR. HAMBURG MÜNCHEN STUTTGART WIEN

WELCOME Big Data and Fast Data – Big and Fast Combined, is it Possible?

Guido Schmutz

UKOUG Tech 2013

2.12.2013

2.12.2013 Big Data and Fast Data – Big and Fast Combined, is it Possible?

1

Page 2: Big Data and Fast Data – Big and Fast Combined, is it Possible?

2013 © Trivadis

Guido Schmutz

•  Working for Trivadis for more than 16 years

•  Oracle ACE Director for Fusion Middleware and SOA •  Co-Author of different books •  Consultant, Trainer Software Architect for Java, Oracle, SOA

and EDA •  Member of Trivadis Architecture Board •  Technology Manager @ Trivadis

•  More than 20 years of software development experience

•  Contact: [email protected] •  Blog: http://guidoschmutz.wordpress.com •  Twitter: gschmutz

2.12.2013

2 Big Data and Fast Data – Big and Fast Combined, is it Possible?

Page 3: Big Data and Fast Data – Big and Fast Combined, is it Possible?

2013 © Trivadis

Trivadis is a market leader in IT consulting, system integration, solution engineering and the provision of IT services focusing on and technologies in Switzerland, Germany and Austria.

We offer our services in the following strategic business fields: Trivadis Services takes over the interacting operation of your IT systems.

Our company

O P E R A T I O N

02/12/13 Trivadis – the company

Page 4: Big Data and Fast Data – Big and Fast Combined, is it Possible?

2013 © Trivadis

With over 600 specialists and IT experts in your region

4

12 Trivadis branches and more than 600 employees   200 Service Level Agreements   Over 4,000 training participants   Research and development budget: CHF 5.0 / EUR 4 million   Financially self-supporting and sustainably profitable   Experience from more than 1,900 projects per year at over 800 customers

Trivadis – the company 02/12/13

Hamburg

Düsseldorf

Frankfurt

Freiburg München

Wien

Basel Zurich Bern

Lausanne

Stuttgart

Brugg

Page 5: Big Data and Fast Data – Big and Fast Combined, is it Possible?

2013 © Trivadis

Agenda

1.  Big Data, what is it?

2.  Motivation

3.  The Lambda Architecture

4.  Implementing the Lambda Architecture

5.  Summary

2.12.2013 Big Data and Fast Data – Big and Fast Combined, is it Possible?

5

Page 6: Big Data and Fast Data – Big and Fast Combined, is it Possible?

2013 © Trivadis

Big Data Definition (Gartner et al)

2.12.2013 Big Data and Fast Data – Big and Fast Combined, is it Possible?

6

Velocity

Tera-, Peta-, Exa-, Zetta-, Yota- bytes and constantly growing

“Traditional” computing in RDBMS is not scalable enough.

We search for “linear scalability”

“Only … structured information is not enough” – “95% of produced data in

unstructured”

Characteristics of Big Data: Its Volume, Velocity and Variety in combination

+ Veracity (IBM) - information uncertainty + Time to action ? – Big Data + Event Processing = Fast Data

Page 7: Big Data and Fast Data – Big and Fast Combined, is it Possible?

2013 © Trivadis

Big Data Definition (4 Vs)

2.12.2013 Big Data and Fast Data – Big and Fast Combined, is it Possible?

7

+ Time to action ? – Big Data + Event Processing = Fast Data

Characteristics of Big Data: Its Volume, Velocity and Variety in combination

Page 8: Big Data and Fast Data – Big and Fast Combined, is it Possible?

2013 © Trivadis

Volume Development

0

20

40

60

80

100

0

2000

4000

6000

8000

2005 2007 2009 2011 2013 2015

Agg

rega

te U

ncer

tain

ty %

Glo

bal D

ata

Volu

me

in E

xaby

tes

Year

Sensors: “internet of things”

Social Media: video, audio, text

VoIP: Skype, MSN, ICQ, ...

Enterprise Data: data dictionary, ERD, ...

2.12.2013 Big Data and Fast Data – Big and Fast Combined, is it Possible?

8

Page 9: Big Data and Fast Data – Big and Fast Combined, is it Possible?

2013 © Trivadis

2.12.2013 Big Data and Fast Data – Big and Fast Combined, is it Possible?

9

Page 10: Big Data and Fast Data – Big and Fast Combined, is it Possible?

2013 © Trivadis

Internet Of Things

2.12.2013 Big Data and Fast Data – Big and Fast Combined, is it Possible?

10

There are more devices tapping into the internet than people on earth

How do we prepare our systems/architecture for the future?

Source: Cisco Source: The Economist

Page 11: Big Data and Fast Data – Big and Fast Combined, is it Possible?

2013 © Trivadis

Big Data in Context

NoSQL databases •  The storage for Big Data à Polyglot Persistence

Complex Event Processing (CEP) •  An architectural style for Fast Data

Lots of new terms §  HDFS, Hive, Hadoop, MapReduce, HBase, Pig, Cascading, Flume, Oozie

Not only Open Source •  Oracle Big Data Appliance & Microsoft HD Insight

No longer a clear distinction between Software Development and Business Intelligence !?

•  Java, Python, Clojure, R, … know how needed •  Data Scientists: Natural Language Processing, Statistics, Network Analysis

2.12.2013 Big Data and Fast Data – Big and Fast Combined, is it Possible?

11

Page 12: Big Data and Fast Data – Big and Fast Combined, is it Possible?

2013 © Trivadis

Big Data Use Cases / Scenarios

General

•  Analyzing social media data for service optimization, sentiment analysis, ...

Retail

§  Personalized travel- and shopping guidance depending on location detection (mobile, tablets, previous purchases)

Automotive

§  Analyzing telemetric data (e.g. for insurance: „Pay how you drive“, warranty, recall, warnings etc.)

Finance

§  Fraud detection for payments (real time)

Telco

§  Mobile user location analytics for „behavior mining“

2.12.2013 Big Data and Fast Data – Big and Fast Combined, is it Possible?

12

Page 13: Big Data and Fast Data – Big and Fast Combined, is it Possible?

2013 © Trivadis

Velocity

2.12.2013 Big Data and Fast Data – Big and Fast Combined, is it Possible?

13

§  Velocity requirement examples: §  Recommendation Engine §  Predictive Analytics §  Marketing Campaign Analysis §  Customer Retention and Churn Analysis §  Social Graph Analysis §  Capital Markets Analysis §  Risk Management §  Rogue Trading §  Fraud Detection §  Retail Banking §  Network Monitoring §  Research and Development

Page 14: Big Data and Fast Data – Big and Fast Combined, is it Possible?

2013 © Trivadis

Agenda

1.  Big Data, what is it?

2.  Motivation

3.  The Lambda Architecture

4.  Implementing the Lambda Architecture

5.  Summary

2.12.2013 Big Data and Fast Data – Big and Fast Combined, is it Possible?

14

Page 15: Big Data and Fast Data – Big and Fast Combined, is it Possible?

2013 © Trivadis

What is a data system?

•  A system that manages the storage and querying of data with a lifetime measured in years encompassing every version of the application to ever exist, every hardware failure and every human mistake ever made.

•  A data system answers questions based on information that was acquired in the past

2.12.2013 Big Data and Fast Data – Big and Fast Combined, is it Possible?

15

Page 16: Big Data and Fast Data – Big and Fast Combined, is it Possible?

2013 © Trivadis

Desired Properties of a (Big) Data System

Robust and fault-tolerant

Low latency reads and updates

Scalable

General

Extensible

Allows ad hoc queries

Minimal maintenance

Debug-able

2.12.2013 Big Data and Fast Data – Big and Fast Combined, is it Possible?

16

Page 17: Big Data and Fast Data – Big and Fast Combined, is it Possible?

2013 © Trivadis

Complexity in today‘s architecture/systems

24. April 2013 Big Data und Fast Data

17

Lack of Human Fault Tolerance

Same structure for write/query

Schemas done wrong

Page 18: Big Data and Fast Data – Big and Fast Combined, is it Possible?

2013 © Trivadis

Typical problem in today’s architecture/systems

Bugs will be deployed to production over the lifetime of a data system

Operational mistakes will be made

Humans are part of the overall system •  Just like hard disks, CPUs, memory, software •  design for human error like you design for any other fault

Examples of human error •  Deploy a bug that increments counters by two instead of by one •  Accidentally delete data from database •  Accidental DOS on important internal service

Worst two consequences: data loss or data corruption

As long as an error doesn‘t lose or corrupt good data, you can fix what went wrong

2.12.2013 Big Data and Fast Data – Big and Fast Combined, is it Possible?

18

Lack of Human Fault Tolerance

Page 19: Big Data and Fast Data – Big and Fast Combined, is it Possible?

2013 © Trivadis

Mutability

The U and D in CRUD

A mutable system updates the current state of the world

Mutable systems inherently lack human fault-tolerance

Easy to corrupt or lose data

2.12.2013 Big Data and Fast Data – Big and Fast Combined, is it Possible?

19

Capturing change traditionally

Lack of Human Fault Tolerance

Name City Guido Berne Albert Zurich

Name City Guido Basel Albert Zurich

Page 20: Big Data and Fast Data – Big and Fast Combined, is it Possible?

2013 © Trivadis

Immutability

An immutable system captures historical records of events

Each event happens at a particular time and is always true

2.12.2013 Big Data and Fast Data – Big and Fast Combined, is it Possible?

20

Capturing change by storing events

Lack of Human Fault Tolerance

Name City Timestamp Guido Berne 1.8.1999 Albert Zurich 10.5.1988

Name City Timestamp Guido Berne 1.8.1999 Albert Zurich 10.5.1988 Guido Basel 1.4.2013

Page 21: Big Data and Fast Data – Big and Fast Combined, is it Possible?

2013 © Trivadis

Immutability

Immutability greatly restricts the range of errors that can cause data loss or data corruption

Vastly more human fault-tolerant

Much easier to reason about systems based on immutability

Conclusion: Your source of truth should always be immutable

2.12.2013 Big Data and Fast Data – Big and Fast Combined, is it Possible?

21

Lack of Human Fault Tolerance

Page 22: Big Data and Fast Data – Big and Fast Combined, is it Possible?

2013 © Trivadis

What about traditional/today’s architectures ?

Source of Truth is mutable!

Rather than build systems like this ….

2.12.2013 Big Data and Fast Data – Big and Fast Combined, is it Possible?

22

Mutable Database

Application (Query)

RDBMS NoSQL

NewSQL

Mobile Web RIA

Rich Client

Source of Truth

Source of Truth

Page 23: Big Data and Fast Data – Big and Fast Combined, is it Possible?

2013 © Trivadis

A different kind of architecture with immutable source of truth

… why not building them like this

2.12.2013 Big Data and Fast Data – Big and Fast Combined, is it Possible?

23

HDFS NoSQL

NewSQL RDBMS

View on Data

Mobile Web RIA

Rich Client

Source of Truth

Immutable data

View on Data

Application (Query)

Source of Truth

Page 24: Big Data and Fast Data – Big and Fast Combined, is it Possible?

2013 © Trivadis

How to create the views on the Immutable data?

On the fly ?

Materialized, i.e. Pre-computed ?

2.12.2013 Big Data and Fast Data – Big and Fast Combined, is it Possible?

24

Immutable data View

Immutable data

Pre- Computed

Views

Query

Query

Page 25: Big Data and Fast Data – Big and Fast Combined, is it Possible?

2013 © Trivadis

Data = the most raw information

Data is information which is not derived from anywhere else •  The most raw form of information •  from which everything else is derived

Questions on data can be answered by running functions that take data as input

The most general purpose data system can answer questions by running functions that take the entire dataset as input

query = function (all data)

The lambda architecture provides a general purpose approach for implementing arbitrary functions on an arbitrary datasets

2.12.2013 Big Data and Fast Data – Big and Fast Combined, is it Possible?

25

Page 26: Big Data and Fast Data – Big and Fast Combined, is it Possible?

2013 © Trivadis

Data = the most raw information

2.12.2013 Big Data and Fast Data – Big and Fast Combined, is it Possible?

26

1.2.13 Add iPAD 64GB 10.3.13 Add Sony RX-100 11..3.13 Add Canon GX-10 11.3.13 Remove Sony RX-100 12.3.13 Add Nikon S-100 14.4.13 Add BoseQC-15 15.4.13 Add MacBook Pro 15 20.4.13 Remove Canon GX10

iPAD 64GB Nikon S-100 BoseQC-15 MacBook Pro 15

4 derive derive

Favorite Product List Changes Current Favorite

Product List Current Product Count

Raw information => data Information => derived

Page 27: Big Data and Fast Data – Big and Fast Combined, is it Possible?

2013 © Trivadis

Big Data and Batch Processing

2.12.2013 Big Data and Fast Data – Big and Fast Combined, is it Possible?

27

Immutable data

Batch View Query ? ? Incoming

Data

How to compute the batch views ?

How to compute queries from the views ?

Page 28: Big Data and Fast Data – Big and Fast Combined, is it Possible?

2013 © Trivadis

Big Data and Batch Processing

2.12.2013 Big Data and Fast Data – Big and Fast Combined, is it Possible?

28

Fully processed data Last full batch period

Time forbatch job

time

now non-processed data

time

now

batch-processed data

§  Using only batch processing, leaves you always with a portion of non-processed data.

Adapted from Ted Dunning (March 2012): http://www.youtube.com/watch?v=7PcmbI5aC20

But we are not done yet …

Page 29: Big Data and Fast Data – Big and Fast Combined, is it Possible?

2013 © Trivadis

Big Data and Batch Processing

2.12.2013 Big Data and Fast Data – Big and Fast Combined, is it Possible?

29

HDFS

Data Store optimized for appending large

results

Queries

Stream 1

Stream 2

Event

Hadoop cluster Map/Reduce in Pig

Hadoop Distributed File System

Page 30: Big Data and Fast Data – Big and Fast Combined, is it Possible?

2013 © Trivadis

Adding Real-Time Processing

2.12.2013 Big Data and Fast Data – Big and Fast Combined, is it Possible?

30

Immutable data

Batch Views

Query

? Data Stream

Realtime Views

Incoming Data

How to compute queries from the views ? How to compute real-time views

Page 31: Big Data and Fast Data – Big and Fast Combined, is it Possible?

2013 © Trivadis

Adding Real-Time Processing

2.12.2013 Big Data and Fast Data – Big and Fast Combined, is it Possible?

31

1.2.13 Add iPAD 64GB 10.3.13 Add Sony RX-100 11..3.13 Add Canon GX-10 11.3.13 Remove Sony RX-100 12.3.13 Add Nikon S-100 14.4.13 Add BoseQC-15 15.4.13 Add MacBook Pro 15 20.4.13 Remove Canon GX10 Now Add Canon Scanner

iPAD 64GB Nikon S-100 BoseQC-15 MacBook Pro 15

5

compute

Favorite Product List Changes Current Favorite

Product List

Current Product Count

Now Canon Scanner compute Add Canon Scanner

Stream of Favorite Product List Changes

Immutable data

Views

Data Stream

Query

incoming

Page 32: Big Data and Fast Data – Big and Fast Combined, is it Possible?

2013 © Trivadis

Big Data and Real Time Processing

2.12.2013 Big Data and Fast Data – Big and Fast Combined, is it Possible?

32

time

Fully processed data Last full batch period

now

Time forbatch job

batch processingworked fine here

(e.g. Hadoop)

real time processingworks here

blended view for end user

Adapted from Ted Dunning (March 2012): http://www.youtube.com/watch?v=7PcmbI5aC20

Page 33: Big Data and Fast Data – Big and Fast Combined, is it Possible?

2013 © Trivadis

Agenda

1.  Big Data, what is it?

2.  Motivation

3.  The Lambda Architecture

4.  Implementing the Lambda Architecture

5.  Summary

2.12.2013 Big Data and Fast Data – Big and Fast Combined, is it Possible?

33

Page 34: Big Data and Fast Data – Big and Fast Combined, is it Possible?

2013 © Trivadis

Lambda Architecture

2.12.2013 Big Data and Fast Data – Big and Fast Combined, is it Possible?

34

Immutable data

Batch View

Query

Data Stream

Realtime View

Incoming Data

Serving Layer

Speed Layer

Batch Layer

A

B C D

E F

G

Page 35: Big Data and Fast Data – Big and Fast Combined, is it Possible?

2013 © Trivadis

Lambda Architecture

A.  All data is sent to both the batch and speed layer

B.  Master data set is an immutable, append-only set of data

C.  Batch layer pre-computes query functions from scratch, result is called Batch Views. Batch layer constantly re-computes the batch views.

D.  Batch views are indexed and stored in a scalable database to get particular values very quickly. Swaps in new batch views when they are available

E.  Speed layer compensates for the high latency of updates to the Batch Views

F.  Uses fast incremental algorithms and read/write databases to produce real-time views

G.  Queries are resolved by getting results from both batch and real-time views

2.12.2013 Big Data and Fast Data – Big and Fast Combined, is it Possible?

35

Page 36: Big Data and Fast Data – Big and Fast Combined, is it Possible?

2013 © Trivadis

Layered Architecture

Stores the immutable constantly growing dataset Computes arbitrary views from this dataset using BigData technologies (can take hours) Can be always recreated Computes the views from the constant stream of data it receives Needed to compensate for the high latency of the batch layer Incremental model and views are transient Responsible for indexing and exposing the pre-computed batch views so that they can be queried Exposes the incremented real-time views Merges the batch and the real-time views into a consistent result

2.12.2013 Big Data and Fast Data – Big and Fast Combined, is it Possible?

36

Serving Layer

Batch Layer

Speed Layer

Page 37: Big Data and Fast Data – Big and Fast Combined, is it Possible?

2013 © Trivadis

Agenda

1.  Big Data, what is it?

2.  Motivation

3.  The Lambda Architecture

4.  Implementing the Lambda Architecture

5.  Summary

2.12.2013 Big Data and Fast Data – Big and Fast Combined, is it Possible?

37

Page 38: Big Data and Fast Data – Big and Fast Combined, is it Possible?

2013 © Trivadis

Lambda Architecture

2.12.2013 Big Data and Fast Data – Big and Fast Combined, is it Possible?

38

Speed Layer

Precompute Views

query

Source: Marz, N. & Warren, J. (2013) Big Data. Manning.

Batch Layer

Precomputed information All data

Incremented information Process stream

Incoming Data

Batch recompute

Realtime increment

Serving Layer

batch view

batch view

real time view

real time view

Mer

ge

Page 39: Big Data and Fast Data – Big and Fast Combined, is it Possible?

2013 © Trivadis

Lambda Architecture in Action

Implementation in ongoing Proof-of-concept (after completion of phase 1)

2.12.2013 Big Data and Fast Data – Big and Fast Combined, is it Possible?

39

Speed Layer

Precompute Views

query

Batch Layer

Precomputed information All data

Incremented information Process stream

Incoming Data

Batch recompute

Realtime increment

Serving Layer

batch view

batch view

real time view

real time view

Mer

ge

Page 40: Big Data and Fast Data – Big and Fast Combined, is it Possible?

2013 © Trivadis

Lambda Architecture in Action

2.12.2013 Big Data and Fast Data – Big and Fast Combined, is it Possible?

40

Cloudera Distribution

•  Distribution of Apache Hadoop: HDFS, MapReduce, Hive, Flume, Pig, Impala

Cloudera Impala

•  distributed query execution engine that runs against data stored in HDFS and HBase

Apache Zookeeper

•  Distributed, highly available coordination service. Provides primitives such as distributed locks

Apache Storm & Trident

•  distributed, fault-tolerant realtime computation system

Apache Cassandra

•  distributed database management system designed to handle large amounts of data across many commodity servers, providing high availability with no single point of failure

Twitter Horsebird Client (hbc)

•  Twitter Java API over Streaming API

Spring Framework

•  Popular Java Framework used to modularize part of the logic (sensor and serving layer)

Apache Kafka

•  Simple messaging framework based on file system to distribute information to both batch and speed layer

Apache Avro

•  Serialization system for efficient cross-language RPC and persistent data storage

JSON

•  open standard format that uses human-readable text to transmit data objects consisting of attribute–value pairs.

Page 41: Big Data and Fast Data – Big and Fast Combined, is it Possible?

2013 © Trivadis

Lambda Architecture with Oracle Product Stack

2.12.2013 Big Data and Fast Data – Big and Fast Combined, is it Possible?

41

Possible implementation with Oracle Product stack

Speed Layer

Precompute Views

query

Batch Layer

Precomputed information All data

Incremented information Process stream

Incoming Data

Batch recompute

Serving Layer

batch view

batch view

real time view

real time view

Mer

ge

Oracle NoSQL

Oracle RDBMS Oracle Coherence

Oracle BigData Appliance

Oracle NoSQL Oracle Coherence

Oracle Event Processing Oracle GoldenGate

Oracle Data Integrator

Oracle GoldenGate

Oracle Event Processing

Oracle Service Bus

Ora

cle

Web

Log

ic S

erve

r O

racl

e A

DF

OBI

EE

Ora

cle

Ende

ca

Ora

cle

Big

Dat

a C

onne

ctor

s

Page 42: Big Data and Fast Data – Big and Fast Combined, is it Possible?

2013 © Trivadis

Agenda

1.  Big Data, what is it?

2.  Motivation

3.  The Lambda Architecture

4.  Implementing the Lambda Architecture

5.  Summary

2.12.2013 Big Data and Fast Data – Big and Fast Combined, is it Possible?

42

Page 43: Big Data and Fast Data – Big and Fast Combined, is it Possible?

2013 © Trivadis

Summary – The lambda architecture

2.12.2013 Big Data and Fast Data – Big and Fast Combined, is it Possible?

43

§  The Lambda Architecture §  Can discard batch views and real-time views and recreate everything from

scratch §  Mistakes corrected via re-computation §  Data storage layer optimized independently from query resolution layer §  Still in a very early …. But a very interesting idea!

-  Today a zoo of technologies are needed => Operations won‘t like it

§  The technology/implementation §  Different query language for batch and real time §  An abstraction over batch and speed layer needed

-  Cascading and Trident are already similar §  Not everything works out-of-the-box and together §  Industry standards needed!

Page 44: Big Data and Fast Data – Big and Fast Combined, is it Possible?

2013 © Trivadis

BASEL BERN LAUSANNE ZÜRICH DÜSSELDORF FRANKFURT A.M. FREIBURG I.BR. HAMBURG MÜNCHEN STUTTGART WIEN

THANK YOU. Trivadis AG

Guido Schmutz

Europa-Strasse 5CH-8095 Glattbrugg

[email protected] www.trivadis.com

2.12.2013 Big Data and Fast Data – Big and Fast Combined, is it Possible?

44