big data and fast data - big and fast combined, is it possible?


Upload: guido-schmutz

Post on 10-May-2015


DESCRIPTION

Big Data (volume) and real-time information processing (velocity) are two important aspects of Big Data systems. At first sight, these two aspects seem to be incompatible. Are traditional software architectures still the right choice? Do we need new, revolutionary architectures to tackle the requirements of Big Data? This presentation discusses the idea of the so-called lambda architecture for Big Data, which assumes that data processing is split in two: in a batch phase, a temporally bounded, large dataset is processed either through traditional ETL or MapReduce. In parallel, a real-time, online process constantly calculates the values for the new data arriving during the batch phase. Combining the two results, batch and online, gives a constantly up-to-date view. This talk presents how such an architecture can be implemented using Oracle products such as Oracle NoSQL, Hadoop and Oracle Event Processing.

TRANSCRIPT

Page 1: Big Data and Fast Data - big and fast combined, is it possible?

2013 © Trivadis

BASEL BERN LAUSANNE ZÜRICH DÜSSELDORF FRANKFURT A.M. FREIBURG I.BR. HAMBURG MÜNCHEN STUTTGART WIEN

WELCOME Big Data and Fast Data big and fast combined – is it possible?

Guido Schmutz and Albert Blarer

24 April 2013

24. April 2013 Big Data und Fast Data

1

Page 2: Big Data and Fast Data - big and fast combined, is it possible?


Guido Schmutz

•  Working for Trivadis for more than 16 years
•  Oracle ACE Director for Fusion Middleware and SOA
•  Co-author of different books
•  Consultant, trainer, software architect for Java, Oracle, SOA and EDA
•  Member of Trivadis Architecture Board
•  Technology Manager @ Trivadis
•  More than 25 years of software development experience
•  Contact: [email protected]
•  Blog: http://guidoschmutz.wordpress.com
•  Twitter: gschmutz


Page 3: Big Data and Fast Data - big and fast combined, is it possible?


Page 4: Big Data and Fast Data - big and fast combined, is it possible?

Trivadis: the company (as of 12/2012)

On site with more than 600 IT and business experts.

•  11 Trivadis branches with more than 600 employees
•  200 service level agreements
•  More than 4,000 training participants
•  Research and development budget: CHF 5.0 million / EUR 4 million
•  Financially independent and sustainably profitable
•  Experience from more than 1,900 projects per year with over 800 customers

Locations: Hamburg, Düsseldorf, Frankfurt, Freiburg, München, Stuttgart, Wien, Basel, Zürich, Bern, Lausanne

Page 5: Big Data and Fast Data - big and fast combined, is it possible?


Credits

Nathan Marz

Author of "Big Data: Principles and best practices of scalable realtime data systems" (Manning)

Previously worked at BackType and Twitter

Creator of

•  Storm

•  Cascalog

•  ElephantDB


Page 6: Big Data and Fast Data - big and fast combined, is it possible?


Agenda

1.  Big Data, what is it?

2.  Motivation

3.  The Lambda Architecture

4.  Implementing the Lambda Architecture

5.  Summary


Page 7: Big Data and Fast Data - big and fast combined, is it possible?

Big Data Definition (Gartner et al.)

Characteristics of Big Data: its Volume, Velocity and Variety in combination

•  Tera-, Peta-, Exa-, Zetta-, Yottabytes and constantly growing
•  "Traditional" computing in RDBMS is not scalable enough; we search for "linear scalability"
•  "Only … structured information is not enough": "95% of produced data is unstructured"
•  + Veracity (IBM): information uncertainty
•  + Time to action? Big Data + Event Processing = Fast Data

Page 8: Big Data and Fast Data - big and fast combined, is it possible?


Big Data Emerging Technologies


§  MapReduce (e.g. Apache Hadoop)

§  Event Stream Processing & CEP (e.g. Storm or Esper)

§  New messaging systems (e.g. Apache Kafka)

§  Integration tools (e.g. Spring or Camus)

§  New database paradigms (e.g. NoSQL or NewSQL)

§  Data mining tools (e.g. Apache Mahout)

§  Data extraction and detection tools (e.g. Apache Tika)

Page 9: Big Data and Fast Data - big and fast combined, is it possible?


Page 10: Big Data and Fast Data - big and fast combined, is it possible?

Volume Development

[Chart: global data volume in exabytes (axis 0 to 8000) and aggregate uncertainty in % (axis 0 to 100) by year, 2005 to 2015]

•  Sensors: "internet of things"
•  Social Media: video, audio, text
•  VoIP: Skype, MSN, ICQ, ...
•  Enterprise Data: data dictionary, ERD, ...

Page 11: Big Data and Fast Data - big and fast combined, is it possible?

Velocity

§  Velocity requirement examples:
•  Recommendation Engine
•  Predictive Analytics
•  Marketing Campaign Analysis
•  Customer Retention and Churn Analysis
•  Social Graph Analysis
•  Capital Markets Analysis
•  Risk Management
•  Rogue Trading
•  Fraud Detection
•  Retail Banking
•  Network Monitoring
•  Research and Development

Page 12: Big Data and Fast Data - big and fast combined, is it possible?


Agenda

1.  Big Data, what is it?

2.  Motivation

3.  The Lambda Architecture

4.  Implementing the Lambda Architecture

5.  Summary


Page 13: Big Data and Fast Data - big and fast combined, is it possible?


What is a data system?

•  A system that manages the storage and querying of data with a lifetime measured in years encompassing every version of the application to ever exist, every hardware failure and every human mistake ever made.

•  A data system answers questions based on information that was acquired in the past

•  Not all bits of information are equal

•  Some information is derived from other information

Page 14: Big Data and Fast Data - big and fast combined, is it possible?


Desired Properties of a (Big) Data System

Robust and fault-tolerant

Low latency reads and updates

Scalable

General

Extensible

Allows ad hoc queries

Minimal maintenance

Debuggable


Page 15: Big Data and Fast Data - big and fast combined, is it possible?

Typical problem in today's architectures/systems

Lack of Human Fault Tolerance

Bugs will be deployed to production over the lifetime of a data system

Operational mistakes will be made

Humans are part of the overall system
•  Just like hard disks, CPUs, memory, software
•  Design for human error like you design for any other fault

Examples of human error
•  Deploying a bug that increments counters by two instead of by one
•  Accidentally deleting data from the database
•  Accidental DoS on an important internal service

Worst two consequences: data loss or data corruption

As long as an error doesn't lose or corrupt good data, you can fix what went wrong

Page 16: Big Data and Fast Data - big and fast combined, is it possible?


Mutability

The U and D in CRUD

A mutable system updates the current state of the world

Mutable systems inherently lack human fault-tolerance

Easy to corrupt or lose data

Capturing change traditionally

Before the update:

Name    City
Guido   Berne
Albert  Zurich

After the update (the old value is overwritten):

Name    City
Guido   Basel
Albert  Zurich

Page 17: Big Data and Fast Data - big and fast combined, is it possible?


Immutability

An immutable system captures historical records of events

Each event happens at a particular time and is always true

Capturing change by storing events

Name    City    Timestamp
Guido   Berne   1.8.1999
Albert  Zurich  10.5.1988

After the change, a new record is appended and nothing is overwritten:

Name    City    Timestamp
Guido   Berne   1.8.1999
Albert  Zurich  10.5.1988
Guido   Basel   1.4.2013
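The event table above can be mimicked with an append-only list. The following is a minimal Python sketch, not code from the talk: the names `CityEvent`, `record_move` and `city_as_of` are hypothetical, and the timestamps are read as day.month.year.

```python
from datetime import date
from typing import NamedTuple, Optional


class CityEvent(NamedTuple):
    """One immutable fact: `name` lives in `city` since `since`."""
    name: str
    city: str
    since: date


# Append-only log: facts are added, never updated or deleted.
events = [
    CityEvent("Guido", "Berne", date(1999, 8, 1)),
    CityEvent("Albert", "Zurich", date(1988, 5, 10)),
]


def record_move(log, name, city, since):
    """Capture a change by appending a new event (no U or D of CRUD)."""
    log.append(CityEvent(name, city, since))


def city_as_of(log, name, when) -> Optional[str]:
    """Derive the city valid at `when` from the full history."""
    known = [e for e in log if e.name == name and e.since <= when]
    return max(known, key=lambda e: e.since).city if known else None


record_move(events, "Guido", "Basel", date(2013, 4, 1))
print(city_as_of(events, "Guido", date(2013, 4, 24)))  # Basel
print(city_as_of(events, "Guido", date(2005, 1, 1)))   # Berne
```

Because the old record is still there, a wrongly recorded move can be corrected by appending another event or by recomputing the derived answer, which is the human fault tolerance argument of this section.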

Page 18: Big Data and Fast Data - big and fast combined, is it possible?


Immutability

Immutability greatly restricts the range of errors that can cause data loss or data corruption

Vastly more human fault-tolerant

Much easier to reason about systems based on immutability

Conclusion: Your source of truth should always be immutable


Page 19: Big Data and Fast Data - big and fast combined, is it possible?


What about traditional/today’s architectures ?

Source of Truth is mutable!

Rather than build systems like this ….

[Diagram: applications (Mobile, Web, RIA, Rich Client) query a mutable database (RDBMS, NoSQL, NewSQL), which is the source of truth]

Page 20: Big Data and Fast Data - big and fast combined, is it possible?


A different kind of architecture with immutable source of truth

… why not build them like this?

[Diagram: applications (Mobile, Web, RIA, Rich Client) query views on the data; the views are derived from immutable data, which is the source of truth. Storage technologies named: HDFS, NoSQL, NewSQL, RDBMS]

Page 21: Big Data and Fast Data - big and fast combined, is it possible?


How to create the views on the Immutable data?

On the fly ?

Materialized, i.e. Pre-computed ?

[Diagram: a query either computes its view from the immutable data on the fly, or reads pre-computed views derived from the immutable data]

Page 22: Big Data and Fast Data - big and fast combined, is it possible?


Data = the most raw information

Data is information which is not derived from anywhere else
•  The most raw form of information
•  Data is the special information from which everything else is derived

Questions on data can be answered by running functions that take data as input

The most general purpose data system can answer questions by running functions that take the entire dataset as input

query = function (all data)

The lambda architecture provides a general purpose approach for implementing arbitrary functions on arbitrary datasets

Page 23: Big Data and Fast Data - big and fast combined, is it possible?


Data = the most raw information

Favorite Product List Changes (raw information => data)

1.2.13   Add     iPAD 64GB
10.3.13  Add     Sony RX-100
11.3.13  Add     Canon GX-10
11.3.13  Remove  Sony RX-100
12.3.13  Add     Nikon S-100
14.4.13  Add     BoseQC-15
15.4.13  Add     MacBook Pro 15
20.4.13  Remove  Canon GX-10

Current Favorite Product List (information => derived)

iPAD 64GB
Nikon S-100
BoseQC-15
MacBook Pro 15

Current Product Count (information => derived)

4
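The derivation on this slide is an instance of query = function(all data). A minimal Python sketch, assuming the change log is held as a list of (date, action, product) tuples; the function name `current_favorites` is hypothetical:

```python
# Raw data: the full, ordered log of favorite-list changes from the slide.
changes = [
    ("1.2.13", "add", "iPAD 64GB"),
    ("10.3.13", "add", "Sony RX-100"),
    ("11.3.13", "add", "Canon GX-10"),
    ("11.3.13", "remove", "Sony RX-100"),
    ("12.3.13", "add", "Nikon S-100"),
    ("14.4.13", "add", "BoseQC-15"),
    ("15.4.13", "add", "MacBook Pro 15"),
    ("20.4.13", "remove", "Canon GX-10"),
]


def current_favorites(log):
    """Derive the current favorite list by replaying every change."""
    favorites = []
    for _day, action, product in log:
        if action == "add" and product not in favorites:
            favorites.append(product)
        elif action == "remove" and product in favorites:
            favorites.remove(product)
    return favorites


favorites = current_favorites(changes)
print(favorites)       # ['iPAD 64GB', 'Nikon S-100', 'BoseQC-15', 'MacBook Pro 15']
print(len(favorites))  # 4
```

Both derived values (the list and the count) can be thrown away and recomputed at any time, because only the change log is the source of truth.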

Page 24: Big Data and Fast Data - big and fast combined, is it possible?


Big Data and Batch Processing

[Diagram: Incoming Data -> Immutable data -> (?) Batch View -> (?) Query]

How to compute the batch views ?

How to compute queries from the views ?

Page 25: Big Data and Fast Data - big and fast combined, is it possible?


Big Data and Batch Processing

[Timeline diagram: data up to the end of the last full batch period is fully batch-processed; data that arrived during the time for the batch job, up to now, is non-processed]

§  Using only batch processing always leaves you with a portion of non-processed data.

Adapted from Ted Dunning (March 2012): http://www.youtube.com/watch?v=7PcmbI5aC20

But we are not done yet …

Page 26: Big Data and Fast Data - big and fast combined, is it possible?


Adding Real-Time Processing

[Diagram: Incoming Data feeds the Immutable data, from which Batch Views are computed, and a Data Stream, from which Realtime Views are computed (?); the Query reads both kinds of views]

How to compute real-time views?

How to compute queries from the views?

Page 27: Big Data and Fast Data - big and fast combined, is it possible?


Adding Real-Time Processing

Favorite Product List Changes (immutable data)

1.2.13   Add     iPAD 64GB
10.3.13  Add     Sony RX-100
11.3.13  Add     Canon GX-10
11.3.13  Remove  Sony RX-100
12.3.13  Add     Nikon S-100
14.4.13  Add     BoseQC-15
15.4.13  Add     MacBook Pro 15
20.4.13  Remove  Canon GX-10
Now      Add     Canon Scanner

Stream of Favorite Product List Changes (data stream)

Now      Add     Canon Scanner

Current Favorite Product List (computed views)

iPAD 64GB
Nikon S-100
BoseQC-15
MacBook Pro 15
Canon Scanner (computed from the stream, now)

Current Product Count (computed)

5

Page 28: Big Data and Fast Data - big and fast combined, is it possible?


Big Data and Real Time Processing

[Timeline diagram: batch processing (e.g. Hadoop) worked fine for the fully processed data up to the end of the last full batch period; real-time processing works for the data that arrived during the time for the batch job, up to now; the end user gets a blended view]

Adapted from Ted Dunning (March 2012): http://www.youtube.com/watch?v=7PcmbI5aC20

Page 29: Big Data and Fast Data - big and fast combined, is it possible?


Agenda

1.  Big Data, what is it?

2.  Motivation

3.  The Lambda Architecture

4.  Implementing the Lambda Architecture

5.  Summary


Page 30: Big Data and Fast Data - big and fast combined, is it possible?


Lambda Architecture

[Diagram: Incoming Data, Immutable data and Batch View (Batch Layer), Data Stream and Realtime View (Speed Layer), Query (Serving Layer); the steps are labelled A to G]

Page 31: Big Data and Fast Data - big and fast combined, is it possible?


Lambda Architecture

A.  All data is sent to both the batch and speed layer

B.  Master data set is an immutable, append-only set of data

C.  Batch layer pre-computes query functions from scratch, result is called Batch Views. Batch layer constantly re-computes the batch views.

D.  Batch views are indexed and stored in a scalable database to get particular values very quickly. Swaps in new batch views when they are available

E.  Speed layer compensates for the high latency of updates to the Batch Views in the Serving layer.

F.  Uses fast incremental algorithms and read/write databases to produce real-time views

G.  Queries are resolved by getting results from both batch and real-time views
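Steps A to G can be sketched in a few lines of Python. This is a minimal single-process illustration, assuming a simple per-key counting query; the class `LambdaSketch` and its method names are hypothetical and are not taken from the talk or from any product.

```python
from collections import Counter


class LambdaSketch:
    """Toy model of the three layers for a 'count events per key' query."""

    def __init__(self):
        self.master = []                # B: immutable, append-only master data set
        self.batch_view = Counter()     # D: batch view held by the serving layer
        self.realtime_view = Counter()  # F: incrementally updated real-time view

    def ingest(self, key):
        # A: every event is sent to both the batch and the speed layer.
        self.master.append(key)
        self.realtime_view[key] += 1

    def run_batch(self):
        # C: recompute the view from scratch over a snapshot of all data.
        snapshot = list(self.master)
        # D: swap in the new batch view once it is complete.
        self.batch_view = Counter(snapshot)
        # E: the speed layer only has to cover what the batch run missed.
        self.realtime_view = Counter(self.master[len(snapshot):])

    def query(self, key):
        # G: resolve a query from both the batch and the real-time view.
        return self.batch_view[key] + self.realtime_view[key]


arch = LambdaSketch()
for key in ["iPad", "iPad", "Nikon"]:
    arch.ingest(key)
arch.run_batch()
arch.ingest("iPad")        # arrives after the batch run
print(arch.query("iPad"))  # 3
```

In this sketch the batch run is instantaneous, so the real-time view is empty right after it; in a real deployment the batch job takes hours and the real-time view covers everything that arrived in the meantime.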


Page 32: Big Data and Fast Data - big and fast combined, is it possible?


Layered Architecture

Batch Layer
•  Stores the immutable, constantly growing dataset
•  Computes arbitrary views from this dataset using Big Data technologies (can take hours)
•  Can always be recreated

Serving Layer
•  Responsible for indexing and exposing the pre-computed batch views so that they can be queried
•  Exposes the incremented real-time views
•  Merges the batch and the real-time views into a consistent result

Speed Layer
•  Computes the views from the constant stream of data it receives
•  Needed to compensate for the high latency of the batch layer
•  Incremental model and views are transient

Page 33: Big Data and Fast Data - big and fast combined, is it possible?


Agenda

1.  Big Data, what is it?

2.  Motivation

3.  The Lambda Architecture

4.  Implementing the Lambda Architecture

5.  Summary


Page 34: Big Data and Fast Data - big and fast combined, is it possible?


Lambda Architecture

[Diagram: Incoming Data goes to the Batch Layer (All data, Precompute Views, Batch recompute, Precomputed information) and to the Speed Layer (Process stream, Realtime increment, Incremented information); the Serving Layer holds batch views and real-time views, which are merged to answer a query]

Source: Marz, N. & Warren, J. (2013) Big Data. Manning.

Page 35: Big Data and Fast Data - big and fast combined, is it possible?


Lambda Architecture

one possible product/framework mapping

[Diagram: the Lambda Architecture layers (Batch Layer, Speed Layer, Serving Layer with merge of batch and real-time views) mapped to products and frameworks]

Page 36: Big Data and Fast Data - big and fast combined, is it possible?


Implementing Batch Layer

Immutable Data

•  Append only

•  Normalized

•  Stores master copy of all data

Pre-computed information

•  Function that takes all data as input

query = function(all-data)

•  High Latency, Batch processing

•  Unrestrained computation

•  Horizontally scalable

[Diagram: Batch Layer (All data, Precompute Views, Batch recompute, Precomputed information): batch views are computed from the immutable data and handed to the Serving Layer]

Page 37: Big Data and Fast Data - big and fast combined, is it possible?


Apache Hadoop HDFS

HDFS = the Hadoop Distributed File System

A distributed file storage system

Redundant storage

Designed to reliably store data using commodity hardware

Designed to expect hardware failures

Intended for large files

Designed for batch inserts


Batch Layer

Page 38: Big Data and Fast Data - big and fast combined, is it possible?


Apache Hadoop Map Reduce

§  Hadoop Map Reduce is an open source implementation of the MapReduce framework.

§  Map Reduce is
•  a programming model, introduced by Google, for processing large data sets in a distributed environment
•  the de-facto standard to compute huge amounts of data
•  an execution framework for organizing and performing such computations

[Diagram: the master node splits the problem data across worker nodes 1 to 3 (MAP) and combines their results into the solution data (REDUCE)]

Batch Layer
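The programming model can be illustrated without a cluster. The following is a single-process Python sketch of the map, shuffle and reduce phases using word counting as the example; it does not use the Hadoop API, and the function names are hypothetical.

```python
from collections import defaultdict
from itertools import chain


def map_phase(record):
    """MAP: turn one input record into (key, value) pairs."""
    for word in record.lower().split():
        yield word, 1


def shuffle(pairs):
    """Group all values by key, as the framework does between MAP and REDUCE."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """REDUCE: fold all values of one key into a single result."""
    return key, sum(values)


def map_reduce(records):
    mapped = chain.from_iterable(map_phase(r) for r in records)
    return dict(reduce_phase(k, vs) for k, vs in shuffle(mapped).items())


result = map_reduce(["big data", "fast data", "big and fast"])
print(result)  # {'big': 2, 'data': 2, 'fast': 2, 'and': 1}
```

In a real Hadoop job the map and reduce calls run on different worker nodes and the shuffle moves data over the network; the sketch only shows the shape of the computation.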

Page 39: Big Data and Fast Data - big and fast combined, is it possible?


Hadoop MapReduce Flow


Source: Bill Graham, Twitter Inc.

Batch Layer

Page 40: Big Data and Fast Data - big and fast combined, is it possible?


Hadoop MapReduce


Batch Layer

Page 41: Big Data and Fast Data - big and fast combined, is it possible?


Cascading

Application framework for Java developers to simply develop robust Data Analytics and Data Management applications on Apache Hadoop

Adds an abstraction layer over the Hadoop API

Core concepts of the Cascading API:
•  Pipe: a series of processing steps (parsing, looping, filtering, etc.) defining the data processing to be done
•  Flow: association of a pipe (or set of pipes) with a data source and data sink
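The pipe and flow concepts can be illustrated with a small Python analogy. This is not the Cascading API (which is a Java library); `make_pipe`, `run_flow` and the step functions are hypothetical names used only to show the idea.

```python
def make_pipe(*steps):
    """A pipe: a series of processing steps applied one after another."""
    def run(records):
        for step in steps:
            records = step(records)
        return records
    return run


def run_flow(source, pipe, sink):
    """A flow: binds a pipe to a data source and a data sink."""
    sink.extend(pipe(source))
    return sink


# Three illustrative steps: parsing, filtering, projecting.
def parse(records):
    return (r.split(",") for r in records)


def keep_adds(records):
    return (r for r in records if r[0] == "add")


def product_names(records):
    return (r[1] for r in records)


pipe = make_pipe(parse, keep_adds, product_names)
out = run_flow(["add,iPad", "remove,Sony", "add,Nikon"], pipe, [])
print(out)  # ['iPad', 'Nikon']
```

The pipe knows nothing about where data comes from or goes to, so the same pipe can be reused in different flows.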


Batch Layer

Page 42: Big Data and Fast Data - big and fast combined, is it possible?

Cascading

Page 43: Big Data and Fast Data - big and fast combined, is it possible?


Apache Pig

Apache Pig is a platform for analyzing large data sets

Key Properties

•  Ease of programming

•  Optimization opportunities

•  Extensibility


Batch Layer

Page 44: Big Data and Fast Data - big and fast combined, is it possible?


Implementing Serving Layer for Batch Views

Need a database that
•  Is batch-writable
•  Adds new information atomically
•  Has fast random reads
•  Is scalable
•  Is highly available
•  Can be optimized for storage

•  Information can be de-normalized

•  But no random writes required!

•  Can be a simple database
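The requirements above (batch-writable, atomic publication, fast random reads, no random writes) can be sketched as an in-memory store whose whole content is swapped in one step. This is a hypothetical Python illustration, not the interface of any particular product.

```python
import threading


class BatchViewStore:
    """Read-only key/value view that is replaced as a whole by each batch run."""

    def __init__(self):
        self._view = {}
        self._lock = threading.Lock()

    def publish(self, new_view):
        """Batch write: swap in a complete new view in one atomic step."""
        frozen = dict(new_view)  # copy, so later changes by the caller are not visible
        with self._lock:
            self._view = frozen  # readers see either the old or the new view

    def get(self, key, default=None):
        """Fast random read; there is deliberately no put() for single keys."""
        return self._view.get(key, default)


store = BatchViewStore()
store.publish({"guido": 4})
print(store.get("guido"))  # 4
store.publish({"albert": 2})
print(store.get("guido"))  # None
```

Because the store never supports random writes, it needs no write-ahead logging or compaction, which is why it "can be a simple database".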

Serving Layer

[Diagram: batch views computed from the immutable data in the Batch Layer (precomputed information) are loaded into the Serving Layer]

Page 45: Big Data and Fast Data - big and fast combined, is it possible?


SploutSQL

Full SQL => unlike NoSQL

For BigData => unlike RDBMS

Web latency & throughput => unlike Apache Hive, Apache Drill

Why does it scale?
•  Data is partitioned
•  Partitions are distributed across nodes
•  Adding more nodes increases capacity
•  Generation does not impact serving

Serving Layer

Source: Datasalt.

Page 46: Big Data and Fast Data - big and fast combined, is it possible?


SploutSQL


Serving Layer

Page 47: Big Data and Fast Data - big and fast combined, is it possible?


Implementing Speed Layer

Stream Processing

Continuous computation

Transactional

Storing a limited window of data
•  Compensating for the last few hours of data

All the complexity is isolated in the speed layer
•  If anything goes wrong, it's autocorrected by the next batch run

[Diagram: Speed Layer (Process stream, Realtime increment, Incremented information): real-time views are derived from the data stream and handed to the Serving Layer]

Page 48: Big Data and Fast Data - big and fast combined, is it possible?


Apache Kafka

A high throughput distributed messaging system

Originated at LinkedIn

Sequential disk access


Page 49: Big Data and Fast Data - big and fast combined, is it possible?


Twitter Storm – the “real-time Hadoop”

§  Storm is a distributed and fault-tolerant real-time computing platform
•  Data flow model: data flows through a network of transformation entities

§  Key concepts
•  Tuple: ordered list of elements
•  Streams: unbounded sequence of tuples
•  Spouts: source of streams
•  Bolts: process tuples and create new streams
•  Topologies: directed graph of Spouts and Bolts

§  Use cases
•  Stream Processing
•  Continuous Computation
•  Distributed RPC

[Diagram: a SPOUT reads problem data from a data source; chained BOLTs ("MAP", "REDUCE", "PERSIST") turn it into solution data]

Speed Layer

Serving Layer
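The key concepts can be mimicked with Python generators. This is only an analogy running in one process, not the Storm API; the function names are hypothetical, and a real stream would be unbounded rather than a short list.

```python
from collections import Counter


def sentence_spout(sentences):
    """Spout: source of a stream; emits one tuple per sentence."""
    for sentence in sentences:
        yield (sentence,)


def split_bolt(stream):
    """Bolt ("MAP"): processes tuples and creates a new stream of word tuples."""
    for (sentence,) in stream:
        for word in sentence.split():
            yield (word,)


def count_bolt(stream):
    """Bolt ("REDUCE"): keeps a running count and emits (word, count) tuples."""
    counts = Counter()
    for (word,) in stream:
        counts[word] += 1
        yield (word, counts[word])


# Topology: a directed graph of spouts and bolts, here a simple chain.
topology = count_bolt(split_bolt(sentence_spout(["big data", "fast data"])))
emitted = list(topology)
print(emitted)  # [('big', 1), ('data', 1), ('fast', 1), ('data', 2)]
```

Unlike the batch example, every incoming tuple immediately produces an updated result, which is what makes the model suitable for the speed layer.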

Page 50: Big Data and Fast Data - big and fast combined, is it possible?


Twitter Storm


Speed Layer

Serving Layer

Page 51: Big Data and Fast Data - big and fast combined, is it possible?


Twitter Trident

Higher level abstraction over Storm

Trident State

Grouped Stream

Functions, Filters

Aggregators

Query

Similar to Pig and Cascading


Speed Layer

Serving Layer

Page 52: Big Data and Fast Data - big and fast combined, is it possible?


Twitter Trident


Speed Layer

Serving Layer

Page 53: Big Data and Fast Data - big and fast combined, is it possible?


Implementing Serving Layer for Real-Time Views

Incremental updates are made available as real-time views

Requires a database that supports random reads and random writes
•  Relational, NoSQL or NewSQL (in-memory) databases can be used
•  Here we are typically not in the Big Data range

Results are only needed until the data has made it through the batch layer

Complexity isolation
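The statement that real-time results are only needed until the batch layer has caught up can be sketched as follows. This is a hypothetical Python illustration that assumes increments carry a numeric timestamp; the class `RealtimeView` and its methods are not from the talk.

```python
class RealtimeView:
    """Keeps only the increments the batch layer has not yet absorbed."""

    def __init__(self):
        self._increments = []  # (timestamp, key, delta), random writes allowed

    def add(self, timestamp, key, delta=1):
        """Record one incremental update from the speed layer."""
        self._increments.append((timestamp, key, delta))

    def expire_through(self, batch_horizon):
        """Drop everything a finished batch run already covers."""
        self._increments = [i for i in self._increments if i[0] > batch_horizon]

    def value(self, key):
        """Random read of the current real-time contribution for `key`."""
        return sum(delta for _, k, delta in self._increments if k == key)


view = RealtimeView()
view.add(1, "guido")
view.add(5, "guido")
print(view.value("guido"))  # 2
view.expire_through(3)      # a batch run has processed data up to timestamp 3
print(view.value("guido"))  # 1
```

Since old increments are discarded regularly, the real-time store stays small, which matches the remark that it is typically not in the Big Data range.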

[Diagram: real-time views derived from the data stream in the Speed Layer are stored as incremented information in the Serving Layer]

Page 54: Big Data and Fast Data - big and fast combined, is it possible?


Cassandra

Fully distributed, no single-point-of-failure

Linearly scalable

Fault tolerant

Performant

Durable

Integrated caching

Tunable consistency


Serving Layer

Page 55: Big Data and Fast Data - big and fast combined, is it possible?


Implementing Serving Layer: Merge of Batch and Realtime Views

An interesting feature of Storm / Trident is the ability to execute distributed RPC (DRPC) calls in parallel

This can be used to implement the merge functionality when a query is executed

[Diagram: in the Serving Layer, a query is answered by merging Batch Views and Realtime Views]

Page 56: Big Data and Fast Data - big and fast combined, is it possible?


Storm / Trident DRPC


Serving Layer

Page 57: Big Data and Fast Data - big and fast combined, is it possible?

2013 © Trivadis

Agenda

1.  Big Data, what is it?

2.  Motivation

3.  The Lambda Architecture

4.  Implementing the Lambda Architecture

5.  Summary


Page 58: Big Data and Fast Data - big and fast combined, is it possible?


Summary – The lambda architecture

§  The Lambda Architecture
§  Can discard batch views and real-time views and recreate everything from scratch
§  Mistakes corrected via re-computation
§  Data storage layer optimized independently from query resolution layer
§  Still in a very early …. But a very interesting idea!
   -  Today a zoo of technologies is needed => Operations won't like it
§  Different query language for batch and real time
§  An abstraction over batch and speed layer needed
   -  Cascading and Trident are already similar
§  Industry standards needed!

Page 59: Big Data and Fast Data - big and fast combined, is it possible?


THANK YOU. Trivadis AG

Guido Schmutz & Albert Blarer

Europa-Strasse 5, CH-8095 Glattbrugg

[email protected] www.trivadis.com
