big containers, big orchestration, big data · big containers, big orchestration, big data william...

50
BIG CONTAINERS, BIG ORCHESTRATION, BIG DATA William Benton Red Hat, Inc. @willb

Upload: vothien

Post on 09-Sep-2018

258 views

Category:

Documents


1 download

TRANSCRIPT

Page 1: BIG CONTAINERS, BIG ORCHESTRATION, BIG DATA · BIG CONTAINERS, BIG ORCHESTRATION, BIG DATA William Benton Red Hat, Inc. @willb

BIG CONTAINERS, BIG ORCHESTRATION, BIG DATAWilliam Benton Red Hat, Inc.@willb

Page 2: BIG CONTAINERS, BIG ORCHESTRATION, BIG DATA · BIG CONTAINERS, BIG ORCHESTRATION, BIG DATA William Benton Red Hat, Inc. @willb

COMMONS GATHERINGSeattle | November 7#OCGathering2016

BACKGROUND

Page 3: BIG CONTAINERS, BIG ORCHESTRATION, BIG DATA · BIG CONTAINERS, BIG ORCHESTRATION, BIG DATA William Benton Red Hat, Inc. @willb

COMMONS GATHERINGSeattle | November 7#OCGathering2016

Mesos

WHAT OUR CLUSTER LOOKED LIKE IN 2014

Networked POSIX FS

Spark executor

Spark executor

Spark executor

Spark executor

Spark executor

Spark executor

1

2

3

4

1

1

2

3

3

4

Page 4: BIG CONTAINERS, BIG ORCHESTRATION, BIG DATA · BIG CONTAINERS, BIG ORCHESTRATION, BIG DATA William Benton Red Hat, Inc. @willb

Analytics is no longer a separate workload.Analytics is an essential component of modern data-driven applications.

Page 5: BIG CONTAINERS, BIG ORCHESTRATION, BIG DATA · BIG CONTAINERS, BIG ORCHESTRATION, BIG DATA William Benton Red Hat, Inc. @willb

COMMONS GATHERINGSeattle | November 7#OCGathering2016

OUR GOALS

git

Page 6: BIG CONTAINERS, BIG ORCHESTRATION, BIG DATA · BIG CONTAINERS, BIG ORCHESTRATION, BIG DATA William Benton Red Hat, Inc. @willb

COMMONS GATHERINGSeattle | November 7#OCGathering2016

FORECAST

Spark and microservices

Architectures for analytics and applications

Scheduling and storage

Future work (and how to get involved)

Page 7: BIG CONTAINERS, BIG ORCHESTRATION, BIG DATA · BIG CONTAINERS, BIG ORCHESTRATION, BIG DATA William Benton Red Hat, Inc. @willb

SPARK AND MICROSERVICES

Page 8: BIG CONTAINERS, BIG ORCHESTRATION, BIG DATA · BIG CONTAINERS, BIG ORCHESTRATION, BIG DATA William Benton Red Hat, Inc. @willb

Apache Spark is a fast and general framework for distributed data processing.

Page 9: BIG CONTAINERS, BIG ORCHESTRATION, BIG DATA · BIG CONTAINERS, BIG ORCHESTRATION, BIG DATA William Benton Red Hat, Inc. @willb

Resilient Distributed Datasets are partitioned, lazy, and immutable homogeneous collections.

Page 10: BIG CONTAINERS, BIG ORCHESTRATION, BIG DATA · BIG CONTAINERS, BIG ORCHESTRATION, BIG DATA William Benton Red Hat, Inc. @willb

COMMONS GATHERINGSeattle | November 7#OCGathering2016

RESILIENT DISTRIBUTED DATASETS

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16

2 3 4 6 7 8 10 11 121 5 9 13 14 15 16

Page 11: BIG CONTAINERS, BIG ORCHESTRATION, BIG DATA · BIG CONTAINERS, BIG ORCHESTRATION, BIG DATA William Benton Red Hat, Inc. @willb

COMMONS GATHERINGSeattle | November 7#OCGathering2016

RESILIENT DISTRIBUTED DATASETS

Page 12: BIG CONTAINERS, BIG ORCHESTRATION, BIG DATA · BIG CONTAINERS, BIG ORCHESTRATION, BIG DATA William Benton Red Hat, Inc. @willb

COMMONS GATHERINGSeattle | November 7#OCGathering2016

1 2 3 λ x: x % 2 != 0 λ x: x * 3

FILTER MAP

λ x: [x, x+1]

FLATMAP

Page 13: BIG CONTAINERS, BIG ORCHESTRATION, BIG DATA · BIG CONTAINERS, BIG ORCHESTRATION, BIG DATA William Benton Red Hat, Inc. @willb

COMMONS GATHERINGSeattle | November 7#OCGathering2016

3 λ x: x % 2 != 0 λ x: x * 3

FILTER MAP

λ x: [x, x+1]

FLATMAP

3 4 9 10COLLECT

Page 14: BIG CONTAINERS, BIG ORCHESTRATION, BIG DATA · BIG CONTAINERS, BIG ORCHESTRATION, BIG DATA William Benton Red Hat, Inc. @willb

COMMONS GATHERINGSeattle | November 7#OCGathering2016

1 2 3 λ x: x % 2 != 0λ x: x * 3

FILTERMAP

λ x: [x, x+1]

FLATMAP

3 4 9 10SAVE AS TEXT FILE

CACHE

Page 15: BIG CONTAINERS, BIG ORCHESTRATION, BIG DATA · BIG CONTAINERS, BIG ORCHESTRATION, BIG DATA William Benton Red Hat, Inc. @willb

COMMONS GATHERINGSeattle | November 7#OCGathering2016

executor1

1 2 3

executorn

10 11 12

cluster manager

2 4 6 20 22 24

λ x: x * 2λ x: x * 2

driver

CACHCACH

Page 16: BIG CONTAINERS, BIG ORCHESTRATION, BIG DATA · BIG CONTAINERS, BIG ORCHESTRATION, BIG DATA William Benton Red Hat, Inc. @willb

COMMONS GATHERINGSeattle | November 7#OCGathering2016

Spark core

Graph SQL ML Streaming

ad hoc Mesos YARN

Page 17: BIG CONTAINERS, BIG ORCHESTRATION, BIG DATA · BIG CONTAINERS, BIG ORCHESTRATION, BIG DATA William Benton Red Hat, Inc. @willb

COMMONS GATHERINGSeattle | November 7#OCGathering2016

Spark core

Graph SQL ML Streaming

ad hoc Mesos YARNk8s

Page 18: BIG CONTAINERS, BIG ORCHESTRATION, BIG DATA · BIG CONTAINERS, BIG ORCHESTRATION, BIG DATA William Benton Red Hat, Inc. @willb

A microservice architecture employs lightweight, modular, and typically stateless components with well-defined interfaces and contracts.

Page 19: BIG CONTAINERS, BIG ORCHESTRATION, BIG DATA · BIG CONTAINERS, BIG ORCHESTRATION, BIG DATA William Benton Red Hat, Inc. @willb

COMMONS GATHERINGSeattle | November 7#OCGathering2016

BENEFITS OF MICROSERVICE ARCHITECTURES

Page 20: BIG CONTAINERS, BIG ORCHESTRATION, BIG DATA · BIG CONTAINERS, BIG ORCHESTRATION, BIG DATA William Benton Red Hat, Inc. @willb

COMMONS GATHERINGSeattle | November 7#OCGathering2016

BENEFITS OF MICROSERVICE ARCHITECTURES

Page 21: BIG CONTAINERS, BIG ORCHESTRATION, BIG DATA · BIG CONTAINERS, BIG ORCHESTRATION, BIG DATA William Benton Red Hat, Inc. @willb

COMMONS GATHERINGSeattle | November 7#OCGathering2016

BENEFITS OF MICROSERVICE ARCHITECTURES

2 + 2 5

Page 22: BIG CONTAINERS, BIG ORCHESTRATION, BIG DATA · BIG CONTAINERS, BIG ORCHESTRATION, BIG DATA William Benton Red Hat, Inc. @willb

COMMONS GATHERINGSeattle | November 7#OCGathering2016

MICROSERVICES AND SPARK

executor

1 2 3

executor

4 5 6

executor

7 8 9

executor

10 11 12

master

λ x: x * 22 4 6 8 10 12 14 16 18 20 22 24

λ x: x * 2 λ x: x * 2 λ x: x * 2 λ x: x * 2

Page 23: BIG CONTAINERS, BIG ORCHESTRATION, BIG DATA · BIG CONTAINERS, BIG ORCHESTRATION, BIG DATA William Benton Red Hat, Inc. @willb

ARCHITECTURES FOR ANALYTICS AND APPLICATIONS

Page 24: BIG CONTAINERS, BIG ORCHESTRATION, BIG DATA · BIG CONTAINERS, BIG ORCHESTRATION, BIG DATA William Benton Red Hat, Inc. @willb

COMMONS GATHERINGSeattle | November 7#OCGathering2016

APPLICATION RESPONSIBILITIES

archive

trainmodels

transform

transform

transform

aggregate

events

databases

file, object storage

Page 25: BIG CONTAINERS, BIG ORCHESTRATION, BIG DATA · BIG CONTAINERS, BIG ORCHESTRATION, BIG DATA William Benton Red Hat, Inc. @willb

COMMONS GATHERINGSeattle | November 7#OCGathering2016

APPLICATION RESPONSIBILITIES

archive

trainmodels

transform

transform

transform

aggregate

events

databases

file, object storage

management

web and mobile

reporting

developer UI

Page 26: BIG CONTAINERS, BIG ORCHESTRATION, BIG DATA · BIG CONTAINERS, BIG ORCHESTRATION, BIG DATA William Benton Red Hat, Inc. @willb

LEGACY ARCHITECTURES

Page 27: BIG CONTAINERS, BIG ORCHESTRATION, BIG DATA · BIG CONTAINERS, BIG ORCHESTRATION, BIG DATA William Benton Red Hat, Inc. @willb

COMMONS GATHERINGSeattle | November 7#OCGathering2016

transactionprocessing

CONVENTIONAL DATA WAREHOUSE

transformevents

UI business logic

RDBMS

Page 28: BIG CONTAINERS, BIG ORCHESTRATION, BIG DATA · BIG CONTAINERS, BIG ORCHESTRATION, BIG DATA William Benton Red Hat, Inc. @willb

COMMONS GATHERINGSeattle | November 7#OCGathering2016

transactionprocessing

CONVENTIONAL DATA WAREHOUSE

transformevents

UI business logic

RDBMS analytic processing

RDBMS

analysis

interactive queryreporting

Page 29: BIG CONTAINERS, BIG ORCHESTRATION, BIG DATA · BIG CONTAINERS, BIG ORCHESTRATION, BIG DATA William Benton Red Hat, Inc. @willb

COMMONS GATHERINGSeattle | November 7#OCGathering2016

HADOOP-STYLE “DATA LAKE”

HDFS

events

HDFS HDFS HDFS HDFS

Page 30: BIG CONTAINERS, BIG ORCHESTRATION, BIG DATA · BIG CONTAINERS, BIG ORCHESTRATION, BIG DATA William Benton Red Hat, Inc. @willb

COMMONS GATHERINGSeattle | November 7#OCGathering2016

HADOOP-STYLE “DATA LAKE”

HDFS

compute

events

HDFS

compute

HDFS

compute compute compute

HDFS HDFS

Page 31: BIG CONTAINERS, BIG ORCHESTRATION, BIG DATA · BIG CONTAINERS, BIG ORCHESTRATION, BIG DATA William Benton Red Hat, Inc. @willb

MODERN ARCHITECTURES

Page 32: BIG CONTAINERS, BIG ORCHESTRATION, BIG DATA · BIG CONTAINERS, BIG ORCHESTRATION, BIG DATA William Benton Red Hat, Inc. @willb

COMMONS GATHERINGSeattle | November 7#OCGathering2016

serving layerspeed layer

THE LAMBDA ARCHITECTURE

events

batch layer

UIfederate

(precise)analysistransform

(imprecise)analysistransform

DFS

Page 33: BIG CONTAINERS, BIG ORCHESTRATION, BIG DATA · BIG CONTAINERS, BIG ORCHESTRATION, BIG DATA William Benton Red Hat, Inc. @willb

COMMONS GATHERINGSeattle | November 7#OCGathering2016

queue for “raw data” topic

THE KAPPA ARCHITECTURE

events

transform analysis

queue for “preprocessed data” topic

queue for “analysis results” topic

reporting end-user UI

Page 34: BIG CONTAINERS, BIG ORCHESTRATION, BIG DATA · BIG CONTAINERS, BIG ORCHESTRATION, BIG DATA William Benton Red Hat, Inc. @willb

COMMONS GATHERINGSeattle | November 7#OCGathering2016

DATA FEDERATION IN THE COMPUTE LAYER

aggregate

trainmodels

archive

events

databases

file, object storage

management

web and mobile

reporting

developer UItransform

transform

transform

Page 35: BIG CONTAINERS, BIG ORCHESTRATION, BIG DATA · BIG CONTAINERS, BIG ORCHESTRATION, BIG DATA William Benton Red Hat, Inc. @willb

PRACTICALITIES AND POTENTIAL PITFALLS

Page 36: BIG CONTAINERS, BIG ORCHESTRATION, BIG DATA · BIG CONTAINERS, BIG ORCHESTRATION, BIG DATA William Benton Red Hat, Inc. @willb

COMMONS GATHERINGSeattle | November 7#OCGathering2016

Cluster scheduler

SIDEBAR: THE MONOLITHIC SPARK ANTIPATTERN

Shared FSSpark executor

Spark executor

Spark executor

Spark executor

Spark executor

Spark executor

Resource manager

app 1 app 2

app 4app 3

Page 37: BIG CONTAINERS, BIG ORCHESTRATION, BIG DATA · BIG CONTAINERS, BIG ORCHESTRATION, BIG DATA William Benton Red Hat, Inc. @willb

COMMONS GATHERINGSeattle | November 7#OCGathering2016

OpenShift

ONE CLUSTER PER APPLICATION

Object storesapp 1 app 2

app 5app 4

app 3

app 6

app 1 app 2

app 5app 4

app 3

app 6

Databases

Page 38: BIG CONTAINERS, BIG ORCHESTRATION, BIG DATA · BIG CONTAINERS, BIG ORCHESTRATION, BIG DATA William Benton Red Hat, Inc. @willb

COMMONS GATHERINGSeattle | November 7#OCGathering2016

OpenShift

app 1 app 2

app 5app 4

app 3

app 6

app 1

Page 39: BIG CONTAINERS, BIG ORCHESTRATION, BIG DATA · BIG CONTAINERS, BIG ORCHESTRATION, BIG DATA William Benton Red Hat, Inc. @willb

COMMONS GATHERINGSeattle | November 7#OCGathering2016

OpenShift

app 1 app 2

app 5app 4

app 3

app 6

app 1 app 2

app 5app 4

app 3

app 6

POSIX FS

HDFS HDFS

HDFS HDFS

HDFS

HDFS

Page 40: BIG CONTAINERS, BIG ORCHESTRATION, BIG DATA · BIG CONTAINERS, BIG ORCHESTRATION, BIG DATA William Benton Red Hat, Inc. @willb

COMMONS GATHERINGSeattle | November 7#OCGathering2016

OpenShift

app 1 app 2

app 5app 4

app 3

app 6

app 1 app 2

app 5app 4

app 3

app 6

object store

✓ interoperability✓ fine-grained AC✓ many implementations

✗ consistency model✗ performance

Page 41: BIG CONTAINERS, BIG ORCHESTRATION, BIG DATA · BIG CONTAINERS, BIG ORCHESTRATION, BIG DATA William Benton Red Hat, Inc. @willb

“For the workloads from Facebook and Bing, we see that 96% and 89% of the active jobs respectively can have their data entirely fit in memory, given an allowance of 32GB memory per server for caching”

—“PACMan: Coordinated Memory Caching for Parallel Jobs.” G. Ananthanarayanan et al., in Proceedings of NSDI ’12.

Page 42: BIG CONTAINERS, BIG ORCHESTRATION, BIG DATA · BIG CONTAINERS, BIG ORCHESTRATION, BIG DATA William Benton Red Hat, Inc. @willb

“Recent studies have shown that reading data from local disks is only about 8% faster than reading it from remote disks over the network … [and] this 8% number is decreasing.”

—Tom Phelan, “The Elephant in the Big Data Room: Data Locality is Irrelevant for Hadoop” (goo.gl/MnCKuM)

Page 43: BIG CONTAINERS, BIG ORCHESTRATION, BIG DATA · BIG CONTAINERS, BIG ORCHESTRATION, BIG DATA William Benton Red Hat, Inc. @willb

“Three out of ten hours of job runtime were spent moving files from the staging directory to the final directory in HDFS…We were essentially compressing, serializing, and replicating three copies for a single read.”

—“Apache Spark @Scale: a 60+ TB production use case”Facebook Engineering Blog Post

Page 44: BIG CONTAINERS, BIG ORCHESTRATION, BIG DATA · BIG CONTAINERS, BIG ORCHESTRATION, BIG DATA William Benton Red Hat, Inc. @willb

COMMONS GATHERINGSeattle | November 7#OCGathering2016

executor1 executornCACHCACH

driver

Page 45: BIG CONTAINERS, BIG ORCHESTRATION, BIG DATA · BIG CONTAINERS, BIG ORCHESTRATION, BIG DATA William Benton Red Hat, Inc. @willb

COMMONS GATHERINGSeattle | November 7#OCGathering2016

COLOCATED COMPUTE AND STORAGE: YAGNI

Disk locality is just another kind of caching, but memory is much faster than disk and working set sizes typically fit in cluster memory after ETL.

The I/O-heavy behavior of frameworks designed for colocated compute and storage performs worse than iterative processing in memory.

Colocating compute and storage prevents independent scale-out of compute and turns “cattle” into “pets.”

Page 46: BIG CONTAINERS, BIG ORCHESTRATION, BIG DATA · BIG CONTAINERS, BIG ORCHESTRATION, BIG DATA William Benton Red Hat, Inc. @willb

COMMONS GATHERINGSeattle | November 7#OCGathering2016

…BUT IF YOU DOOpenShift

app 1 app 2 app 3app 1 app 2 app 3

Storage

Page 47: BIG CONTAINERS, BIG ORCHESTRATION, BIG DATA · BIG CONTAINERS, BIG ORCHESTRATION, BIG DATA William Benton Red Hat, Inc. @willb

COMMONS GATHERINGSeattle | November 7#OCGathering2016

…BUT IF YOU DOOpenShift

app 1 app 2 app 3app 1 app 2 app 3

Storage Storage Storage

Page 48: BIG CONTAINERS, BIG ORCHESTRATION, BIG DATA · BIG CONTAINERS, BIG ORCHESTRATION, BIG DATA William Benton Red Hat, Inc. @willb

PLAYING ALONG AT HOME

Page 49: BIG CONTAINERS, BIG ORCHESTRATION, BIG DATA · BIG CONTAINERS, BIG ORCHESTRATION, BIG DATA William Benton Red Hat, Inc. @willb

COMMONS GATHERINGSeattle | November 7#OCGathering2016

TRY IT OUT YOURSELF

Enabling Spark on OpenShift: https://github.com/radanalyticsio

Video demo: https://vimeo.com/189710503

Meet the teams at lunch!

Page 50: BIG CONTAINERS, BIG ORCHESTRATION, BIG DATA · BIG CONTAINERS, BIG ORCHESTRATION, BIG DATA William Benton Red Hat, Inc. @willb

@willb • [email protected] https://chapeau.freevariable.com

THANKS!