Spark Summit EU talk by William Benton

Post on 06-Jan-2017

William Benton Red Hat, Inc. @willb • willb@redhat.com

CONTAINERIZED SPARK ON KUBERNETES

BACKGROUND

WHAT OUR SPARK CLUSTER LOOKED LIKE IN 2014

[Diagram, built up over several slides: Networked POSIX FS; six Spark executors; Mesos; items labeled 1, 2, 3, 4, later shown as 1, 1, 2, 3, 3, 4]

Analytics is no longer a separate workload.

Analytics is an essential component of modern data-driven applications.

OUR GOALS

git

FORECAST

Motivating containerized microservices

Architectures for analytics and applications

Spark clusters in containers: practicalities and pitfalls

Play along at home

Future work

MOTIVATING MICROSERVICES

A microservice architecture employs lightweight, modular, and typically stateless components with well-defined interfaces and contracts.

BENEFITS OF MICROSERVICE ARCHITECTURES

2 + 2 5

?

MICROSERVICES AND SPARK

[Diagram, built up over several slides: a master and four executors holding 1 2 3, 4 5 6, 7 8 9, and 10 11 12; the function λ x: x * 2 at the master is copied to each of the four executors; results 2 4 6 8 10 12 14 16 18 20 22 24]
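The slide above shows one function, λ x: x * 2, being sent to four executors that each hold a slice of the data. The following is a minimal sketch of that idea in plain Python, not Spark itself: threads stand in for executors, and the names `run_on_executor` and `distributed_map` are hypothetical helpers introduced only for illustration.

```python
from concurrent.futures import ThreadPoolExecutor

# Four partitions, matching the slide: one per executor.
partitions = [[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]]


def double(x):
    """The function shipped to every executor (λ x: x * 2 on the slide)."""
    return x * 2


def run_on_executor(partition, fn):
    """Hypothetical stand-in for an executor: apply fn to its own partition only."""
    return [fn(x) for x in partition]


def distributed_map(fn, parts):
    """Hypothetical stand-in for the master: send fn to each executor and
    collect the per-partition results in partition order."""
    with ThreadPoolExecutor(max_workers=len(parts)) as pool:
        mapped = pool.map(lambda part: run_on_executor(part, fn), parts)
        return [y for part in mapped for y in part]


result = distributed_map(double, partitions)
print(result)
```

Each executor needs only the function and its own partition, with no shared mutable state, which is the property that makes this kind of work fit a stateless, modular design.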

ARCHITECTURES FOR ANALYTICS AND APPLICATIONS

APPLICATION RESPONSIBILITIES

[Diagram, built up over several slides: events; databases; file, object storage; transform (three boxes); aggregate; train models; archive; management; web and mobile; reporting; developer UI]

LEGACY ARCHITECTURES

CONVENTIONAL DATA WAREHOUSE

[Diagram, built up over several slides: events; transform; UI; business logic; two RDBMS boxes; transaction processing; analytic processing; analysis; interactive query; reporting]

HADOOP-STYLE “DATA LAKE”

[Diagram, built up over several slides: events; five HDFS boxes; five compute boxes]

MODERN ARCHITECTURES

THE LAMBDA ARCHITECTURE

[Diagram, built up over several slides: events; DFS; speed layer; transform; (imprecise) analysis; batch layer; transform; (precise) analysis; serving layer; federate; UI]
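The labels on the Lambda architecture slide (batch layer, speed layer, serving layer, federate) can be illustrated with a toy example. This is a sketch under stated assumptions, not the speaker's implementation: the event shape (`{"page": ...}`), the page-count analysis, and the names `batch_layer`, `SpeedLayer`, and `serving_layer` are all hypothetical. A real speed layer would typically trade precision for latency, whereas this one simply counts recent events exactly.

```python
from collections import Counter


def batch_layer(all_events):
    """Precise analysis: recompute counts from the complete event archive."""
    return Counter(e["page"] for e in all_events)


class SpeedLayer:
    """Low-latency analysis over events the batch layer has not processed yet."""

    def __init__(self):
        self.counts = Counter()

    def ingest(self, event):
        # Incremental update, one event at a time.
        self.counts[event["page"]] += 1


def serving_layer(batch_view, speed_view):
    """Federate the batch and speed views into one answer for the UI."""
    return batch_view + speed_view


archive = [{"page": "home"}, {"page": "docs"}, {"page": "home"}]
recent = [{"page": "docs"}, {"page": "blog"}]

batch_view = batch_layer(archive)
speed = SpeedLayer()
for event in recent:
    speed.ingest(event)

merged = serving_layer(batch_view, speed.counts)
print(merged)
```

The cost of this design is visible even in the toy: the same analysis logic exists twice, once per layer, and the serving layer has to reconcile the two.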

THE KAPPA ARCHITECTURE

[Diagram, built up over several slides: events; queue for “raw data” topic; transform; queue for “preprocessed data” topic; analysis; queue for “analysis results” topic; reporting; end-user UI]
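The Kappa architecture slide chains three topics: raw data, preprocessed data, and analysis results. A minimal sketch of that chain follows, assuming an in-memory dictionary of lists in place of a real message queue; the `publish` helper, the string events, and the toy `transform` and `analyze` steps are hypothetical and chosen only to make the flow visible.

```python
from collections import defaultdict

# Hypothetical in-memory stand-in for a queue with named, append-only topics.
topics = defaultdict(list)


def publish(topic, record):
    """Append a record to the named topic."""
    topics[topic].append(record)


def transform(raw):
    """Preprocess one raw event: trim whitespace and normalize case."""
    return raw.strip().lower()


def analyze(records):
    """Toy analysis: count occurrences of each preprocessed record."""
    counts = {}
    for record in records:
        counts[record] = counts.get(record, 0) + 1
    return counts


# Events land on the "raw data" topic.
for event in [" Login ", "login", "LOGOUT"]:
    publish("raw data", event)

# The transform stage reads raw data and writes the "preprocessed data" topic.
for raw in topics["raw data"]:
    publish("preprocessed data", transform(raw))

# The analysis stage reads preprocessed data and writes "analysis results".
publish("analysis results", analyze(topics["preprocessed data"]))

latest = topics["analysis results"][-1]
print(latest)
```

Because the raw topic is kept rather than consumed destructively, a changed `transform` or `analyze` can be rerun over the same events, which is the usual argument for a single streaming path instead of separate batch and speed layers.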

DATA FEDERATION IN THE COMPUTE LAYER

[Diagram, repeated over several slides: events; databases; file, object storage; transform (three boxes); aggregate; train models; archive; management; web and mobile; reporting; developer UI]

SIDEBAR: THE MONOLITHIC SPARK ANTIPATTERN

[Diagram: cluster scheduler; resource manager; shared FS; six Spark executors; app 1, app 2, app 3, app 4]

ONE CLUSTER PER APPLICATION

[Diagram, built up over two slides: resource manager; object stores; databases; app 1 through app 6; app 2 and app 4 appear a second time in the final build]

PRACTICALITIES AND POTENTIAL PITFALLS

SCHEDULING

[Diagram, built up over several slides: Kubernetes; app 1 through app 6; object stores; databases; an additional app 1 in later builds]

SECURITY

$SPARK_HOME/bin/spark-class \
  org.apache.spark.deploy.worker.Worker \
  master:7077

[Diagram, built up over several slides: pid; root; net; /tmp/foo; k8s namespace*]

STORAGE

[Diagram, repeated on every storage slide: Kubernetes; app 1 through app 6]

POSIX filesystem

✓ familiar interface

✓ interoperability with other programs

✗ unnecessary semantic guarantees

✗ difficult to manage

HDFS

✓ support for legacy Hadoop installations

✗ inelastic

✗ stateful

✗ can’t collocate compute and data

object store

✓ interoperability

✓ fine-grained access control (AC)

✓ many implementations

✗ consistency model

✗ performance (?)

“…in a cloud native architecture, the benefit of HDFS is actually very small and that is why many cloud-first organizations no longer run HDFS, or only run it as a caching layer for S3.”

—Reynold Xin on Quora (http://qr.ae/TAF4cN)

NETWORKING

http://app1:8080

✗ can’t access worker web UI (but wait for Spark 2.1!)

http://app1:80

NEXT STEPS: FUTURE WORK & PLAYING ALONG AT HOME

NEXT STEPS

Further performance evaluation

Better developer experience

Improved scheduling of Spark tasks on Kubernetes

TRY IT OUT YOURSELF

Kubernetes standalone Spark example: https://github.com/kubernetes/kubernetes/tree/master/examples/spark

Enabling Spark on OpenShift: https://github.com/radanalyticsio

Native Spark on Kubernetes proposal: https://github.com/kubernetes/kubernetes/issues/34377

@willb • willb@redhat.com • https://chapeau.freevariable.com

THANKS!
