Page 1

Leveraging Docker for Hadoop Build Automation and Big Data Stack Provisioning

PRESENTED BY Evans Ye | May 16, 2017

Apache Big Data North America 2017

Page 2

Who am I

▪Software Engineer @ Y! APAC Data Team

▪Building data products for...

▪Apache Bigtop PMC chair

Page 3

Outline

▪Quick Intro to Apache Bigtop

▪Docker for Bigtop Packaging

▪Docker for Bigtop Provisioner

▪Docker for Bigtop Sandbox

▪Release

Page 4

Quick Intro to Apache Bigtop

Page 5

Linux Distributions

Page 6

Hadoop Distributions

Page 7

But there are other great Hadoop ecosystem components too...

Page 8

How do I add patches?

Page 9

Page 10

From source code to packages

Bigtop Packaging

Page 11

Supported components

Page 12

Bigtop feature set

Packaging, Testing, Deployment, Virtualization

Everything you need to easily build your own Big Data stack

Page 13

Docker for Bigtop Packaging

Page 14

Preparing build environment

Page 15

Preparing build environment

…Seriously?

Page 16

Bigtop Toolchain

▪Puppet recipes that install the required libraries and build tools

▪Prerequisite: Java

▪To prepare a build environment:

git clone https://github.com/apache/bigtop.git
cd bigtop
./bigtop_toolchain/bin/puppetize.sh
./gradlew toolchain

Page 17

CI Infrastructure

CentOS slave

Fedora slave

Ubuntu slave

Debian slave

OpenSuSE slave

Page 18

CI Infrastructure

CentOS slave

Fedora slave

Ubuntu slave

Debian slave

OpenSuSE slave

Bigtop Toolchain (installed on each slave)

Page 19

Page 20

Dockerized CI Infrastructure

CentOS slave

Fedora slave

Ubuntu slave

Debian slave

OpenSuSE slave

▪Immutable environment

▪Fault tolerance

Page 21

Page 22

How to build packages

▪Execute shell:

OS=debian-8
COMPONENT=hadoop

docker run -u jenkins --rm \
  -v "$(pwd)":/bigtop --workdir /bigtop \
  bigtop/slaves:trunk-$OS \
  bash -l -c "./gradlew allclean $COMPONENT-pkg"

▪Bigtop CI Setup Guide
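The command above builds one component on one OS. Since the CI matrix is the same command repeated over several `bigtop/slaves` images, it can be looped, as sketched below. This assumes the OS tags in `BIGTOP_OS_LIST` exist as `bigtop/slaves:trunk-<os>` on Docker Hub (verify before use); `build_pkg`, `build_matrix` and the `DRY_RUN` switch are hypothetical helpers, and with the default `DRY_RUN=1` the commands are only printed, not run.

```shell
#!/usr/bin/env bash
# Sketch: build one component's packages on every OS image of the CI matrix.
# ASSUMPTION: these tags exist as bigtop/slaves:trunk-<os>; verify on Docker Hub.
BIGTOP_OS_LIST=(centos-7 fedora-25 ubuntu-16.04 debian-8 opensuse-42.1)

# Build (or, with DRY_RUN=1, just print) the packages for one OS/component pair.
build_pkg() {
  local os="$1" component="$2"
  local cmd=(docker run -u jenkins --rm
    -v "$(pwd)":/bigtop --workdir /bigtop
    "bigtop/slaves:trunk-${os}"
    bash -l -c "./gradlew allclean ${component}-pkg")
  if [[ "${DRY_RUN:-1}" == 1 ]]; then
    # %q quotes each argument so the printed line can be pasted into a shell.
    printf '%q ' "${cmd[@]}"
    echo
  else
    "${cmd[@]}"
  fi
}

# Run build_pkg for every OS in the matrix; stop at the first failing build.
build_matrix() {
  local component="$1" os
  for os in "${BIGTOP_OS_LIST[@]}"; do
    build_pkg "$os" "$component" || return 1
  done
}

# Usage: DRY_RUN=0 build_matrix hadoop
```

Printing by default makes it easy to review the exact commands before running them on a machine with Docker.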

Page 23

Bigtop master

https://ci.bigtop.apache.org/view/Packages/job/Bigtop-trunk-packages/

Page 24

Bigtop's early mission accomplished

Leveraged by app providers…

Page 25

Getting out of the Apache dome

Page 26

New focus and target users

▪Data engineers vs. distro builders

▪Solution diversity:

  ▪Streaming: Flink, Apex

  ▪In-memory cache: Alluxio, Ignite

  ▪Non-Apache: QFS, GPDB

▪User/developer tools:

  ▪Bigtop Provisioner

  ▪Bigtop Sandbox

  ▪Big data stack references

Page 27

Docker for Bigtop Provisioner

Page 28

Bigtop Provisioner

▪A tool that demonstrates the full life cycle of Bigtop: Packaging, Testing, Deployment, Virtualization

▪Bigtop Provisioner workflow: Create resources → Run Bigtop Puppet → Run Bigtop Tests

Page 29

One-click Hadoop provisioning (Bigtop 1.0.0)

bigtop/deploy image on Docker Hub

./docker-hadoop.sh -c 3

[Diagram: three containers, each running puppet apply]
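The diagram shows one `puppet apply` per container. That fan-out can be sketched with background jobs and `wait`, as below. This is not the code of `docker-hadoop.sh`: the container names (`bigtop-node-<i>`), the `/bigtop-home` mount point and the helper names are hypothetical, and the `puppet apply` arguments assume Bigtop's `bigtop-deploy/puppet` source layout. With the default `DRY_RUN=1` the commands are only printed.

```shell
#!/usr/bin/env bash
# Sketch of the diagram above: run "puppet apply" on every node in parallel.
# HYPOTHETICAL: container names bigtop-node-<i>, Bigtop source at /bigtop-home.

# Provision (or, with DRY_RUN=1, just print the command for) one node.
provision_node() {
  local node="$1" line
  local cmd=(docker exec "$node" bash -c
    "cd /bigtop-home && puppet apply --modulepath=bigtop-deploy/puppet/modules:/etc/puppet/modules bigtop-deploy/puppet/manifests")
  if [[ "${DRY_RUN:-1}" == 1 ]]; then
    # Build the whole line first so parallel jobs do not interleave output.
    line="$(printf '%q ' "${cmd[@]}")"
    echo "$line"
  else
    "${cmd[@]}"
  fi
}

# Start one background job per node, then wait for every one of them.
# Returns non-zero if any node failed to provision.
provision_cluster() {
  local n="$1" i pid failed=0
  local pids=()
  for ((i = 1; i <= n; i++)); do
    provision_node "bigtop-node-${i}" &
    pids+=("$!")
  done
  for pid in "${pids[@]}"; do
    wait "$pid" || failed=1
  done
  return "$failed"
}

# Usage: DRY_RUN=0 provision_cluster 3
```

Waiting on each PID individually, rather than a bare `wait`, is what lets the function report a failure on any single node.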

Page 30

What's the problem with Vagrant's Docker provider?

▪The Vagrant public key must be added to the Docker images

▪Too many issues with the auto-created boot2docker VM

▪A bug in the Docker provider has stayed open for two years

▪'Waiting for machine to boot' hangs indefinitely

▪The same code cannot be shared across providers anyway

▪Not all Docker options are supported in the Vagrantfile

▪^#?& slow

Page 31

Replaced by docker-compose (Bigtop 1.2.0)

bigtop/deploy image on Docker Hub

./docker-hadoop.sh -c 3

[Diagram: three containers, each running puppet apply]

Page 32

Advantages

▪No need to build a customized image beforehand

▪Better compatibility with Docker's native solutions

▪Clear, simple YAML file for orchestration settings

▪Supports new features such as overlay networks

▪Leverages Swarm for multi-node cluster deployment

▪Faster → better user experience

Page 33

How to run Docker Provisioner

▪Execute shell:

# See bigtop/provisioner/docker/*.yaml for available configurations
CONFIG=YOUR_CUSTOM_CONF.yaml

# Provision
./gradlew -Pconfig=${CONFIG} -Pnum_instances=1 \
  docker-provisioner

# Destroy the provisioned cluster
./gradlew docker-provisioner-destroy

▪Bigtop CI Setup Guide
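In CI it helps to guarantee that a provisioned cluster is destroyed even when a test fails. The sketch below wraps the two Gradle tasks above in a subshell with an `EXIT` trap. `with_cluster`, `run` and `run_checks` are hypothetical helper names (`run_checks` is a placeholder for your own tests), and with the default `DRY_RUN=1` the commands are only printed.

```shell
#!/usr/bin/env bash
# Sketch: provision a throwaway cluster, run checks, always destroy it.
# Only the two Gradle tasks from the slide are used; run_checks is a
# HYPOTHETICAL placeholder for your own tests.

# Run a command, or only print it when DRY_RUN=1 (the default).
run() {
  if [[ "${DRY_RUN:-1}" == 1 ]]; then
    printf '%q ' "$@"
    echo
  else
    "$@"
  fi
}

# Placeholder: replace with real smoke tests against the cluster.
run_checks() {
  run echo "cluster is up"
}

# with_cluster CONFIG NUM_INSTANCES
with_cluster() {
  local config="$1" instances="$2"
  (
    # The EXIT trap fires when this subshell ends, on success or failure.
    trap 'run ./gradlew docker-provisioner-destroy' EXIT
    run ./gradlew -Pconfig="$config" -Pnum_instances="$instances" \
      docker-provisioner || exit 1
    run_checks
  )
}

# Usage: DRY_RUN=0 with_cluster YOUR_CUSTOM_CONF.yaml 3
```

Explicit `|| exit 1` is used instead of `set -e`, because bash ignores `set -e` inside a function that is itself called on the left of `||` or `&&`.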

Page 34

Visibility for deployments

Page 35

Use Cases

▪For application developers, cluster admins and users:

  ▪Run a Hadoop cluster to test your code on

  ▪Try and test configurations before applying them to production

  ▪Play around with Bigtop Big Data stacks

▪For contributors:

  ▪Easily test your packaging, deployment and test code

▪For distro builders:

  ▪CI matrix → easier patching of upstream code

Page 36

Docker for Bigtop Sandbox

Page 37

Introducing Bigtop Sandbox

▪The easiest way to get started

▪Docker images that have Bigtop stacks installed and configured

▪A pseudo-distributed cluster up and running with zero installation

▪A command-line tool to build your own stack

Page 38

Docker image layers: interface

Customized big data stack

Deploy & management tool

Base image (OS)

Page 39

Docker image layers: concrete implementation

HDFS+YARN+Spark

Bigtop Puppet

bigtop/puppet:ubuntu-16.04

Page 40

Building images

[Diagram: CentOS base image + Bigtop Puppet + site.yaml; $ puppet apply produces an HDFS+YARN+Spark image]
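One way to realize this layering by hand is to start a container from a Bigtop Puppet image, drop in a `site.yaml`, run `puppet apply`, and snapshot the result with `docker commit`. This is a hypothetical sketch, not necessarily what `build.sh` does internally: the container name, the `/bigtop-home` mount point and the assumption that hiera in the image already reads `/etc/puppet/hieradata/site.yaml` are all guesses to check against the Bigtop source. With the default `DRY_RUN=1` the commands are only printed.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: bake a Bigtop stack into an image with "docker commit".
# ASSUMED: the Bigtop checkout is mounted at /bigtop-home, and hiera inside the
# image is already configured to read /etc/puppet/hieradata/site.yaml.
PUPPET_APPLY="cd /bigtop-home && puppet apply \
--modulepath=bigtop-deploy/puppet/modules:/etc/puppet/modules \
bigtop-deploy/puppet/manifests"

# Run a command, or only print it when DRY_RUN=1 (the default).
run() {
  if [[ "${DRY_RUN:-1}" == 1 ]]; then
    printf '%q ' "$@"
    echo
  else
    "$@"
  fi
}

# build_image OS SITE_YAML TAG  (run from the root of a Bigtop checkout)
build_image() {
  local os="$1" site_yaml="$2" tag="$3" name="sandbox-build" status=0
  # Base layers: the OS image with Bigtop Puppet; keep it alive with a no-op.
  run docker run -d --name "$name" -v "$(pwd)":/bigtop-home \
    "bigtop/puppet:${os}" tail -f /dev/null || return 1
  # Stack layer: copy in site.yaml, run puppet apply, snapshot the result.
  # Each step runs only if the previous one succeeded.
  run docker cp "$site_yaml" "${name}:/etc/puppet/hieradata/site.yaml" &&
    run docker exec "$name" bash -c "$PUPPET_APPLY" &&
    run docker commit "$name" "$tag" ||
    status=1
  # Always remove the temporary build container.
  run docker rm -f "$name"
  return "$status"
}

# Usage: DRY_RUN=0 build_image ubuntu-16.04 site.yaml bigtop/sandbox:my-stack
```

Note that `docker commit` does not capture bind-mounted volumes, so the Bigtop checkout mounted at build time is not baked into the resulting image.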

Page 41

How to build

git clone https://github.com/apache/bigtop.git
cd bigtop/docker/sandbox

./build.sh -a evansye -o ubuntu-16.04 \
  -c hdfs,yarn,spark

▪Specify custom conf:

./build.sh -a evansye -o ubuntu-16.04 \
  -f site.yaml -t apache_big_data_2017_miami

Page 42

Running images

Hadoop+HBase+Spark

$ puppet apply

Page 43

How to run

docker run --name sandbox -d \
  -p 50070:50070 -p 8088:8088 \
  bigtop/sandbox:apache_big_data_2017_miami

docker logs -f sandbox

docker exec sandbox spark-example SparkPi
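Because `docker run -d` returns as soon as the container starts, while the services inside the sandbox still need time to come up, scripts that use the sandbox usually have to wait before running jobs. A minimal polling helper is sketched below; `wait_until` is a hypothetical name, and the usage line assumes the NameNode web UI answers on the mapped port 50070 once HDFS is ready.

```shell
#!/usr/bin/env bash
# Sketch: poll a command until it succeeds, with a bounded number of tries.

# wait_until MAX_TRIES SLEEP_SECONDS COMMAND [ARGS...]
# Returns 0 as soon as COMMAND succeeds, 1 if it never does.
wait_until() {
  local max="$1" pause="$2" i
  shift 2
  for ((i = 1; i <= max; i++)); do
    if "$@"; then
      return 0
    fi
    # Do not sleep after the last failed attempt.
    if ((i < max)); then
      sleep "$pause"
    fi
  done
  return 1
}

# Usage (assumes the -p 50070:50070 mapping from the docker run command above):
#   wait_until 60 5 curl -sf -o /dev/null http://localhost:50070/ &&
#     docker exec sandbox spark-example SparkPi
```

Bounding the number of tries keeps a broken sandbox from hanging a CI job forever.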

Page 44

                  Bigtop Provisioner   Bigtop Sandbox
Scalable          Yes                  No
Portable          No                   Yes
Flexibility       High                 Medium
Speed             > 2 mins             > 15 secs
Requires network  Yes                  No

Page 45

Role             Bigtop Provisioner                         Bigtop Sandbox
Data engineers   Multi-node cluster testing                 Build/use sandboxes for dev & test
Ops              Multi-node cluster testing                 Single-node testing
Contributors     Test packages, puppet recipes, test cases  Test packages, puppet recipes, test cases
Distro builders  Test packages, puppet recipes, test cases  Provide sandboxes

Page 46

Integration test in CI/CD pipeline

[Diagram: CD pipeline with Bigtop Sandbox. Source code → Compile → Unit test → Build image → Integration test with Sandbox (Sandbox service, data) → Push image to Docker registry → Deploy → FINISHED]
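A pipeline like the one in this diagram can be sketched as a shell script in which each stage runs only if the previous one succeeded and the sandbox is always removed at the end. Apart from the `bigtop/sandbox` image name, everything here is a hypothetical placeholder for your own project: `APP_IMAGE`, the `ci-net` network, the `make test` build command and the `integration-test` entry point. With the default `DRY_RUN=1` the commands are only printed.

```shell
#!/usr/bin/env bash
# Sketch of a CD pipeline with a Bigtop Sandbox integration-test stage.
# HYPOTHETICAL names: APP_IMAGE, ci-net, "make test", "integration-test".
APP_IMAGE="${APP_IMAGE:-registry.example.com/my-app:latest}"
SANDBOX_IMAGE="${SANDBOX_IMAGE:-bigtop/sandbox:apache_big_data_2017_miami}"

# Run a command, or only print it when DRY_RUN=1 (the default).
run() {
  if [[ "${DRY_RUN:-1}" == 1 ]]; then
    printf '%q ' "$@"
    echo
  else
    "$@"
  fi
}

pipeline() {
  (
    # Tear down the sandbox and network however the pipeline ends.
    trap 'run docker rm -f ci-sandbox; run docker network rm ci-net' EXIT
    # Compile + unit tests, then package the application as an image.
    run make test || exit 1
    run docker build -t "$APP_IMAGE" . || exit 1
    # Start the sandbox service on a private network the app can reach.
    # (In practice, wait here until the sandbox services are up.)
    run docker network create ci-net || exit 1
    run docker run -d --name ci-sandbox --network ci-net "$SANDBOX_IMAGE" || exit 1
    # Integration test: run the app image against the sandbox.
    run docker run --rm --network ci-net "$APP_IMAGE" integration-test || exit 1
    # Only images that passed integration tests reach the registry.
    run docker push "$APP_IMAGE" || exit 1
    # Deploy stage omitted: it depends on the target environment.
    echo FINISHED
  )
}

# Usage: DRY_RUN=0 pipeline
```

Pushing only after the integration stage keeps untested images out of the registry, which is the point of putting the sandbox in the pipeline.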

Page 47

Future

▪Production deployment using the Sandbox image

  ▪--net host or SDN

  ▪External volumes for fsimage, data, logs, etc.

▪Cluster orchestration

  ▪Kubernetes?

Page 48

Release

Page 49

Bigtop 1.2.0 released in April 2017

▪New components:

  ▪Ambari 2.5.0

  ▪GPDB 5.0.0-alpha.0 (Greenplum)

▪Featured upgrades:

  ▪Hadoop 2.7.3

  ▪Spark 2.1.0

  ▪Kafka 0.10.1.1

  ▪HBase 1.1.3

  ▪and more

Page 50

What's new in Bigtop 1.2.0?

▪New features:

  ▪Juju Bigtop charms

  ▪Bigtop Sandbox (alpha)

▪Improvements:

  ▪Bigtop Docker Provisioner made faster

Page 51

Juju Cloud Weather Report

http://bigtop.charm.qa/

Page 52

Road ahead

▪AArch64 support

▪Enhance the supported component set in Bigtop Puppet

▪Extend the CI matrix to Bigtop Tests

▪Ambari Bigtop integration

▪Big data stack references

Page 53

We want you!

▪Join the mailing list, ask questions, suggest features, etc.

▪Contribute (components, tutorials, docs)

▪Report bugs

▪References:

  ▪Home page: http://bigtop.apache.org/

  ▪Mailing lists: http://bigtop.apache.org/mail-lists.html

  ▪Documentation: https://cwiki.apache.org/confluence/display/BIGTOP/Index

  ▪Source code: https://github.com/apache/bigtop

  ▪Packages: https://www.apache.org/dist/bigtop/bigtop-1.2.0/repos/

  ▪JIRA: https://issues.apache.org/jira/browse/BIGTOP

Page 54

Thank you!

Questions?