Posted on 15-Apr-2017

1

Gerrit + Jenkins = Continuous Delivery for Big Data

Mountain View, CA, November 2015

Stefano Galarraga, GerritForge

stefano@gerritforge.com
http://www.gerritforge.com

Real-life case study and future developments

2

The Team

Luca Milanesio
• Co-founder and Director of GerritForge
• Over 20 years in Agile Development and ALM
• Open source contributor to many projects (Big Data, Continuous Integration, Git/Gerrit)

Antonios Chalkiopoulos
• Author of Programming MapReduce with Scalding
• Open source contributor to many Big Data projects
• Working on the "land of Hadoop" (landoop.com)

Tiago Palma
• Data Warehouse & Big Data development
• Senior Data Modeler
• Big Data infrastructure specialist

Stefano Galarraga
• 20 years of Agile Development
• Middleware, Big Data, Reactive Distributed Systems
• Open source contributor to Big Data projects

3

Agenda

• What's special in Big Data
– General lack of support for unit/integration testing
– Testing the "real thing" (a.k.a. the cluster)
• Why Gerrit for continuous deployment on Big Data?
• Our development lifecycle ingredients
– Gerrit, Jenkins, Mesos, Marathon, CDH / Spark
• Gerrit role and components
– What we used, why, and what we would like to have
• New developments
– Using topics with microservices for "atomic" multi-service changes
• Live (minimised) demo
• Open points and discussion

4

WHY Gerrit?

• Fast paced
• Distributed team
• Relatively a "niche" technology
– A lot of "junior" developers
– Need for strong ownership
– Validation rules
– CD => we need to have green builds and consistent code quality

5

Code-Review Lifecycle

• Git used by distributed teams (UK, Israel, India)
• Topics and code review
• Jenkins build on every patch-set
• Commits reviewed / approved via Gerrit Submit
• Submitting a topic automatically:
– merges all patch-sets (semi-atomically)
– triggers a longer chain of CI steps
– promotes a release candidate if everything passes
• Jenkins automation via the Gerrit Trigger Plugin
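The slides don't show the plumbing, but the lifecycle above rests on Gerrit's `stream-events` feed, which emits one JSON object per event (types such as `patchset-created` and `change-merged`, with the topic under `change.topic`). A minimal sketch of how a CI driver could group merged changes by topic; the grouping helper itself is our illustration, not part of Gerrit:

```python
import json

def parse_event(line):
    """Parse one JSON event from `gerrit stream-events` output."""
    try:
        return json.loads(line)
    except ValueError:
        return None

def merged_changes_by_topic(lines):
    """Group change-merged events by topic, so a CI job can react
    once per topic instead of once per change."""
    topics = {}
    for line in lines:
        event = parse_event(line)
        if not event or event.get("type") != "change-merged":
            continue
        change = event.get("change", {})
        topic = change.get("topic") or "(no topic)"
        topics.setdefault(topic, []).append(change.get("id"))
    return topics
```

In production the lines would come from `ssh -p 29418 user@gerrit gerrit stream-events` rather than a list in memory.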

6

Build Steps and Solutions

• Unit tests abstracting from dependencies
• Integration tests:
– Using Docker to run dependencies on the CI
– "Micro" Hadoop cluster or other dependencies (DBs, messaging) => Jenkins Docker plugin
– When possible, "dockerizing" just the required components and driving them from the test framework
• Performance/acceptance tests required a real cluster
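"Unit tests abstracting from dependencies" means keeping the ETL logic in pure functions that run without Spark, Hadoop or a database. A sketch with a hypothetical transform (the function and field names are ours, purely illustrative):

```python
import unittest

def normalise_record(row):
    """Hypothetical ETL transform: trim the key and coerce the amount.
    Kept free of any Spark/Hadoop dependency so it unit-tests in-process."""
    return {
        "id": row["id"].strip(),
        "amount": float(row["amount"]),
    }

class NormaliseRecordTest(unittest.TestCase):
    def test_trims_and_coerces(self):
        out = normalise_record({"id": " a1 ", "amount": "3.5"})
        self.assertEqual(out, {"id": "a1", "amount": 3.5})

if __name__ == "__main__":
    unittest.main()
```

The same function is later exercised inside the real Spark job by the integration tests.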

7

Fitting CDH Into this Picture

• Acceptance / performance tests with short-lived CDH clusters
• Solution: Mesos, Marathon and Docker
– Ephemeral clusters with defined capacity
– Automatic cluster configuration
– All controlled via Docker/Mesos
• This was quite a long process!
– mostly because of CDH cluster configuration

8

Mesos + Marathon

• Apache Mesos
– Abstracts CPU, memory, storage and other compute resources away from machines
• Marathon framework
– Runs on top of Mesos
– Guarantees that long-running applications never stop
– REST API for managing and scaling services
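Marathon's REST API takes a JSON app definition via `POST /v2/apps`. A sketch of what the definition for the CDH agent containers could look like; the host, image name and resource sizes are placeholders, not values from the talk:

```python
import json
import urllib.request

MARATHON_URL = "http://marathon.example.com:8080"  # placeholder host

def cdh_agent_app(instances):
    """Build a Marathon v2 app definition that runs N Docker containers.
    Image name and resource sizes are illustrative only."""
    return {
        "id": "/cdh/agent",
        "instances": instances,
        "cpus": 2,
        "mem": 8192,
        "container": {
            "type": "DOCKER",
            "docker": {"image": "example/cloudera-agent:5.4.1",
                       "network": "HOST"},
        },
    }

def submit(app):
    """POST the app definition to Marathon's REST API."""
    req = urllib.request.Request(
        MARATHON_URL + "/v2/apps",
        data=json.dumps(app).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    return urllib.request.urlopen(req)
```

Scaling the ephemeral cluster up or down is then just re-posting the definition with a different `instances` value.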

9

CDH Components

• CDH 5.4.1 distribution– Apache Spark– Hadoop HDFS– YARN

10

Integration/Performance Test Flow on CDH Cluster

Components (from the slide diagram): Jenkins Master, Mesos Master, Marathon framework, a private Docker registry, and Mesos Slaves running Docker on slave hosts.

1. Jenkins POSTs to the Marathon REST API to start one Docker container with Cloudera Manager and N Docker containers with Cloudera agents
2. The Marathon framework receives resource offers from the Mesos Master and submits the tasks
3. The tasks are sent to the Mesos Slaves
4. Each Mesos Slave starts its Docker container; the Docker image is fetched from the private Docker registry if not already present on the slave host
5. Jenkins waits for the Docker containers to come up
6. Once the containers are up, Cloudera packages are installed via the Cloudera Manager API using Python
7. Deploy the ETL, run the ETL and the acceptance tests
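The "waiting for Dockers" step can be automated by polling Marathon's `GET /v2/apps/{id}` endpoint, which reports `tasksRunning` for the app. A sketch; the host and the timeout values are placeholders:

```python
import json
import time
import urllib.request

MARATHON_URL = "http://marathon.example.com:8080"  # placeholder host

def all_tasks_running(app_payload, expected):
    """True once Marathon reports every requested task as running."""
    app = app_payload["app"]
    return app.get("tasksRunning", 0) >= expected

def wait_for_app(app_id, expected, timeout=600, interval=10):
    """Poll Marathon until the app's tasks are up, or the timeout expires."""
    deadline = time.time() + timeout
    while time.time() < deadline:
        with urllib.request.urlopen(MARATHON_URL + "/v2/apps" + app_id) as resp:
            payload = json.load(resp)
        if all_tasks_running(payload, expected):
            return True
        time.sleep(interval)
    return False
```

Only when this returns True does the pipeline move on to installing Cloudera packages and running the ETL.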

11

Unit and Integration Tests sample

• Test project:
– Test Spark project
– ETL from Oracle to HDFS
• Unit tests directly on the Spark logic
• Integration tests for every patch-set:
– VERY small dataset, just for this demo
– CDH and Oracle Docker images
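Starting the per-patch-set dependency containers can be as simple as shelling out to `docker run`. A sketch; the Oracle image name and container name are placeholders, not the images used in the talk:

```python
import subprocess

ORACLE_IMAGE = "example/oracle-xe:11.2"  # placeholder image name

def start_oracle_cmd(name="it-oracle", port=1521):
    """Build the `docker run` argument list for a throwaway Oracle
    container used by an integration-test run."""
    return ["docker", "run", "-d", "--name", name,
            "-p", "%d:1521" % port, ORACLE_IMAGE]

def run(cmd):
    """Execute the command; `docker run -d` prints the container id."""
    return subprocess.check_output(cmd).decode().strip()
```

Keeping the command construction separate from execution makes the wiring itself unit-testable without a Docker daemon.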

12

Unit and Integration Tests

[Diagram: a Jenkins build job starts Oracle and CDH containers (Hadoop in pseudo-distributed mode, Spark standalone), initialises and reads HDFS, and submits the Spark job.]

13

DEMO

14

Open Points and Discussion

• Topic-based build of multiple artifacts
– Demo implementation is naïve and difficult to maintain
– Race conditions on builds of dependent artifacts
• Need for a more advanced triggering system (Zuul might fit)
– Race condition on submit of a topic
• Stream event: "topic-submitted" instead of, or in addition to, many "patch-submitted" events
• The Gerrit Trigger plugin should listen to this event to coordinate
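Until a "topic-submitted" stream event exists, one workaround is to collapse the many per-change events into a single per-topic trigger on the consumer side. A sketch of such a gate; this is our illustration of the idea, not an existing Gerrit or plugin feature:

```python
def topic_gate():
    """Collapse many per-change "change-merged" events into one
    per-topic trigger by remembering which topics already fired.
    Does not solve the underlying race on topic submit, only the
    duplicate-trigger symptom."""
    fired = set()

    def should_trigger(event):
        topic = event.get("change", {}).get("topic")
        if not topic or topic in fired:
            return False
        fired.add(topic)
        return True

    return should_trigger
```

A real implementation would also need to know when a topic is complete, which is exactly why a server-side "topic-submitted" event would be the cleaner fix.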

Questions?
