Continuous Deployment Applied at MyHeritage
Ran Levy, Backend Director
Elad Shmitanka, Operations Engineer
Agenda
● Overview of MyHeritage
● Background – the days before CD
● Why switch to CD?
● CD
● Wins
Family history for Families
Building next generation tools for family history enthusiasts and their families
Discover Preserve Share
Challenge: Scale
79 million registered users
1.9 billion tree profiles
6.2 billion historical records
200 million photos
42 languages
1 million daily emails
Background – the days before CD
● Working in branches (many).
● Weekly service pack (dedicated branch).
● Emergencies and HOT Service Pack.
Background – the days before CD
● Advantages:
○ Intensively tested and monitored.
● Disadvantages:
○ Delivering value to users only on a weekly basis.
○ Unstable deliveries to QA, with no clear owner for problems.
○ Developers need to go back to previous work.
○ Huge waste of time across the entire R&D.
○ Difficult rollbacks when a problem reached production.
What is Continuous Deployment?
Continuous Deployment is a set of practices aimed at building, testing, and releasing software frequently.
These principles help reduce the cost, time and risk of delivering changes to customers by allowing for more incremental changes to applications in production.
Why switch to CD?
● Fast feedback loop.
● Risk reduction.
● Better coding.
● Increase velocity.
● Easy and fast recovery.
● Bridges the gap between QA (team) and Dev.
Agenda
● Overview of MyHeritage
● Background – the days before CD
● Why switch to CD?
● CD
○ Transition phase
○ The early days
○ The future is here
● Wins
The transition phase
Before switching to CD
● Learn from others (like we did).
● Several engineering practices and tools MUST be in place.
The transition phase
● Gradually skipping Service Pack
○ No actual gain for SPCs (manual dists).
○ We gave up SPCs and the sky didn’t fall.
○ Still coding in branches.
● Small gradual steps:
○ Applying CD in completely new code by a single dev.
○ Applying CD in a single agile team.
○ Applying CD in two agile teams.
The transition phase
● What have we learned?
○ Fewer bugs.
○ More stability in production.
○ Better velocity.
CD – the early days
● More frequent commits.
● Branches have gradually disappeared.
● Manual procedure for updating production
○ Prone to human errors
○ Required dist synchronization
○ Time waster
○ …
● Let’s improve and automate the process
CD – the future is here
What did we have?
● Servers list - Static list
● Scripts - Mixture of PHP and bash
● Error handling - Manual
● SVN problems - Calculating deltas, long processes, conflicts
● Dist method - Rsync, only delta of files
● Queue
Ok, so what did we change?
● Servers list - MCollective using Puppet filters
● Scripts - Jenkins with a few scripts
● Error handling - Jenkins Flow plugin, catch
● SVN problems - Working on trunk, revert & update
● Dist method - RPM, MCollective
● Queue - Built into Jenkins
What did we add?
● Tests
● Apache configuration changes
● Notifications - In HipChat, with mentioning
● Daily digest of changes
● Automatic cleanup of the build machine
So, what does it look like? (HipChat)
And in Jenkins?
Flow schema
[Diagram of the flow. Stages shown: Prepare workspace, Parse commit message, Run Tests (Suite 1, Suite 2, ... Suite n), Prepare assets, Build RPM, Canary, Integration, Dist, Cleanup, Handle flow results.]
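The stages of the flow schema can be sketched as a small orchestration script. This is a minimal sketch in Python, not the Jenkins Flow job itself: the slides only name the stages, so the stage order, the parallel test suites, and every function name below (`run_suites`, `run_flow`) are assumptions made for illustration.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Tuple

Step = Tuple[str, Callable[[], None]]


def run_suites(suites: Dict[str, Callable[[], bool]]) -> List[str]:
    """Run every test suite in parallel; return the names of the suites that failed."""
    with ThreadPoolExecutor(max_workers=max(1, len(suites))) as pool:
        futures = {name: pool.submit(suite) for name, suite in suites.items()}
        return [name for name, future in futures.items() if not future.result()]


def run_flow(before_tests: List[Step],
             suites: Dict[str, Callable[[], bool]],
             after_tests: List[Step],
             cleanup: Callable[[], None]) -> Tuple[str, List[str]]:
    """Run the stages in order; return (result, names of the stages that were started)."""
    started: List[str] = []
    result = "SUCCESS"
    try:
        for name, step in before_tests:
            started.append(name)
            step()
        started.append("Run Tests")
        failed = run_suites(suites)
        if failed:
            raise RuntimeError("failed suites: " + ", ".join(sorted(failed)))
        for name, step in after_tests:      # only reached when every suite passed
            started.append(name)
            step()
    except Exception as exc:                # plays the role of the flow's "catch"
        result = f"FAILURE: {exc}"
    finally:
        started.append("Cleanup")           # cleanup runs on success and on failure
        cleanup()
    return result, started
```

In this sketch a failing suite stops the flow before Build RPM and Dist, while Cleanup still runs; the returned result is what a "Handle flow results" step would turn into a notification.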
Drilldown
● Jenkins & Groovy hacks
● RPM
● MCollective
● HipChat integration
● Emergency job
Jenkins & Groovy hacks
● Accessing all the classes of Jenkins
● How do we make sure the SVN revision will be static across all the jobs?
● How do we know which files changed?
[Diagram: Flow #9, Flow #8, Flow #7, Flow #6 and Flow #5, each shown at the "Prepare workspace" step.]
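One way to approach the two questions above, sketched under assumptions: resolve the SVN revision once when the flow starts, pass that explicit number to every downstream job, and derive the changed files from `svn diff --summarize` between the last distributed revision and the pinned one. The slides do not say how MyHeritage implemented this; the function names and the idea of tracking a "last distributed revision" are hypothetical.

```python
import subprocess
from typing import Dict, List


def svn_diff_summary(url: str, old_rev: int, new_rev: int) -> str:
    """Ask the svn CLI what changed between two explicit revisions (never HEAD)."""
    return subprocess.run(
        ["svn", "diff", "--summarize", "-r", f"{old_rev}:{new_rev}", url],
        check=True, capture_output=True, text=True,
    ).stdout


def parse_summary(output: str) -> Dict[str, List[str]]:
    """Group changed paths by status letter (A = added, D = deleted, M = modified).

    Each summary line starts with two status columns (content, then properties)
    followed by the path; a property-only change (" M") is counted as "M" here.
    """
    changed: Dict[str, List[str]] = {}
    for line in output.splitlines():
        if len(line) < 3:
            continue                      # skip blank or malformed lines
        status = line[0] if line[0] != " " else line[1]
        path = line[2:].lstrip()
        changed.setdefault(status, []).append(path)
    return changed
```

Because both revisions are passed explicitly, a commit that lands while the flow is running cannot change what the later jobs see.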
RPM
RPM (Red Hat Package Manager) - A package management system, originally built for Red Hat. A package contains an arbitrary set of files, configuration files, and pre & post scripts.
RPM (continued)
● Why RPM? (In short? A lot)
○ Mature
○ Config files are managed/tracked
○ Version tracking
○ Dependency management
○ Native OS tools to manage lifecycle (install/query/update/uninstall/downgrade)
○ Rich ecosystem and toolchain
○ Always contains the entire codebase (easier to recover from missed updates)
○ Doesn’t touch unmanaged files (e.g. PID files)
● Problems we have encountered
○ Large packages (reduced from ~700 MB to ~450 MB currently)
○ I/O & network usage on the repo machine (simple HTTP server)
○ Yum locking mechanism in Puppet
MCollective
MCollective - a framework for building server orchestration or parallel job-execution systems. Most users programmatically execute administrative tasks on clusters of servers.
MCollective (continued)
● Packages plugin - https://github.com/myheritage/mcollective-plugin-packages
● Distributor plugin - In-house
○ Used for emergency dists (explained later)
○ Clear cache/reload Apache
● Dynamic host list
○ Easier to manage - Comes for free with MCollective
○ Host in maintenance - Simply stop the MCollective service
● Scalable
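The idea behind the dynamic host list can be illustrated with a toy model: the requester broadcasts a filter, and only hosts whose agent is running and whose facts match will answer. This is a conceptual sketch in Python, not MCollective's API; the `Host` class, the `discover` function, and the fact names are made up for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Host:
    name: str
    facts: Dict[str, str] = field(default_factory=dict)
    agent_running: bool = True  # stopping the agent takes the host out of discovery

    def matches(self, wanted: Dict[str, str]) -> bool:
        """True when every requested fact has the requested value on this host."""
        return all(self.facts.get(key) == value for key, value in wanted.items())


def discover(hosts: List[Host], wanted: Dict[str, str]) -> List[str]:
    """Return the names of the hosts that would answer a broadcast with this filter."""
    return sorted(h.name for h in hosts if h.agent_running and h.matches(wanted))
```

With no static list to edit, taking a host out for maintenance amounts to setting `agent_running` to false in this model, which mirrors stopping the service on the real machine.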
HipChat
Group and private chat, file sharing, and integrations.
● Has an API
● Web, mobile & desktop clients
● Mentioning
● History
● Rooms
HipChat (continued)
● Using HipChat plugin v0.1.8
● The plugin allows only limited functionality (0.1.9 offers more): no customized messages, no mentioning
● Groovy to the rescue!
● Hubot to the rescue!
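A customized message with mentions could be built along these lines. This is a sketch in Python rather than the Groovy or Hubot code the slides refer to, and it assumes the HipChat API v2 room notification endpoint, where a message sent with `message_format` set to `text` may contain @mentions. The message wording, the function names, and the choice to notify only on failure are illustrative assumptions; the request is built but not sent.

```python
import json
import urllib.parse
import urllib.request
from typing import Iterable


def build_notification(result: str, revision: int, mentions: Iterable[str]) -> dict:
    """Build the JSON body: green and quiet on success, red and notifying on failure."""
    ok = result == "SUCCESS"
    who = " ".join(f"@{name}" for name in mentions)
    message = f"Dist of r{revision}: {result}" + (f" {who}" if who else "")
    return {
        "message": message,
        "message_format": "text",      # text format so that @mentions are parsed
        "color": "green" if ok else "red",
        "notify": not ok,
    }


def build_request(room: str, token: str, payload: dict) -> urllib.request.Request:
    """Prepare (but do not send) the POST request for a room notification."""
    url = f"https://api.hipchat.com/v2/room/{urllib.parse.quote(room, safe='')}/notification"
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Authorization": f"Bearer {token}", "Content-Type": "application/json"},
        method="POST",
    )
```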
Emergency job
We have problems in the site, what do we do?
1. Put a stop flag - disables new dists
2. Commit a fix and run an emergency dist
Emergency job
Get changed files → Compress → Upload to httpd
“Go, download and extract”
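The emergency path can be sketched as follows: check the stop flag, pack only the changed files into a compressed archive, and have each host extract it over its copy of the code. This is a minimal sketch in Python under stated assumptions; the stop flag being a file on disk, the tar.gz format, and all function names are hypothetical, and the upload and download steps are left out.

```python
import tarfile
from pathlib import Path
from typing import Iterable


def dist_allowed(stop_flag: Path) -> bool:
    """Regular dists are blocked for as long as the stop flag file exists."""
    return not stop_flag.exists()


def pack_changed_files(root: Path, changed: Iterable[str], archive: Path) -> None:
    """Compress only the changed files, keeping their paths relative to the code root."""
    with tarfile.open(archive, "w:gz") as tar:
        for relative_path in changed:
            tar.add(root / relative_path, arcname=relative_path)


def extract_on_host(archive: Path, docroot: Path) -> None:
    """What a host does after downloading: extract over the existing tree.

    The archive is one we built ourselves, so it is treated as trusted here.
    """
    with tarfile.open(archive, "r:gz") as tar:
        tar.extractall(docroot)
```

Shipping only the changed files keeps the archive small, which is the point of an emergency dist compared with rebuilding and distributing a full package.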
Additional problems we’ve encountered
● Parallelism of unit tests
● Minify failures
● Stop flag job
● Clear cache
○ PHP is a script-based language
○ Cache is used to improve performance
○ Requires cache invalidation
CD 2.0 / Lessons learned
● Improving visibility of the root cause
● Break the Groovy into files and methods
● Yum locking (should be resolved in Puppet 4.x)
● RPM has its disadvantages
○ MCollective RSync plugin (https://github.com/myheritage/mcollective-rsync-agent)
Wins
● Around 20-30 dists per day, delivering fast feedback and higher business value.
● Reduced maintenance time for the dist procedure.
● Higher quality:
○ Fewer bugs.
○ Better coding.
○ Increased test coverage.
Wins
● Smaller code base, with assets separated from the code base.
● Higher velocity.
● Easy and fast recovery.
● Satisfaction of R&D, DevOps and the organization.
We are hiring!