Continuous Integration at scale
Vivek Singh, ThoughtWorks
CI Design, more than technology
• application architecture
• development process
• human behavior
• art of compromise
What creates scale for CI?
• number of committers
• frequency of commits
• size of codebase
• frequency of releases
Experience from a project
• 100 committers
• ~2 commits a day per committer
• distributed
• 45-day release cycle
Some context
• C#, .NET
• inherited (non-buildable) codebase
• one of the busiest websites in the UK and its complete backend
• distributed team (Bangalore, London, Pune)
• used Go (formerly Cruise)
Best way to understand
• starting point
• witness the evolution
• what worked and didn’t
Server under the desk
• small team
• a lot of external dependencies
• carefully and painfully set-up environment
And soon long build times
• but before that……
CI Users
• developers
• analysts
• project manager
Developers want
• fast
• reliable
• and it always passes
QA
• want it to provide good builds
…so that they can test new things and verify known issues
Project Manager
• should catch important bugs
…so that the software is closer to being shipped
CI can be quick, cheap and useful
pick two
Continuous…
• …integration
• …(automated) testing
• …deployment
Continuous Integration
• meaningful handover to next stage of delivery process
• running only unit tests misses the point
Back to long build times
• a lot of code and tests
• multiple teams working on different parts of the codebase
Multiple Single Jobs Build?
[Diagram: Source Control(s) supply materials to three independent jobs (Job A, Job B, Job C), each producing its own output; e.g. Hudson slaves]
Which is green build?
• material (x, y) => (a, b) Green
• material (x) => (c) Green
• material (y) => (c) Red
Multiple Single Jobs Build
• provides the wrong build to downstream users (e.g. QA)
• reason: no synchronization on materials
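The "which is the green build?" problem can be sketched in a few lines. This is a minimal illustration, not how any particular CI server stores results: the build records, the revision numbers, and the helper `green_snapshots` are all hypothetical. A material snapshot only counts as green when every job has passed against exactly that snapshot.

```python
from collections import defaultdict

# Hypothetical build records mirroring the slide's example:
# (job name, material revisions the job built against, result).
UNSYNCED_BUILDS = [
    ("a", {"x": 1, "y": 1}, "green"),
    ("b", {"x": 1, "y": 1}, "green"),
    ("c", {"x": 1}, "green"),  # job c only saw material x
    ("c", {"y": 1}, "red"),    # ...and then only material y
]


def green_snapshots(builds, jobs):
    """Return the material snapshots on which every job in `jobs` ran green."""
    results = defaultdict(dict)
    for job, materials, result in builds:
        snapshot = tuple(sorted(materials.items()))
        results[snapshot][job] = result
    return [
        dict(snapshot)
        for snapshot, by_job in results.items()
        if all(by_job.get(job) == "green" for job in jobs)
    ]


# Without synchronization, no snapshot has been verified by all three jobs.
print(green_snapshots(UNSYNCED_BUILDS, ["a", "b", "c"]))

# If every job builds the same snapshot, the answer is unambiguous.
SYNCED_BUILDS = [(job, {"x": 1, "y": 1}, "green") for job in ("a", "b", "c")]
print(green_snapshots(SYNCED_BUILDS, ["a", "b", "c"]))
```

In the unsynchronized case each job looks green somewhere, yet there is no single combination of materials that QA could safely pick up.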
Pipelined Builds
[Diagram: Source Control supplies materials to chained pipelines (Pipeline 1, Pipeline 2); jobs (Job A, Job B, Job D, Job E) produce output(s) that become materials for the downstream pipeline]
Pipelined Builds, Why
• mimics component dependency, hence feels right
• no unnecessary builds, optimum use of resources
Pipelined Builds, Why Not?
• material sync issue
• complex to understand
• longer build time
• difficult to track material flow
• different from developer build
Staged Team Commit
[Diagram: each team's committers work against a Local Source Control with its own Continuous Integration, local testing, and Local Output; a manual periodic merge moves changes into the main Source Control, whose Continuous Integration produces the Release Deliverable]
Staged Team Commit, Why?
• provides isolation
• no need to build everything
Staged Team Commit, Why Not?
• huge merge problems
• increase in testing effort
(we tried this with SVN; it might be better with Git)
Parallel Jobs Build
[Diagram: a developer commits to Source Control(s); a Material Synchronizer passes the same materials to Job A, Job B, and Job C, which run in parallel (e.g. Regression, Firefox, Chrome)]
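The idea behind the Material Synchronizer can be sketched as follows: pin one snapshot of the materials, then fan it out to all jobs in parallel so that every result refers to the same inputs. This is only an illustration under assumptions; `run_parallel`, the job callables, and the snapshot format are hypothetical and are not Go's API.

```python
from concurrent.futures import ThreadPoolExecutor


def run_parallel(snapshot, jobs):
    """Run every job against the same pinned material snapshot.

    `jobs` maps a job name to a callable that takes the snapshot and
    returns True (pass) or False (fail). Both are hypothetical stand-ins.
    """
    with ThreadPoolExecutor(max_workers=max(1, len(jobs))) as pool:
        futures = {name: pool.submit(job, snapshot) for name, job in jobs.items()}
        results = {name: future.result() for name, future in futures.items()}
    return {
        "snapshot": snapshot,
        "results": results,
        "green": all(results.values()),
    }


# Example jobs named after the slide's labels; each one just records what it saw.
seen = {}


def make_job(name, passes=True):
    def job(snapshot):
        seen[name] = snapshot
        return passes
    return job


build = run_parallel(
    {"svn": 1042},  # hypothetical revision number
    {
        "regression": make_job("regression"),
        "firefox": make_job("firefox"),
        "chrome": make_job("chrome"),
    },
)
print(build["green"], build["results"])
```

Because the snapshot is chosen once, before any job starts, a green result here describes one well-defined set of materials.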
[Diagram: dependency tree with nodes A, B, C, E, F]
A => a.compile, a.test
B => a.compile, b.compile, b.test
E => a.compile, b.compile, e.compile, e.test
F => a.compile, b.compile, c.compile, f.compile, f.test
Smoke => all.compile, smoke
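The task lists above follow a simple rule: compile the component's dependencies first, then the component itself, then run only that component's tests. A minimal sketch of that rule is below. The dependency edges in `DEPENDS_ON` are an assumption inferred from the task lists (for example, F is taken to depend on B and C), and `tasks_for` is a hypothetical helper. The Smoke job, which compiles everything, is not modeled.

```python
# Assumed dependency edges, inferred from the task lists on the slide.
DEPENDS_ON = {
    "A": [],
    "B": ["A"],
    "C": ["A"],
    "E": ["B"],
    "F": ["B", "C"],
}


def tasks_for(job, depends_on=DEPENDS_ON):
    """Return the compile steps a job needs, dependencies first, then its test."""
    order, seen = [], set()

    def visit(name):
        if name in seen:
            return
        seen.add(name)
        for dep in depends_on[name]:
            visit(dep)
        order.append(name)  # emitted only after all of its dependencies

    visit(job)
    return [f"{name.lower()}.compile" for name in order] + [f"{job.lower()}.test"]


for job in ("A", "B", "E", "F"):
    print(job, "=>", ", ".join(tasks_for(job)))
```

Each job compiles only what it needs and tests only its own component, which is what lets the jobs run side by side.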
Dependency Build
Parallel Jobs Build
• not all CI issues can be solved without changing the architecture
– modularization
– testability without external dependencies
• could not do this with any tool other than Go
Continuous integration and virtualization
• clean build
• Subversion
• Git
I am a developer
• I want to do the right thing
• I don’t understand the CI design
• I also forget to check the build status before committing/pushing
• I don’t want to delay fixing the build
Commit Gate
[Diagram: committers go through a Pre Commit Hook that checks the Continuous Integration status (Green, Yellow) before a commit reaches Source Control]
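A commit gate of this kind could be sketched as a small check that a pre-commit hook runs before accepting a change. Everything here is an assumption for illustration: `fetch_status` stands in for whatever call asks the CI server for the current build status, and the slide does not say how a yellow build should be treated, so the set of allowed statuses is left configurable and defaults to green only.

```python
import sys

# Assumption: only a green build lets commits through by default.
ALLOWED = frozenset({"green"})


def commit_allowed(status, allowed=ALLOWED):
    """Return True when the reported build status permits a commit."""
    return status.strip().lower() in allowed


def gate(fetch_status, allowed=ALLOWED):
    """Return a hook exit code: 0 accepts the commit, 1 rejects it.

    `fetch_status` is a hypothetical callable that returns the current
    build status as a string (e.g. "green", "yellow", "red").
    """
    status = fetch_status()
    if commit_allowed(status, allowed):
        return 0
    print(f"Commit rejected: build is {status}", file=sys.stderr)
    return 1


# A real hook script would end with something like:
#     sys.exit(gate(fetch_status))
print(gate(lambda: "green"))
print(gate(lambda: "red"))
```

Putting the check in the hook means a developer who forgets to look at the build status is stopped automatically rather than committing on top of a broken build.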