#atlassian
WOJCIECH SELIGA • SENIOR DEV MANAGER • ATLASSIAN • @WSELIGA
Heavenly Hell
Automated Tests at Scale
About me
• Coding since age 6
• Agile Practices (inc. TDD) since 2003
• Dev Nerd, Tech Leader, Agile Coach, Speaker, PHB
• 7 years with Atlassian (JIRA Senior Dev Manager)
• Spartez Co-founder & CEO
XP Promise
[Chart: Cost of Change over Time, comparing Waterfall and XP]
The Story
About 2.5 years ago
Almost 10 years of accumulating legacy automatic tests
About 20 000 tests on all levels of abstraction (just in core JIRA)
Very slow (even hours) and fragile feedback loop
Serious performance and reliability issues
Dispirited devs accepting RED as a norm
Feedback Speed
Test Quality
Test Code is Not Trash
Design
Maintain
Refactor
Share
Review
Prune
Respect
Discuss
Restructure
Rewrite
Test Pyramid
Unit Tests (including QUnit) - 90% (fastest, lowest overall confidence)
REST / HTML Tests - 9%
Selenium - 1% (slowest, highest overall confidence)
Optimum Balance
Isolation
Speed
Coverage
Level
Access
Effort
Dangerous to tamper with
Maintainability
Quality / Determinism
Almost two years later…
People - Motivation
Making GREEN the norm
Shades of Red
Build Tiers and Policy
Tier A1 - green soon after all commits: unit tests and functional* tests
Tier A2 - green at the end of the day: WebDriver and bundled plugins tests
Tier A3 - green at the end of the iteration: supported platforms tests, compatibility tests
Wallboards: Constant Awareness
Training
• Favouring assertThat over assertTrue/False and assertEquals
• Avoiding races - Atlassian Selenium with its TimedElement
• Favouring unit tests over functional tests (including QUnit over WebDriver)
• Promoting Page Objects
• Brownbags, blog posts, code reviews
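The first training point can be illustrated without any test library. An assertTrue-style check can only report that a condition was false, while an assertThat-style check carries the actual value and a description of the expectation in its failure message. MiniAssert below is a hypothetical stand-in written for this sketch; it is not the JUnit or Hamcrest API.

```java
import java.util.function.Predicate;

// Hypothetical stand-in for JUnit/Hamcrest assertions, for illustration only.
final class MiniAssert {
    private MiniAssert() {}

    // assertTrue-style: the failure message cannot say what the value was.
    static void assertTrue(boolean condition) {
        if (!condition) {
            throw new AssertionError("expected true but was false");
        }
    }

    // assertThat-style: the expectation is described and the actual value is reported.
    static <T> void assertThat(T actual, String expectation, Predicate<T> matcher) {
        if (!matcher.test(actual)) {
            throw new AssertionError("Expected: " + expectation + "\n     but: was <" + actual + ">");
        }
    }

    // Runs an assertion and returns its failure message, or null if it passed.
    static String failureMessage(Runnable assertion) {
        try {
            assertion.run();
            return null;
        } catch (AssertionError e) {
            return e.getMessage();
        }
    }
}

class AssertStyleDemo {
    public static void main(String[] args) {
        String status = "In Progress"; // assumed example value
        System.out.println(MiniAssert.failureMessage(
                () -> MiniAssert.assertTrue(status.equals("Done"))));
        System.out.println(MiniAssert.failureMessage(
                () -> MiniAssert.assertThat(status, "a string equal to \"Done\"", "Done"::equals)));
    }
}
```

The second message names the offending value, which is usually enough to diagnose a red build from the log without re-running the test.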
Quality
Automatic Flakiness Detection Quarantine
Re-run failed tests and see if they pass
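A minimal sketch of the re-run idea, assuming a test can be modelled as a function returning pass or fail. The class name, verdict names and the re-run limit are hypothetical; the slide does not describe the actual detection mechanism.

```java
import java.util.function.BooleanSupplier;

// Hypothetical classifier: names and retry count are assumptions for this sketch.
final class FlakinessDetector {
    enum Verdict { PASSED, FLAKY, FAILED }

    // Runs the test once; on failure, re-runs it up to maxReruns times.
    // A failure followed by a pass marks the test as FLAKY (a quarantine candidate).
    static Verdict classify(BooleanSupplier test, int maxReruns) {
        if (test.getAsBoolean()) {
            return Verdict.PASSED;
        }
        for (int i = 0; i < maxReruns; i++) {
            if (test.getAsBoolean()) {
                return Verdict.FLAKY;
            }
        }
        return Verdict.FAILED;
    }
}

class FlakinessDemo {
    public static void main(String[] args) {
        int[] calls = {0};
        // Simulated test that fails on its first run only, like a race that is usually won.
        BooleanSupplier failsOnce = () -> ++calls[0] > 1;
        System.out.println(FlakinessDetector.classify(failsOnce, 2));   // FLAKY
        System.out.println(FlakinessDetector.classify(() -> false, 2)); // FAILED
    }
}
```

Only the FAILED verdict should turn the build red; FLAKY tests go to quarantine to be healed rather than ignored.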
Quarantine - Healing
SlowMo - expose races
Selenium 1
Selenium ditching
Sky did not fall in
Ditching - benefits
• Freed build agents - better system throughput
• Boosted morale
• Gazillions of developer hours saved
• Money saved on infrastructure
Ditching - due diligence
• conducting the audit - analysis of the coverage we lost
• determining which tests need to be rewritten (e.g. security related)
• rewriting the tests (good job for new hires + a senior mentor)
Flaky Browser-based Tests
Playing with "loading" CSS class does not really help
Races between test code and asynchronous page logic
Races Removal with Tracing
// in the browser:
function mySearchClickHandler() {
    doSomeXhr().always(function() {
        // This executes when the XHR has completed (either success or failure)
        JIRA.trace("search.completed");
    });
}
// In production code JIRA.trace is a no-op

// in my page object:
@Inject
TraceContext traceContext;

public SearchResults doASearch() {
    Tracer snapshot = traceContext.checkpoint();
    getSearchButton().click(); // causes mySearchClickHandler to be invoked
    // This waits until the "search.completed"
    // event has been emitted, *after* previous snapshot
    traceContext.waitFor(snapshot, "search.completed");
    return pageBinder.bind(SearchResults.class);
}
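To show why the checkpoint matters, here is a sketch of how a checkpoint/waitFor pair could be built on an in-memory event log. SimpleTraceContext is hypothetical and only mirrors the shape of the calls on the slide; the real Atlassian TraceContext may work quite differently (for example, by polling the browser).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical in-memory trace log; the real Atlassian TraceContext may work differently.
final class SimpleTraceContext {
    private final List<String> events = new ArrayList<>();

    // Called whenever the page emits a trace (e.g. relayed from JIRA.trace in the browser).
    synchronized void record(String key) {
        events.add(key);
        notifyAll(); // wake up any test thread blocked in waitFor
    }

    // Remembers how many events existed at this moment.
    synchronized int checkpoint() {
        return events.size();
    }

    // Blocks until `key` is recorded after the checkpoint; returns false on timeout.
    synchronized boolean waitFor(int checkpoint, String key, long timeoutMillis) {
        long deadline = System.currentTimeMillis() + timeoutMillis;
        while (!events.subList(checkpoint, events.size()).contains(key)) {
            long remaining = deadline - System.currentTimeMillis();
            if (remaining <= 0) {
                return false;
            }
            try {
                wait(remaining);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
        return true;
    }
}

class TraceDemo {
    public static void main(String[] args) {
        SimpleTraceContext trace = new SimpleTraceContext();
        trace.record("search.completed"); // an earlier, stale event
        int snapshot = trace.checkpoint();
        // Simulated asynchronous page logic finishing 50 ms after the "click".
        new Thread(() -> {
            try { Thread.sleep(50); } catch (InterruptedException ignored) { }
            trace.record("search.completed");
        }).start();
        System.out.println(trace.waitFor(snapshot, "search.completed", 2000));
    }
}
```

Because the wait only considers events recorded after the checkpoint, a "search.completed" left over from a previous search cannot satisfy it, which is exactly the race a "loading" CSS class cannot rule out.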
Speed
Can we halve our build times?
Parallel Execution - Theory
[Diagram: Start of Build, Batches, End of Build]
Parallel Execution - Reality Bites
[Diagram: Start of Build, Batches, End of Build, Agent availability]
Dynamic Test Execution Dispatch - Hallelujah
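One plausible reading of dynamic dispatch is that agents pull the next test from a shared queue when they become free, instead of receiving a fixed batch before the build starts. The simulation below compares the two under that assumption; the durations, the agent count and the round-robin batching rule are made-up illustration values, not figures from the talk.

```java
import java.util.PriorityQueue;

// Hypothetical simulation; durations and agent count are made-up illustration values.
final class DispatchSimulation {
    // Static batching: test i is pre-assigned to agent i % agents before the build starts.
    static int staticBatches(int[] durations, int agents) {
        int[] busyUntil = new int[agents];
        for (int i = 0; i < durations.length; i++) {
            busyUntil[i % agents] += durations[i];
        }
        int end = 0;
        for (int t : busyUntil) {
            end = Math.max(end, t);
        }
        return end;
    }

    // Dynamic dispatch: whichever agent becomes free first pulls the next test from the queue.
    static int dynamicDispatch(int[] durations, int agents) {
        PriorityQueue<Integer> freeAt = new PriorityQueue<>();
        for (int a = 0; a < agents; a++) {
            freeAt.add(0);
        }
        int end = 0;
        for (int d : durations) {
            int finish = freeAt.poll() + d;
            freeAt.add(finish);
            end = Math.max(end, finish);
        }
        return end;
    }

    public static void main(String[] args) {
        int[] minutes = {9, 1, 8, 1, 7, 1, 1, 1, 1, 1}; // assumed test durations
        System.out.println("static:  " + staticBatches(minutes, 2));
        System.out.println("dynamic: " + dynamicDispatch(minutes, 2));
    }
}
```

With these sample numbers the static split finishes after 26 minutes because one agent gets all the slow tests, while dynamic dispatch finishes after 16, close to the 15.5-minute lower bound for two agents.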
"You can't manage what you can't measure."
not by W. Edwards Deming
If you believe in just that, you are doomed.
You can't improve the system if you can't measure it
Profiler, Build statistics, Logs, statsd → Graphite
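As an illustration of the statsd → Graphite route, a build phase can be timed and reported as a statsd timer, which is a single text line of the form name:value|ms sent over UDP. The metric name, host and helper class below are assumptions for this sketch, not the names used in the JIRA builds.

```java
import java.net.DatagramPacket;
import java.net.DatagramSocket;
import java.net.InetAddress;
import java.nio.charset.StandardCharsets;

// Hypothetical helper; metric names, host and port are assumptions for this sketch.
final class BuildTimings {
    // statsd timer line format: <metric name>:<value>|ms
    static String timerLine(String metric, long millis) {
        return metric + ":" + millis + "|ms";
    }

    // UDP is fire-and-forget: no reply is awaited from the statsd daemon.
    static void send(String line, String host, int port) throws Exception {
        byte[] payload = line.getBytes(StandardCharsets.UTF_8);
        try (DatagramSocket socket = new DatagramSocket()) {
            socket.send(new DatagramPacket(payload, payload.length,
                    InetAddress.getByName(host), port));
        }
    }

    public static void main(String[] args) throws Exception {
        long start = System.nanoTime();
        Thread.sleep(20); // stands in for a build phase such as compilation
        long elapsedMillis = (System.nanoTime() - start) / 1_000_000;
        String line = timerLine("build.unit_tests.compilation", elapsedMillis);
        System.out.println(line);
        send(line, "localhost", 8125); // 8125 is the conventional statsd port
    }
}
```

One timer per build phase is enough to draw the per-phase breakdown that the rest of the talk relies on.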
Anatomy of Build*
Compilation
Packaging
Executing Tests
Fetching Dependencies
SCM Update
Agent Availability/Setup
Publishing Results
*Any resemblance to maven build is entirely accidental
JIRA Unit Tests Build
Compilation (7min)
Packaging (0min)
Executing Tests (7min)
Fetching Dependencies (1.5min)
SCM Update (2min)
Agent Availability/Setup (mean 10min)
Publishing Results (1min)
Decreasing test execution time to
ZERO alone would not let us achieve our goal!
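The arithmetic behind this claim can be checked directly from the phase durations of the JIRA Unit Tests Build breakdown. The sketch below uses those figures as given; the class and constant names are only for illustration.

```java
// Phase durations in minutes, taken from the "JIRA Unit Tests Build" breakdown.
final class BuildBudget {
    static final double COMPILATION = 7, PACKAGING = 0, TESTS = 7, FETCH_DEPS = 1.5,
            SCM_UPDATE = 2, AGENT_SETUP = 10 /* mean */, PUBLISHING = 1;

    static double total() {
        return COMPILATION + PACKAGING + TESTS + FETCH_DEPS
                + SCM_UPDATE + AGENT_SETUP + PUBLISHING;
    }

    static double totalWithoutTests() {
        return total() - TESTS;
    }

    public static void main(String[] args) {
        double target = total() / 2; // the goal: halve the build time
        System.out.println("total:           " + total());
        System.out.println("target (half):   " + target);
        System.out.println("zero-cost tests: " + totalWithoutTests());
    }
}
```

The phases add up to 28.5 minutes, so the halved target is 14.25 minutes, yet the build would still take 21.5 minutes even if the tests ran in no time. The other phases, agent availability above all, have to shrink too.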
Agent Availability/Setup
• starved builds due to busy agents building very long builds
• time synchronization issue - NTPD problem
SCM Update - Checkout time
• Proximity of SCM repo
• shallow git clones are not so fast and lightweight + generating extra git server CPU load
• git clone per agent/plan + git pull + git clone per build (hard links!)
• Much less load on Stash server (no need to queue up)
2 min → 5 seconds
Fetching Dependencies
• Fix Predator
• Sandboxing/isolation agent trade-off: changing
    rm -rf $HOME/.m2/repository/com/atlassian/*
  into
    find $HOME/.m2/repository/com/atlassian/ -name "*SNAPSHOT*" | xargs rm
• Network hardware failure found (dropping packets)
1.5 min → 10 seconds
Compilation
• Restructuring multi-pom maven project and dependencies
• Maven 3 parallel compilation FTW!
-T 1.5C *optimal factor thanks to scientific trial and error research
7 min → 1 min
Unit Test Execution
• Splitting unit tests into 2 buckets: good and legacy (much longer)
• Maven 3 parallel test execution (-T 1.5C)
3000 poor tests (5min)
11000 good tests (1.5min)
Rewritten entirely over next year
7 min → 5 min
Functional Tests
• Selenium 1 removal did help
• Faster reset/restore (avoiding unnecessary work such as intercepting SQL operations for debug purposes - building stack traces is costly)
• Restoring via Backdoor REST API (JIRA TestKit)
• Using REST API for common setup/teardown operations
Publishing Results
• Server log allocation per test → now using Backdoor REST API (was Selenium)
• Bamboo DB performance degradation for rich build history
1 min → 40 s
Unexpected Problem
• Stability Issues with our CI server (hardware)
• The bottleneck changed from I/O to CPU
• Too many agents per physical machine
JIRA Unit Tests Build Improved
Compilation (1min)
Packaging (0min)
Executing Tests (5min)
Fetching Dependencies (10sec)
SCM Update (5sec)
Agent Availability/Setup (3min)*
Publishing Results (40sec)
Improvements Summary
Tests              Before     After     Improvement %
Unit tests         29 min     17 min    41%
Functional tests   56 min     34 min    39%
WebDriver tests    39 min     21 min    46%
Overall            124 min    72 min    42%
* Additional ca. 5% improvement expected once new git clone strategy is consistently rolled-out everywhere
Better speed increases responsibility
Fewer commits (authors) per single build
vs.
The Quality Follows
But that's still bad
We want CI feedback loop in a few minutes maximum
Splitting The Codebase
Inevitable Split - Fears
• Organizational concerns - understanding, managing, integrating, releasing, coordinating
• Mindset change - if something worked for 10+ years, why change it?
• Trust - does this library still work?
• We damned ourselves with big buckets for all tests - where do they belong?
Splitting code base
• Step 0 - JIRA Importers Plugin (3.5 years ago)
• Step 1 - New Issue View and Navigator
• Step 2 - now everything else follows (e.g. Workflow Designer)
JIRA 6.0
Getting back from hell to heaven is difficult. Hell sucks in your soul.
Key takeaways:
• Visibility and problem awareness help
• Maintaining a huge testbed is difficult and costly
• Measure the problem - to baseline
• No prejudice - no sacred cows
• Automated tests are not a one-off investment, they are a continuous journey
• Performance is a damn important feature
Test performance is a damn
important feature!
XP vs Sad Reality
[Chart: Cost of Change over Time, comparing Waterfall, XP - ideal, and Sad Reality]
Images - Credits
• Green Traffic Light - by flrnt, CC-BY-SA-2.0
• Turtle - by Jonathan Zander, CC-BY-SA-3.0
• Loading - by MatthewJ13, CC-SA-3.0
• Merlin Tool - by L. Mahin, CC-BY-SA-3.0
• Flashing Red Light - by Chris Phan, CC BY 2.0
• In Heaven - by Daniel Pascoal, CC BY-NC-ND 2.0
Thank you!
WOJCIECH SELIGA • SENIOR DEV MANAGER • ATLASSIAN • @WSELIGA