build quality in: stop the line - peter antman

How to build quality in – a tale from the trenches

Peter Antman 2013-05-23

Who am I?


Background…
§  Developer since 1995
§  Linux, Open Source and Enterprise Java
§  Leader and manager for software developers since 2000
§  Head of Product Development, Polopoly Atex 2007 - 2011
§  Media business
§  Drive: Help organizations and people fully realize their potential (doing software development)

Peter Antman 0760 140 150 @peterantman www.crisp.se/konsulter/peter.antman blog.crisp.se/peterantman, antman.se

Crisp is an employee-owned company known for agile courses with internationally renowned teachers, and for experienced agile developers and coaches.

http://blog.crisp.se

Persistent improvements

§  Polopoly – Enterprise WebCMS

§  International take-off 2008

§  Started Agile transformation 2007

§  Existed for 16 years

§  Large code base

§  Large user base

§  Thousands of editors, millions of users, billions of page views

Development Manager @ Polopoly


Organized around lean principles

Cult of quality

http://commons.wikimedia.org/wiki/File:1924_Non-Stop_Shuttle_Change_Toyoda_Automatic_Loom,_Type_G_2.jpg

The Type-G Toyoda Automatic Loom, the world's first automatic loom with a non-stop shuttle-change motion, was invented by Sakichi Toyoda in 1924. This loom automatically stopped when it detected a problem such as thread breakage.

Time Boxed releases

No junk on trunk

Branch diagram labels (top to bottom): 9, REL, Team, Demoed, Team test branch (convenience), Story branch

Always releasable!

Test everything!
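The branch labels on this slide (story branch, team test branch, team, REL) suggest a promotion ladder. A minimal sketch of how "always releasable" could be enforced, assuming each level runs the tests before a change may move up; the function `promote`, the `tests_green` callback and the story id are hypothetical, not Polopoly's tooling:

```python
from typing import Callable, List, Optional

# Branch levels, lowest first, named after the labels on the slide.
# The gating rule below is an illustrative assumption.
BRANCH_PATH: List[str] = ["story branch", "team test branch", "team branch", "REL"]


def promote(change: str, tests_green: Callable[[str, str], bool]) -> Optional[str]:
    """Move `change` up one branch at a time, running the tests at each level.

    Returns the highest branch on which the change is green, or None if it
    is already red on its own story branch. A red result stops promotion,
    so a broken change never reaches the release branch.
    """
    highest: Optional[str] = None
    for branch in BRANCH_PATH:
        if not tests_green(change, branch):
            print(f"{change}: tests red on {branch}; fix before promoting further")
            break
        highest = branch
    return highest


if __name__ == "__main__":
    # Hypothetical outcome: green everywhere except on the team branch,
    # e.g. because another story merged there first and the two conflict.
    def fake_tests(change: str, branch: str) -> bool:
        return branch != "team branch"

    print(promote("STORY-123", fake_tests))
```

Here the change stops at the team test branch, so the release branch is never exposed to the conflict.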

http://www.makefive.com/categories/news-business/business/top-5-items-i-sold-on-ebay

Continuous integration

§  1.5 million lines of code
§  10 000 tests
§  PEAR
§  PAF
§  Upgrade, and more configurations
§  Five database vendors × multiple versions
§  Four browsers × multiple versions
§  Three web containers × multiple versions
§  Two JDK:s × multiple versions
§  Two core OS:es × multiple versions
§  Two EJB containers × multiple versions

Platforms to validate

OS: Linux, Windows
Database: MySQL, Oracle, MS SQL, PostgreSQL, Derby
JDK: Sun JDK, IBM JDK
Web container: Tomcat, JBossWeb, WS Web
EJB container: JBoss, WebSphere
Browser: Chrome, Firefox, Safari, IE

1.5 million lines of code

10 000 tests

Multiple configurations

Multiple versions

That’s a lot of tests
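To see why the matrix explodes, the vendor counts from the platform slide can simply be multiplied out. This sketch ignores versions entirely and naively assumes every vendor combines with every other (some pairings may not be meaningful in practice), so it is a lower bound on the vendor combinations, not a description of what was actually run:

```python
from itertools import product

# Platform dimensions as listed on the "Platforms to validate" slide.
# Versions are ignored, so every count below is a lower bound.
PLATFORMS = {
    "os": ["Linux", "Windows"],
    "database": ["MySQL", "Oracle", "MS SQL", "PostgreSQL", "Derby"],
    "jdk": ["Sun JDK", "IBM JDK"],
    "web_container": ["Tomcat", "JBossWeb", "WS Web"],
    "ejb_container": ["JBoss", "WebSphere"],
    "browser": ["Chrome", "Firefox", "Safari", "IE"],
}


def all_configurations(platforms):
    """Yield every combination as a dict with one value per dimension."""
    keys = list(platforms)
    for values in product(*(platforms[k] for k in keys)):
        yield dict(zip(keys, values))


configs = list(all_configurations(PLATFORMS))

if __name__ == "__main__":
    print(len(configs))           # 2 * 5 * 2 * 3 * 2 * 4 = 480
    print(len(configs) * 10_000)  # 4800000 test runs for the full cross-product
```

Even before versions are counted, running all 10 000 tests on every combination would mean millions of test executions per build, which is why a dedicated test cloud and test selection become necessary.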

Had to build our own test cloud

§  Started with Red Hat Kickstart on old machines under our desks –  Fragile, hard to upgrade, power outages

§  Went to static installs on blades –  Lots of sysadmin work. One machine = one setup

§  Tried VMware –  Clock issues (the clock went backwards)

§  Started using Amazon –  Too slow to upload our builds

§  Static Xen on local blades –  One machine = one setup

§  Eucalyptus (KVM) on local blades, elastic cloud –  Very buggy (more than 6 months to stabilize)

History

Number of tests (November 2011): 596 570, for example when merging a bugfix to 9.10

Our version of the automatic loom

http://geekandpoke.typepad.com/.a/6a00d8341d3df553ef014e8adc7838970d-pi

Compile class

Test class

Compile module

Test module

Build product

Test product

Acceptance test product

Test on specific platform

Test with specific configuration

At what stage is it possible to continue working even if it’s broken?
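The stages above form a line where each step builds on the previous one. A minimal sketch of an automated line that halts at the first failing stage, assuming each stage can be reduced to a pass/fail check; the stage names come from the slide, while `run_line` and the checks are hypothetical stand-ins for the real compiler, test runner and packaging steps:

```python
from typing import Callable, List, Optional, Tuple

# A stage is a (name, check) pair; check() returns True when the stage passes.
Stage = Tuple[str, Callable[[], bool]]

STAGE_NAMES = [
    "compile class", "test class", "compile module", "test module",
    "build product", "test product", "acceptance test product",
    "test on specific platform", "test with specific configuration",
]


def run_line(stages: List[Stage]) -> Optional[str]:
    """Run stages in order and stop the line at the first failure.

    Returns the name of the failing stage, or None if every stage passed.
    Stages after a failure are never run on the broken result.
    """
    for name, check in stages:
        if not check():
            print(f"STOP THE LINE at '{name}'")
            return name
        print(f"ok  {name}")
    return None


if __name__ == "__main__":
    # Hypothetical run in which only a platform-specific test breaks.
    stages = [(n, (lambda n=n: n != "test on specific platform")) for n in STAGE_NAMES]
    run_line(stages)
```

A failure in an early stage (a class that does not compile) blocks everyone immediately. A failure in a late stage, such as one platform or configuration, only shows up far from the developer's desk, so the automation can report it but people still have to choose to stop and fix it.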

Incremental compile

Software development

http://fixedgear.se/forum/viewtopic.php?f=1&t=2650&start=20

Urban decay

http://www.codinghorror.com/blog/2005/06/the-broken-window-theory.html

Broken Windows

A sociological theory from 1982

http://huntington.patch.com/articles/volunteers-clean-up-in-the-station#photo-5852615

The New York theorem: fighting big crime by picking up litter

http://www.makefive.com/categories/news-business/business/top-5-items-i-sold-on-ebay

Broken builds are broken windows

The line must be stopped manually

Tools

Culture

§  Everything must be tested

§  Examine each failed test, always

§  New bugs are either
–  fixed when we stand on them,
–  handled as “fix next sprint” (first in line), or
–  put into a drawer

Stop the line and fix it

§  Bugs discovered by non-automated tests are fixed in the next sprint

§  Philosophy: We strive not to produce software with errors

§  Policy: If it had been covered by a test it would have been fixed immediately

§  Practically: Old bugs tend never to get fixed (adapt to capacity)

FixNextSprint – never let a window stay broken
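The policy bullets can be read as a small decision rule. A hypothetical encoding, assuming a bug can be tagged by how it was found and whether it is new; the `Bug` fields and `triage` are illustrative names, and sending old bugs to the drawer is our reading of "adapt to capacity", not something the slides state as a rule:

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    FIX_NOW = "stop the line and fix it now"
    FIX_NEXT_SPRINT = "FixNextSprint: first in line next sprint"
    DRAWER = "put into drawer; fix as capacity allows"


@dataclass
class Bug:
    title: str
    found_by_automated_test: bool  # a red test in continuous integration
    is_new: bool                   # introduced by current work, not a long-standing bug


def triage(bug: Bug) -> Action:
    """Hypothetical encoding of the stop-the-line policy."""
    if bug.is_new and bug.found_by_automated_test:
        # Policy: covered by a test means fixed immediately.
        return Action.FIX_NOW
    if bug.is_new:
        # Found by non-automated testing: fixed in the next sprint.
        return Action.FIX_NEXT_SPRINT
    # Assumption: old bugs are handled only as capacity allows.
    return Action.DRAWER


if __name__ == "__main__":
    print(triage(Bug("NPE in search indexer", found_by_automated_test=True, is_new=True)).value)
```

Writing the rule down this explicitly makes the trade-off visible: the more bugs an automated test catches, the more of them land in the "fix now" branch.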

§  Integrate
–  first on ticket branch,
–  then team test branch,
–  then team branch,
–  then branch

§  Hide known bugs (KnownBugs)
§  Run only when changed (600 000 → 100 000)
§  Nightly test check responsibility (rotate between teams)
§  Daily summary mail on test faults
§  Tickets for test faults
§  Indexed in Solr
§  Annotate in Jenkins
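The slides do not say how "run only when changed" decides what changed. One common way is to run the tests of every changed module plus every module that depends on it. A minimal sketch under that assumption; the module names, the `dependents` graph and both functions are hypothetical:

```python
from typing import Dict, List, Set


def affected_modules(changed: Set[str], dependents: Dict[str, Set[str]]) -> Set[str]:
    """Changed modules plus every module that depends on them, transitively."""
    affected: Set[str] = set()
    stack = list(changed)
    while stack:
        module = stack.pop()
        if module in affected:
            continue
        affected.add(module)
        stack.extend(dependents.get(module, set()))
    return affected


def select_tests(changed: Set[str],
                 dependents: Dict[str, Set[str]],
                 tests_by_module: Dict[str, List[str]]) -> List[str]:
    """Return only the tests whose module is affected by the change."""
    selected: List[str] = []
    for module in sorted(affected_modules(changed, dependents)):
        selected.extend(tests_by_module.get(module, []))
    return selected


if __name__ == "__main__":
    # Hypothetical module graph: "web" and "search" both depend on "core".
    dependents = {"core": {"web", "search"}}
    tests_by_module = {
        "core": ["CoreTest"],
        "search": ["SearchIndexTest"],
        "web": ["WebRenderTest"],
        "admin": ["AdminTest"],
    }
    print(select_tests({"core"}, dependents, tests_by_module))
    # ['CoreTest', 'SearchIndexTest', 'WebRenderTest']
    print(select_tests({"web"}, dependents, tests_by_module))
    # ['WebRenderTest']
```

A change deep in a shared module still triggers a wide run, while a change at the edge of the graph runs only a handful of tests. That is the kind of selection that can shrink a run from roughly 600 000 to 100 000 tests, though the actual mechanism behind those numbers is not described.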

Tools and policies are necessary

Analyzing Test Faults

Clean up GoGreen (adapt to capacity)

FixNextSprint (Stop-the-line)

It’s a never ending work
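The "Analyzing test faults" slide names two tracks: FixNextSprint for stop-the-line faults and GoGreen clean-up as capacity allows. A minimal sketch of a nightly summary that splits failures along those tracks, treating anything on a KnownBugs list as GoGreen material and everything else as new. The rotation by calendar day, the team names and the data shapes are all hypothetical:

```python
from datetime import date
from typing import Dict, List, Set

TEAMS = ["Team A", "Team B", "Team C"]  # hypothetical team names


def responsible_team(day: date, teams: List[str] = TEAMS) -> str:
    """Rotate the nightly test-check duty between teams, one team per day."""
    return teams[day.toordinal() % len(teams)]


def summarize(failures: Dict[str, str], known_bugs: Set[str]) -> Dict[str, List[str]]:
    """Split failing tests into new faults and known bugs.

    `failures` maps test name to the platform/configuration it failed on.
    New faults are candidates for FixNextSprint; known bugs are kept out
    of the mail body and left to GoGreen clean-up.
    """
    report: Dict[str, List[str]] = {"new": [], "known": []}
    for test, config in sorted(failures.items()):
        bucket = "known" if test in known_bugs else "new"
        report[bucket].append(f"{test} [{config}]")
    return report


if __name__ == "__main__":
    today = date.today()
    failures = {
        "SearchIndexTest": "Oracle/IBM JDK",
        "WebRenderTest": "IE/Tomcat",
        "LegacyImportTest": "Derby/Windows",
    }
    report = summarize(failures, known_bugs={"LegacyImportTest"})
    print(f"Test faults {today}, checked by {responsible_team(today)}")
    for line in report["new"]:
        print("  NEW  ", line)
    print(f"  ({len(report['known'])} known bugs hidden)")
```

Keeping known bugs out of the mail keeps the signal clean, so every entry that does appear is a broken window someone must act on.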

Here’s a song to sing

I keep a close watch on these tests of mine

I keep my Jenkins open all the time

I see a defect coming down the line

Because you're mine, I stop the line

