Posted on 15-Jan-2016
DAIMI (c) Henrik Bærbak Christensen 1
Integration Testing
Burnstein

Systems are hierarchies of levels, so testing is as well...
– Unit test: individual units
– Integration test: groups / clusters
– System test: whole system – technical
– Acceptance test: whole system – requirements
Intuition

The intuition is that we can
– test individual “atoms” of behaviour on their own (“sandboxing”) and strengthen our belief in their reliability
– test the interaction between “atoms” when combined and strengthen our belief that they interact reliably
– test that when all parts are combined into a complete system, it works reliably
  • from a technical point of view
  • from a user/requirement point of view
Interpretation

But what are the “atoms”?
– atom = indivisible part

In procedural languages:
– the procedure

In OO languages:
– method? class?

And what does “integrate” mean?
Burnstein

Burnstein defines a unit along these lines:

Unit = smallest possible testable software component

– so, what is a component?
Binder

Binder focuses on a hierarchical definition:
– a system is composed of components
– a component is a system of smaller components

(UML diagram: a Component aggregates * smaller Components)
Discussion

As testing is execution-based, software behaviour is what we ultimately study
– we execute some software and study “what it does”

Thus, I find that what we intuitively call a unit (= “atom”) must be defined in terms of the resulting behaviour:

Unit = smallest meaningful behaviour exhibited by executing software. Often associated with implementing a responsibility...
Integration testing

Integration testing deals with finding defects in the way individual parts work together.

As we use composition at almost every level in software engineering, one may argue (as does Binder) that integration testing occurs at all levels.
Class testing

Burnstein notes that if a method is considered the smallest unit of testing, then “ideally” you would have to test it in isolation from all other methods in the class
– otherwise it is integration testing

Thus you have to put it into an artificial test harness, i.e. code many of the remaining methods in the class as “stub” implementations.

Often more efficient: consider the class as the unit...
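A minimal sketch of the point above, with hypothetical class and method names: a method can be tested in isolation from its sibling methods by overriding them with stubs in a harness subclass.

```python
# Hypothetical example: testing Account.deposit in isolation from
# Account.audit_log by overriding the latter with a stub.

class Account:
    def __init__(self):
        self.balance = 0

    def deposit(self, amount):
        self.balance += amount
        self.audit_log(amount)   # collaborator method in the same class

    def audit_log(self, amount):
        raise NotImplementedError("pretend this talks to a remote logger")

class AccountWithStubbedLog(Account):
    """Test harness: audit_log is stubbed so deposit can be tested alone."""
    def audit_log(self, amount):
        self.logged = amount     # just record the call

account = AccountWithStubbedLog()
account.deposit(100)
assert account.balance == 100
assert account.logged == 100
```

The harness subclass is exactly the “artificial test harness” the slide mentions; taking the whole class as the unit avoids writing it.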
Relations to OO

A sound definition of what constitutes object-oriented programs is (I know there are others):

A program execution is the collective behaviour of a set of collaborating objects.

Thus, integration testing is at the heart of testing object-oriented programs.

A common source of defects is indeed complex interactions between collaborating objects, as these collaboration patterns are less visible in the code and therefore harder to get an overview of.
War Story

In a system there were hardware-near machines and operator-near machines; the latter received temperature data from the former.

We had agreed that the data was packed as an integer coded as the temperature * 10
– i.e. T = 23.4 Celsius was coded as the integer “234”

However, due to a misunderstanding, the hardware-near machine sent the data using another coding.

Thus
– both software units worked perfectly well in isolation
– when integrating we detected some really odd defects...
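The mismatch can be sketched as follows. The agreed coding is from the story; the hardware side's actual (wrong) coding is a made-up stand-in, since the slide does not say which coding it used.

```python
# Agreed protocol: pack T = 23.4 C as the integer 234 (temperature * 10).
def pack_agreed(temp_celsius):
    return int(round(temp_celsius * 10))    # 23.4 -> 234

def unpack(packed):
    return packed / 10.0                    # 234 -> 23.4

# Hypothetical stand-in for the hardware-near machine's actual coding
# (here: plain truncation to whole degrees).
def pack_actual(temp_celsius):
    return int(temp_celsius)                # 23.4 -> 23

# Each side behaves sensibly in isolation...
assert unpack(pack_agreed(23.4)) == 23.4
# ...but integrating them yields odd readings: 23.4 C arrives as 2.3 C.
assert unpack(pack_actual(23.4)) == 2.3
```

No unit test on either side can catch this; only exercising the two units together reveals the defect.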
Why integration testing?

Why test component interactions?

If we provide
– ‘sufficient’ component testing
– and ‘sufficient’ system testing
is it not OK then?
Why integration testing?

Burnstein refers to some axioms in §5.6:

Antidecomposition axiom:
– There exists a program P and component Q [of P] such that T is adequate for P, T’ is the set of vectors of values that variables can assume on entrance to Q for some t of T, and T’ is not adequate for Q
– System-scope coverage does not necessarily achieve component coverage (“you do not test the components by testing the system”)
– Why? Examples? Consequences?

Anticomposition axiom:
– There exist programs P and Q, and test set T, such that T is adequate for P, and the set of vectors of values that variables can assume on entrance to Q for inputs in T is adequate for Q, but T is not adequate for P;Q
– Adequate testing at component level is not equivalent to adequate testing of a system of components (“you do not test the system by testing the components”)
– Why? Examples?
Terminology

Test harness: auxiliary code developed to support testing of units. It consists of drivers that call the target code and stubs that represent units it calls.

(Diagram: 1. the driver calls the UUT; 2. the UUT calls the stub)
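The driver/UUT/stub roles from the diagram can be sketched like this (all class and method names are hypothetical):

```python
class SensorStub:
    """Stub: stands in for the hardware sensor the UUT would call."""
    def read_raw(self):
        return 234          # canned value: 23.4 degrees, coded * 10

class TemperatureReader:
    """The unit under test (UUT): depends on some sensor object."""
    def __init__(self, sensor):
        self.sensor = sensor

    def celsius(self):
        return self.sensor.read_raw() / 10.0

def driver():
    """Driver: auxiliary code that sets up and calls the UUT."""
    uut = TemperatureReader(SensorStub())
    return uut.celsius()

assert driver() == 23.4
```

Only `TemperatureReader` is production code; the driver and the stub are the harness and never ship.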
Exercise

Describe, from the mandatory exercise, examples of
– drivers
– stubs
Requirements
Why replace real units with drivers/stubs?
What is the premise that makes drivers/stubs interesting?
Hint: Consider code complexity...
Premise for integration testing

At any given level: a system consisting of components.

Testing interactions between components requires that these are stable
– stable = sufficiently dependable to allow integration testing
– the threshold reflects practical considerations
  • has passed unit testing
– in case of an unstable component – then what?
Lessons learned

A key lesson from the early 1970s is that incremental integration is most effective.

Integrating all components at the same time is problematic:
– debugging is difficult, as there is no idea where the defect is
– last-minute changes are necessary, but there is no time for adequate testing
– testing is often not systematic
Lessons learned

Advantages of incremental testing:
– interfaces are systematically exercised and shown stable before new, unproven interfaces are exercised
– observed failures are more likely to come from the most recently added components, making debugging more efficient
Coupling

Relating to the discussion of coupling: a measure of the strength of dependencies between two subsystems.

Which is preferable from a testing perspective?

(Diagram: high coupling vs. low coupling)
Discussion

Thus, what is good for design is also good for testing.

How fortunate!
Planning integration testing

Integration test plans must answer these questions:
– What interfaces will be the focus of integration testing?
– In what sequence will components/interfaces be exercised?
– Which stubs and drivers must be developed?
– Which test pattern/strategy should be used?
– When is integration testing considered adequate?
– Component stability?

All are of course very dependent upon particular project characteristics.
Component stability/volatility

An important aspect is of course to preserve our investment in testing. Spending staff hours testing a class that is later fundamentally changed is a waste of time. This observation is of course in opposition to our wish for early integration testing.

Postpone integration of volatile components!
Stubs

So, the doctrine says integrate and test often – but how do we do integration with components that are not stable, or not developed at all?

Stubs: partial or surrogate implementations.
Reasons for Stubs

Stubs are powerful tools for many reasons.

Make testing of certain conditions possible
– hardware simulator that outputs data that seldom occur in practice
  • example: wind directions over north from a wind sensor
– replace random behaviour with predictable behaviour
  • example: testing backgammon move validation

Make testing economical
– stub for a database that is costly to reset/set up
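The “replace random behaviour with predictable behaviour” point can be sketched as follows; the class names and the toy move-validation rule are made up, not the backgammon rules themselves.

```python
import random

class RealDie:
    """Production die: non-deterministic, so tests are not repeatable."""
    def roll(self):
        return random.randint(1, 6)

class DieStub:
    """Stub: replays a fixed sequence of rolls, so tests are repeatable."""
    def __init__(self, rolls):
        self.rolls = iter(rolls)
    def roll(self):
        return next(self.rolls)

def is_move_valid(distance, die):
    # toy rule for illustration: a move is valid if it matches the roll
    return distance == die.roll()

die = DieStub([3, 5])
assert is_move_valid(3, die) is True   # first roll is known to be 3
assert is_move_valid(5, die) is True   # second roll is known to be 5
```

With `RealDie` the same test would pass only one run in six; the stub turns a probabilistic check into a deterministic one.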
Reasons for Stubs

Often stubs are relevant to decouple interactions
– stub for a complex algorithm with a ‘simple’ answer
  • ensures that the defect does not lie in the algorithm implementation
– stub for a component in a cycle
  • allows one component in the cycle to be tested in isolation

There is no free lunch!
– stubs can be costly to produce
– ... and sometimes you chase defects in production code that are actually embedded in the stubs

Moral: keep stubs simple!
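Breaking a cycle with a stub can be sketched like this (the two mutually dependent classes are hypothetical):

```python
class Executor:
    """Calls back into Planner, so the two form a dependency cycle."""
    def __init__(self, planner):
        self.planner = planner
    def execute(self, step):
        if step > 0:
            return self.planner.plan(step - 1)
        return "done"

class Planner:
    def __init__(self, executor):
        self.executor = executor
    def plan(self, step):
        return self.executor.execute(step)

class ExecutorStub:
    """Stub: breaks the cycle so Planner can be tested in isolation."""
    def execute(self, step):
        return f"executed({step})"

planner = Planner(ExecutorStub())
assert planner.plan(3) == "executed(3)"
```

Note how simple the stub is kept: a single canned response, nothing that could itself hide a defect.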
Planning integration sequence

In which order should we integrate modules?
– A with B before (AB) with C?
– B with C before A with (BC)?

The decision is usually based upon a dependency analysis
– A (depends-on) B
Dependencies

A (depends-on) B is the super-relation of a vast number of relations between software abstractions:
– composition and aggregation
– delegation to objects / API calls
– inheritance
– global variables
– instance variable access
– objects as message parameters
– RMI / socket communication
Dependencies

Many dependencies are, however, implicit and much harder to spot:
– database access / persistent store access
– initialization sequences
– timing constraints
– file formatting / coding of untyped byte streams
Dependency analysis

Anyway: explicit dependencies often dictate the sequence of testing – i.e. in which order component interactions are tested.

Thus a very helpful tool is dependency analysis – similar to what build systems do to determine compilation order.
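A dependency analysis of this kind is essentially a topological sort of the depends-on graph, as sketched below (the graph itself is a hypothetical example):

```python
def integration_order(depends_on):
    """Return components so each appears after everything it depends on,
    giving a bottom-up integration sequence (like a build system's
    compilation order). No cycle handling: cycles need stubs to break."""
    order, visited = [], set()

    def visit(component):
        if component in visited:
            return
        visited.add(component)
        for dependency in depends_on.get(component, []):
            visit(dependency)
        order.append(component)

    for component in depends_on:
        visit(component)
    return order

# A depends on B and C; B depends on C; C is a leaf.
depends_on = {"A": ["B", "C"], "B": ["C"], "C": []}
order = integration_order(depends_on)
assert order.index("C") < order.index("B") < order.index("A")
```

Leaves come out first and roots last; reading the list backwards gives a top-down order instead.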
Example from Binder

Root level (level 0)
– a unit not used by any other unit in the cluster under test
– often there are several roots

Leaf level
– units that do not use any other units in the cluster under test

Cycles
– either tested as a ‘unit’
– or stubs introduced to break the cycle

Arrows are uses-relations.
Example from Binder
Terminology
Integration defects

Binder lists typical interface defects...

My war stories are covered by it.
Integration Strategies

Binder lists nine integration strategies, documented in pattern form, for doing integration testing.

We focus on the classic strategies:
– Big Bang (actually an anti-pattern)
– Bottom-up
– Top-down

and a costly but successful one:
– continuous integration
Big Bang

Intent
– demonstrate stability by attempting to exercise an entire system with a few test runs

Context
– bring all components together at once; all interfaces tested in one go
– usually ends in a ‘big bang’ – the system dies miserably...

Entry criteria
– all components have passed unit testing

Exit criteria
– test suite passes
Big Bang

Consequences
– If nothing happens – then what? Failure diagnosis is very difficult.
– Even if the exit criteria are met, many interface faults can still hide.
– On the plus side: if it works, no effort has been spent on writing test drivers and stubs.
– May be the course of action for
  • small systems with adequate unit testing
  • an existing system with only minor modifications
  • a system made from certified, high-quality reusable components
Bottom Up

Intent
– demonstrate system stability by adding components to the SUT in uses-dependency order, starting with the components having the fewest dependencies

Context
– stepwise verification of tightly coupled components
Bottom up

Strategy
– stage 1: leaves
– stage 2: ...
– last stage: ...
Bottom up

Entry criteria
– components pass unit tests

Exit criteria
– the interface of each subcomponent has been exercised at least once
– complete when all root-level components pass their test suites
Bottom up

Consequences
– most unit tests in OO are actually integration tests
  • BU testing implicitly takes place in a bottom-up development process

Disadvantages
– driver development cost is significant
  • but if the JUnit test cases are maintained, this cost is worthwhile!
– a fix in a lower-level component may require revisions up the chain
– interfaces are only indirectly exercised
– upper-level testing may require stubs at the bottom to test special conditions
– high-level testing comes very late in the cycle
Bottom up

Advantages
– parallel implementation and testing possible
– little need for stub writing
Top Down

Intent
– demonstrate stability by adding components to the SUT in control-hierarchy order, beginning with the top-level control objects

Strategy
– stage 1: test control objects
– stage 2: ...
– final stage: ...
Top Down

Entry criteria
– each component to be integrated has passed unit test

Exit criteria
– the interface of each component has been exercised at least once
– complete when leaf-level components pass the system-scope test suite
Top Down

Disadvantages
– a large number of stubs is necessary, which is costly
– stubs are brittle
– a fix in a lower-level component may require revisions up the chain
– difficult to get lower-level components sufficiently exercised (antidecomposition axiom)

Advantages
– low driver development costs
– early demonstration of user-related behaviour
Variations

There are many variations of these strategies
– sandwich testing
  • moving from both top and bottom
– collaboration testing
  • taking its outset in collaboration diagrams over use cases

Bottom line
– there is no substitute for being clever and utilizing the combination of techniques that is most cost-efficient for the project at hand
High-frequency integration

Binder also describes high-frequency integration, a strategy whose characteristics lie in the timing, not the ordering, of testing. It is an intrinsic part of the process pattern daily build.

Intent
– integrate new code with a stabilized baseline frequently, to prevent integration bugs from going undiscovered for long, and to prevent divergence from the baseline
High-frequency integration

Context
– a stable baseline must be present; increments are the focus of HF integration
– increment size must match the integration frequency – i.e. with daily builds, increments must be deliverable in a day
– test code is developed in parallel with the code
– testing must be automated
– software configuration management must be in place
High-frequency integration

Procedure
– revise code + test code on a private branch
– desk-check code and tests
– when all component testing passes, check in to the integration branch

Second
– the integration tester builds the system of increments
– testing uses
  • smoke tests, and as much additional testing as time permits

Any increment that breaks HFI is corrected immediately.
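The smoke-test gate in the procedure above can be sketched as a tiny runner (the test names and their bodies are placeholders, not real system checks):

```python
def run_suite(tests):
    """Run each (name, test) pair; return the names of those that failed."""
    failures = []
    for name, test in tests:
        try:
            test()
        except AssertionError:
            failures.append(name)
    return failures

def smoke_test_boot():
    assert True        # placeholder: system starts

def smoke_test_login():
    assert True        # placeholder: a basic scenario works

smoke_tests = [("boot", smoke_test_boot), ("login", smoke_test_login)]

# The gate: an increment that breaks high-frequency integration is
# corrected immediately, so the failure list must be empty to check in.
assert run_suite(smoke_tests) == []
```

Real daily-build setups delegate this to a test framework and a CI server, but the gate logic is the same: a non-empty failure list blocks the integration branch.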
High-frequency integration

Disadvantages
– automated tests must be in place
– high commitment to maintaining code as well as tests
– be aware of adequacy criteria – the suite that found the old bugs may not find the new ones

Advantages
– focus on maintaining tests is an effective bug-prevention strategy
– defects are found early; debugging is easier
– morale is high, as the system works early and keeps doing so