Object-Oriented Software Testing


An Initial Challenge

Myers’s Famous Problem

• Read 3 integers taken to represent the lengths of the sides of a triangle. Decide if triangle is isosceles, equilateral or scalene.

• Math: A valid triangle must meet 2 conditions. No side may have a length of zero, and each side must be shorter than half the sum of all sides. If s is this half-sum:

– s = ( a + b + c) / 2

– then s > a, s > b, and s > c must hold

– if a == b == c then equilateral, if 2 sides are equal then isosceles, else scalene

• Experienced programmers find 7.8 test cases on average

• Myers suggests 14, Binder: 65 wrt a Java implementation:

– Figure 1.1 p.4: class hierarchy

– Figure 1.3 p.6: Java interface

– Table 1.1:

» permutations, invalid and boundary inputs are important

» one must exercise all ways of violating a condition

– Tables 1.2 and 1.3: code (esp. drawing and inheritance) considerations
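The validity and classification rules of Myers's problem can be sketched in Java as follows. This is only a minimal illustration, not Binder's implementation: the names Triangle, Kind and classify are hypothetical, negative lengths are assumed to be invalid like zero, and the test s > side is rewritten as a + b + c > 2 * side to stay in integer arithmetic.

```java
// Hypothetical names (Triangle, Kind, classify) chosen for illustration only.
final class Triangle {
    enum Kind { INVALID, EQUILATERAL, ISOSCELES, SCALENE }

    static Kind classify(int a, int b, int c) {
        // Condition 1: no side may be zero (negative lengths are assumed invalid too).
        if (a <= 0 || b <= 0 || c <= 0) return Kind.INVALID;

        // Condition 2: s > a, s > b, s > c with s = (a + b + c) / 2.
        // Compared as sum > 2 * side, in long arithmetic to avoid int overflow.
        long sum = (long) a + b + c;
        if (sum <= 2L * a || sum <= 2L * b || sum <= 2L * c) return Kind.INVALID;

        // Note: "a == b == c" is not valid Java for ints; spell it out.
        if (a == b && b == c) return Kind.EQUILATERAL;
        if (a == b || b == c || a == c) return Kind.ISOSCELES;
        return Kind.SCALENE;
    }

    public static void main(String[] args) {
        System.out.println(classify(3, 4, 5)); // SCALENE
        System.out.println(classify(2, 2, 3)); // ISOSCELES
        System.out.println(classify(1, 2, 3)); // INVALID (degenerate: s == c)
    }
}
```

Even this small sketch shows why permutations and boundary inputs matter: the isosceles check has three orderings, and the degenerate case (1, 2, 3) fails only one of the three s > side comparisons.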

On Quality Attributes

Lessons from Lecture 1

• Quality is frequently downplayed in the software development industry

– despite the appearance of concern through standards such as ISO 9000…

– just look at the lack of concern in design books, and the list of known bugs in commercial software...

• Quality can be approached from different viewpoints:

– we will emphasize product quality engineering, and downplay process issues and measurement.

» but Myers’s question remains: how to derive test cases?

– it is widely accepted that quality must be customer-oriented

» Eric Yu discusses how to capture the goals of a customer: more on this later!

• Code cannot be tested on its own: you test against some specification

– but, in reality, specs and models are seldom kept in sync with code… traceability is mostly nonexistent...

– and we must test as early and frequently as possible!

In the Eyes of the Beholder

• A product has several stakeholders, each with their view of quality and its attributes:

– end-users: functional requirements, usability, reliability, etc.

– operators: ease of integration with other systems, etc.

– administrators: ease of configuration

– purchasers: cost, added-value, return on investment…

– sales people: needs assessment, ease of contract creation

– architects: scalable and understandable modeling to be used by designers, traceability to requirements, reusability (libraries and frameworks)

– designers: similar to architects’ viewpoint but with a concern for the implementation

– implementers: completeness, consistency and correctness of the design, availability of languages and tools (CASE, debugging, etc.)

– testers: testability (e.g., controllability and observability), and possibly integrability

– managers: ease of planning and tracking, confidence assessment, etc.

Some Quality Attributes

• Correctness (a.k.a. validity): extent to which specifications are satisfied and user’s goals fulfilled

• Reliability: extent to which repeated correct behavior is obtained

• Robustness: extent to which correct behavior is obtained in any context

• Responsiveness: extent to which timing requirements for responses are satisfied

• Efficiency: extent to which use of computing resources is acceptable (e.g., memory leaks, complexity of algorithms wrt time and space)

• Integrity (a.k.a. security): extent to which invalid access is prevented

• Usability: extent to which learning and interpretative times are minimized

• Maintainability: extent to which correction times can be minimized

• Testability: extent to which quality can be assessed

• Flexibility: extent to which it is easy to modify the system

• Portability: extent to which the system can be ported to another environment

• Reusability: extent to which a subset of the system can be reused

• Interoperability: extent to which the system can interact with others

Before Going Any Further

• One must keep in mind that testing is an endless task…

– Prioritization with respect to stakeholders’ goals and quality attributes is crucial

• If models, code and tests are to be traced back to stakeholders’ goals and quality attributes, then some satisfaction criterion must be associated with each of these goals and attributes!

– Prioritization is not enough: we must also have some approach that allows us to verify that the goals of stakeholders are met.

– It’s one thing to measure, it’s another to know what an actual measurement tells us or does not tell us about quality…

» beware of metrics defined in a vacuum...

Typical Metrics

• For reliability: serious failure rate per year

• For maintainability: % of inserted faults detected

• For responsiveness: % of responses that satisfy the constraints

• For capacity: call capacity (actual vs. targeted)

– definition: maximum load that can be processed while all performance parameters are simultaneously met

• For stress tolerance: stress capacity vs. normal capacity

– definition: extent to which critical functions are supported when capacity is exceeded

• For fault tolerance: % of recovered faults

– definition: extent to which the product can recover from


Some Definitions

• Recall that testing is only one aspect of SQE.

• IEEE Spectrum (1992):

– an error results in a fault in the software, which can lead to a defect in the product, which can result in a failure of the function

• Testing is specifically concerned with code:

– Failure: observation of incorrect system behavior

» failure ‘intensity’ (or density) should decrease rapidly over the duration of each iteration

– Faults or defects: root cause for a failure

» often called a bug

» reliability and robustness may be assessed by tracking a metric such as “number of defects per million lines of code”

» what constitutes a defect versus a set of defects is not clear!

» Defect elimination does NOT guarantee quality:

• defects/MLOC is not a customer-oriented metric

• number of hours of failure-free behavior is more customer-oriented

• Binder’s glossary is VERY impressive!

About Testing

– Faults and defects are detected by a successful test case

– Testing is about detecting defects

» you cannot prove the absence of faults…

– A test specifies a fault or set of faults to detect

– A test case is a specific setup-execute-report-teardown context associated to a test:

» a test is typically associated with several test cases

» tests and test cases can be organized into test suites

– A test driver is software to run test cases

– A test manager is software to track the success of test cases

– OA & M: operations, administration and maintenance

» a phase typically at the end of release cycles

» may involve tracking a set of metrics for each quality attribute
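The setup-execute-report-teardown context of a test case, and the role of a test driver, can be sketched in plain Java. This is a hypothetical minimal driver, assuming that a boolean verdict per test case is enough; MiniDriver, TestCase and demoSuite are illustrative names, not part of Binder's text or of any testing framework.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical minimal test driver; all names are illustrative.
final class MiniDriver {
    /** A test case: setup, execute (yielding a verdict to report), teardown. */
    interface TestCase {
        void setUp();
        boolean execute();   // true means the expected result was observed
        void tearDown();
    }

    /** The driver runs each test case and records pass / no pass by name. */
    static Map<String, Boolean> run(Map<String, TestCase> suite) {
        Map<String, Boolean> verdicts = new LinkedHashMap<>();
        for (Map.Entry<String, TestCase> entry : suite.entrySet()) {
            TestCase tc = entry.getValue();
            tc.setUp();
            boolean passed;
            try {
                passed = tc.execute();
            } catch (RuntimeException ex) {
                passed = false;          // an unexpected exception counts as no pass
            } finally {
                tc.tearDown();           // teardown runs whatever the verdict
            }
            verdicts.put(entry.getKey(), passed);
        }
        return verdicts;
    }

    /** A tiny suite exercising java.util.ArrayList as the system under test. */
    static Map<String, TestCase> demoSuite() {
        Map<String, TestCase> suite = new LinkedHashMap<>();
        List<Integer> fixture = new ArrayList<>();
        suite.put("add grows list", new TestCase() {
            public void setUp() { fixture.clear(); }
            public boolean execute() { fixture.add(7); return fixture.size() == 1; }
            public void tearDown() { fixture.clear(); }
        });
        suite.put("get on empty list", new TestCase() {
            public void setUp() { fixture.clear(); }
            public boolean execute() { return fixture.get(0) == 7; } // throws on an empty list
            public void tearDown() { fixture.clear(); }
        });
        return suite;
    }

    public static void main(String[] args) {
        System.out.println(run(demoSuite()));
    }
}
```

A test manager, in the sense used above, would persist these verdicts across runs; that part is left out of the sketch.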

Some Definitions for Testing

Validation & Verification

• From Probert:

– an activity is said to be a validation activity if it involves the construction of part of the binary relation conforms <actual behaviors, worthwhile behaviors>

– validation: are we building the right things?

» we are establishing whether the actual behaviors we observe correspond to the ones deemed to be of value wrt goals/reqs/specifications for what is under test

» e.g., black-box behavior of the system, or of a procedure

– an activity is said to be a verification activity if it involves checking the subset relation <instances of invalid constructs, component constructs> [component verification] OR <actual transform sequence, legal sequences of specified transform rules> [transform verification]

– verification: are we building things right?

» Includes component verification (e.g., model-checking) and transform verification (e.g., traceability between scenarios and FSMs)

• From Probert:

– Formal: A certification activity decides (gives a yes/no answer to) whether an actual measure is at least as great as a preset metric. The actual measure is called the certification measure; the preset metric is called the certification objective.

– Informal: A certification activity measures the completeness of a quality assurance activity or program against stated requirements.

– Example: extent of code coverage achieved by the execution of a specific test suite.
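The formal definition amounts to a threshold comparison. A minimal sketch, assuming statement coverage as the certification measure and a hypothetical objective expressed as a fraction between 0 and 1 (Certification, coverage and certify are illustrative names):

```java
// Hypothetical sketch of Probert's formal definition; names are illustrative.
final class Certification {
    /** Certification measure: fraction of statements executed by a test suite. */
    static double coverage(int executedStatements, int totalStatements) {
        if (totalStatements <= 0 || executedStatements < 0 || executedStatements > totalStatements) {
            throw new IllegalArgumentException("inconsistent statement counts");
        }
        return (double) executedStatements / totalStatements;
    }

    /** Yes/no answer: is the measure at least as great as the objective? */
    static boolean certify(double measure, double objective) {
        return measure >= objective;
    }

    public static void main(String[] args) {
        double objective = 0.80;   // assumed certification objective
        System.out.println(certify(coverage(85, 100), objective)); // true
        System.out.println(certify(coverage(70, 100), objective)); // false
    }
}
```

The comparison itself is trivial; as the following slides argue, the difficulty lies in choosing a measure and an objective that actually say something about quality.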

A Certification Process

Probert’s SQE process:

1. set quality objectives

2. define measurable quality (product & process) metrics

3. identify process certification points

4. apply metrics at each point

5. review exit criterion for the current iteration

6. assess quality objectives and decide on continuous improvement activities

What’s downplayed?

– Step 2… It’s easier said than done… For example, we would like to be sure that one iteration is traceable to the previous one in order to measure the convergence entailed by an incremental process… But how do we do this?

An Overview of Software Testing

(Binder chapter 3)

What is Software Testing?

• Binder (p.41): it is the design of a special kind of s/w system that is:

– fault-directed: target system is to exercise another s/w system with the intent of finding bugs and/or

– conformance-directed: target system is to demonstrate satisfaction of stakeholders’ goals

• Combinational Logic (ch. 6) and FSMs (ch. 7) provide general test models for which systematic approaches exist for the generation of test suites.

• UML(-RT) provides a modeling language for application-specific capturing. We will also consider UCMs.

– Binder has interesting stuff to say about UML in ch.8!

• Figure 3.1 p.43 gives the overall strategy.

The Steps of Test Design

For Binder, p. 41, test design involves:

– identifying, modeling and analyzing the responsibilities of the system under test

– designing tests based on these models

– deriving test cases from these tests

» responsibility-based: expected behavior from models

» implementation-based: expected behavior from code

– adding test cases based on code analysis, suspicions and heuristics

– developing expected results or some other pass/fail criterion for each test case

Test design aims for interesting test cases, that is, ones that have a good chance of revealing a failure.

The Steps of Test Execution

According to Binder, p. 43:

• Establish that the implementation under test is minimally operational by exercising the interfaces between its parts.

• Execute the test suite: the result of each test case is evaluated as pass or no pass

• Use a coverage tool to instrument the implementation under test. Rerun the test suite and evaluate the reported coverage.

• If necessary, develop additional tests to exercise uncovered code.

• Stop testing when the coverage goal is met and all test cases pass.

Two issues:

– we need to categorize the expected results and failures

– we need to study code coverage later: lines, branches?

Classifying Failures

As one executes test cases, one may stumble on (Figure 3.3, p. 49):

• a bug (i.e., a fault)

• an omission: the absence of some required functionality

• a surprise: the execution of behavior that is not required

– e.g., reusing an inherited capability when one should not

Fault-based testing is usually a luxury:

• purposely introduce faults in code (called mutations) to see if these faults are revealed by the test suite...

Debugging is not part of testing:

• debugging is about finding the cause of a failure...
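The idea behind fault-based testing with mutations can be illustrated with a hand-made mutant. This is a toy sketch under stated assumptions: the function max, its mutant and the two suites are hypothetical, and real mutation tools generate and run mutants automatically.

```java
import java.util.function.IntBinaryOperator;

// Hypothetical hand-written mutant; names are illustrative.
final class MutationDemo {
    static int max(int a, int b)       { return a > b ? a : b; }  // original
    static int maxMutant(int a, int b) { return a < b ? a : b; }  // mutation: > replaced by <

    /** Weak suite: only equal arguments, so it cannot tell the two versions apart. */
    static boolean weakSuitePasses(IntBinaryOperator f) {
        return f.applyAsInt(3, 3) == 3;
    }

    /** Stronger suite: also exercises both orderings of unequal arguments. */
    static boolean strongerSuitePasses(IntBinaryOperator f) {
        return f.applyAsInt(3, 3) == 3
            && f.applyAsInt(2, 5) == 5
            && f.applyAsInt(5, 2) == 5;
    }

    public static void main(String[] args) {
        System.out.println(weakSuitePasses(MutationDemo::maxMutant));     // true: mutant survives
        System.out.println(strongerSuitePasses(MutationDemo::maxMutant)); // false: mutant revealed
    }
}
```

A mutant that survives a suite points at a fault the suite would not reveal, which is the information fault-based testing is after.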

What Can be Achieved?

• Limiting factors, p. 54:

– The size of the input/state space

– The number of possible execution sequences (or paths):

» loops, conditions, and dynamic binding contribute to the combinatorial explosion of the number of possible paths

– Fault sensitivity:

» does the test suite hide faults?

– Coincidental correctness:

» faulty code can still produce on occasion correct behavior

– Absolute limitations:

» exhaustive testing is intractable

» spurious tests may be produced if reqs and goals are incorrect

» the test cases themselves can be incorrect

» without trusted expected results to compare with actual results, pass/no pass evaluation is dubious.

About Execution Sequences

(from Binder section 3.3.2)

Consider:

    for (int i = 0; i < n; ++i) {
        if (a.get(i) == b.get(i))
            x[i] = x[i] + 100;
        else
            x[i] = x[i] - 2;
    }

How many paths if n = 2?

[Figure: flow graph of the loop, with nodes labeled "Loop Header", "+ 100 line" and "- 2 line"]
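One way to answer the question is to enumerate the branch sequences. The sketch below assumes that the + 100 and - 2 assignments are alternative branches of the conditional and that the loop body runs exactly n times; PathCount and paths are hypothetical names, and T / F stand for the condition being true or false in one iteration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical enumeration of execution paths; names are illustrative.
final class PathCount {
    /** Lists every branch sequence for exactly n loop iterations. */
    static List<String> paths(int n) {
        List<String> result = new ArrayList<>();
        extend("", n, result);
        return result;
    }

    private static void extend(String prefix, int remaining, List<String> out) {
        if (remaining == 0) {
            out.add(prefix);
            return;
        }
        extend(prefix + "T", remaining - 1, out); // condition true: the + 100 line
        extend(prefix + "F", remaining - 1, out); // condition false: the - 2 line
    }

    public static void main(String[] args) {
        System.out.println(paths(2));         // [TT, TF, FT, FF]
        System.out.println(paths(10).size()); // 1024
    }
}
```

Under these assumptions the count doubles with every iteration (2 to the power n), which is the combinatorial explosion referred to among the limiting factors. If n is read as an upper bound on the number of iterations rather than an exact count, the shorter paths have to be added as well.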

About Coincidental Correctness

(from Binder section 3.3.3)

• x + x and x * x both work for x = 2…

• consider:

    int scale(int j) {
        j = j - 1; // should be j = j + 1;
        j = j / 30000;
        return j;
    }

According to Binder:

For j = -30001, -30000, -1, 0, 29999, and 30000 the answer is wrong!

That is 99.9908% of the input space works!

Amusingly enough, Binder is wrong!!!
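The claim can be checked by brute force. The sketch below assumes a 16-bit input space (-32768 to 32767), which is what the 99.9908% figure implies (6 failing inputs out of 65536), and Java semantics for integer division, which truncates toward zero; ScaleCheck and its method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical brute-force check; names are illustrative.
final class ScaleCheck {
    static int scaleFaulty(int j)   { j = j - 1; return j / 30000; } // code as written
    static int scaleIntended(int j) { j = j + 1; return j / 30000; } // code as intended

    /** Inputs in the assumed 16-bit range where the faulty result differs. */
    static List<Integer> failingInputs() {
        List<Integer> failing = new ArrayList<>();
        for (int j = Short.MIN_VALUE; j <= Short.MAX_VALUE; j++) {
            if (scaleFaulty(j) != scaleIntended(j)) {
                failing.add(j);
            }
        }
        return failing;
    }

    public static void main(String[] args) {
        System.out.println(failingInputs()); // [-30000, -29999, 29999, 30000]
    }
}
```

Under these assumptions only four inputs reveal the fault: -30000, -29999, 29999 and 30000. For -30001, -1 and 0 the faulty and intended versions return the same value, so about 99.9939% of the 16-bit input space is coincidentally correct. With a 32-bit int the two versions also disagree around every other multiple of 30000, so the figure depends on the input space that is assumed.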

About Coincidental Correctness

• a subclass may override a method and in doing so introduce a fault in a method it inherits.

public class Account extends Object {

    protected Date lastTxDate, today;

    // …

    int quartersSinceLastTx() {
        return (90/daysSinceLastTx());
    }

    int daysSinceLastTx() {
        return ( … - lastTxDay.txDate + 1);
    }
}

public class TimeDepositAccount extends Account {

    int daysSinceLastTx() {
        return ( … - … );
    }

    // …
}
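Because the operands of the two return statements are lost in the slide, the following runnable sketch fills them in under explicit assumptions: days are plain int day numbers rather than Date objects, the base class adds 1 so that a same-day transaction counts as one day, and the overriding method omits the + 1. Only the class names and the 90/daysSinceLastTx() formula come from the slide.

```java
// Hypothetical reconstruction; field types and the missing "+ 1" are assumptions.
final class InheritanceFaultDemo {
    static class Account {
        protected int lastTxDay, today;   // assumed day numbers, not java.util.Date

        Account(int lastTxDay, int today) {
            this.lastTxDay = lastTxDay;
            this.today = today;
        }

        int daysSinceLastTx() {
            return today - lastTxDay + 1;        // at least 1 when today >= lastTxDay
        }

        int quartersSinceLastTx() {
            return 90 / daysSinceLastTx();       // formula as given in the slide
        }
    }

    static class TimeDepositAccount extends Account {
        TimeDepositAccount(int lastTxDay, int today) {
            super(lastTxDay, today);
        }

        @Override
        int daysSinceLastTx() {
            return today - lastTxDay;            // assumed override: 0 on the day of a transaction
        }
    }

    public static void main(String[] args) {
        System.out.println(new Account(10, 10).quartersSinceLastTx());            // 90
        System.out.println(new TimeDepositAccount(10, 12).quartersSinceLastTx()); // 45
        // new TimeDepositAccount(10, 10).quartersSinceLastTx() throws ArithmeticException
    }
}
```

The inherited quartersSinceLastTx() is untouched, yet in the subclass it divides by zero for a same-day transaction and runs without error for every other input, which is why retesting inherited methods in the context of the subclass matters.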



Bugs that Testing Can Catch

• Figure 3.6 p.60: Some faults (bugs) and when they can be addressed.

Fault Models

(Binder chapter 4)

Fault Model

• Any rational testing strategy is guided by a fault model

• Answers the question: why do the features called out by a technique warrant our effort?

– Common sense

– Experience

– Suspicion

– Analysis

– Experiment

• Identifies relationships and components of the system under test that are most likely to have faults.

• Software testing strategies are effective to the extent that their fault model is a good predictor of faults.

Fault Models

• Conformance-directed testing

– Conformance to requirements or specifications

– Relies on a nonspecific fault model

– Establish a test suite that is sufficiently representative of the requirements of the system

– Should be fault sufficient (exercise specified features)

• Fault-directed testing

– Seeks to reveal implementation faults

– A specific fault model is required to direct potentially large probing of the implementation

– Should be fault efficient (high probability of revealing a fault)

Bug Hazards of OOP

• Newly written code is 48.8 times more likely to have a bug than verbatim reused code [Basili+96a]

– Verbatim reused code, 0.125 faults per KLOC (thousand lines of code)

– Code slightly modified, 1.500 faults per KLOC

– Code extensively modified, 4.89 faults per KLOC

– Not reused, newly written code, 6.11 faults per KLOC

• On average, one bug is found in every 150 lines of code [Fiedler 89]

• Classes

– That send more messages to instance variables and message parameter objects are more likely to be buggy

– That have more superclasses and higher specialization are more likely to be buggy

Bug Hazards of OOP

• Encapsulation

– Obstacle for testing

• Inheritance

– Weakens encapsulation, creating global data problems

– Overloading, reuse, specialization

– Incorrect Initialization and Forgotten Methods

– Inheritance structure

– Multiple Inheritance

– Abstract classes, Interfaces.

Bug Hazards of OOP

• Polymorphism

– Dynamically bound messages are hard to understand and error-prone

– Can’t change a polymorphic server without regard to its clients

– Code looks deceptively simple, but is complex

– Can produce strange results if the class hierarchy is not well defined

– Messages can be bound to the wrong server

• Dynamic Binding

– Many classes may use the same method name, which creates bugs

– Methods are typically small

• Message Sequence and State

– Cooperative control bugs

– Delocalization bugs

Bug Lists/Errors and Failures

• Errors and Failures: Binder, p.87

• Method Scope Fault Taxonomy: Binder, p.88-89

• Class Scope Fault Taxonomy: Binder, p.90-91

• Cluster/Subsystem Scope Fault Taxonomy: Binder, p.92

An OO Testing Manifesto

• Binder, p.103-107

About Test Models

(Binder chapter 5)

Model-Based Testing

• Test models must ideally support:

– the systematic enumeration of input and state combinations

– automated, systematic and repeatable generation of tests

• But Beizer reduces models to intuition joggers:

– “it does not matter that they are imperfect as long as the resulting tests are good…”

– even checklists are considered by some to be test models...

• If testing is to proceed from models, then models must be validated and verified:

– validation: tracing back to stakeholders’ goals

– verification:

» intra-model: syntax and semantics are ok (wrt a meta-model)

» inter-model: a model is consistent with the others of this iteration

• inconsistency allows the derivation of a statement AND of its negation

» inter-iteration: a model is traceable to its previous version

• Verification also requires tracing back the code to the models of its iteration and to its previous version.

Binder’s Model-Based Testing

Consider Figure 5.1 p.115:

• A meta-model is the definition of a modeling technique: symbols used in its notation, rules for using these symbols, concepts associated with the symbols, and composition of symbols.

• Consistency checking (i.e., model verification) requires a meta-model and a traceability model. It is not part of Binder’s concerns.

• Nor is model validation, which also requires a traceability model.

• Binder’s definition of verification is restricted to code and is not addressed...

– « Verification attempts to show that implementation is correct with respect to its representation, without executing it. This effort may be either informal (using a checklist) or formal (constructing a proof). »

About Cartoon-Based Testing

• Binder (p.116):

– “Most OOA/D methodologies provide a loose graphical syntax and symbol set. This is accompanied by minimal guidance for impressionistic rendering of behavior and structure that happen to come to the designer’s attention. These are cartoons: they do not demand complete information, consistent usage […] Cartoons are useful for sketching, refining and documenting solutions, but they are not test-ready: they lack content and consistency necessary to produce executable test cases.”

– Most OOA/D methods and models are ambiguous, fragmentary, and incomplete: “no explicit definition exists for the necessary components of a well-formed behavior model”.

– “CASE tools contribute to this problem. Nearly all CASE implementations of methodologies are incorrect, distorted, and incomplete.”

Requirements for a testable model

From Binder p.117:

• It is a complete and accurate reflection of the kind of implementations to be tested. The model must represent all features to be exercised.

• It abstracts details that would make the cost of testing prohibitive.

• It preserves detail that is essential for revealing faults and demonstrating conformance.

• It represents all events (of a state model) so that we can generate these events, typically as messages sent to the IUT.

• It represents all actions (of the state model) so that we can determine whether a required action has been produced.

• It represents state so that we have an executable means to determine what state has (or has not) been achieved.

From this perspective, Barber argues for the usefulness of formal specifications!
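What the requirements on events, actions and state mean in practice can be sketched with a small executable state model. The turnstile below is a hypothetical example that does not come from Binder's text: every event can be generated by calling fire, every action is recorded so that a test can check it, and the current state is observable.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical test-ready state model; the turnstile and all names are illustrative.
final class TurnstileModel {
    enum State { LOCKED, UNLOCKED }
    enum Event { COIN, PUSH }

    private State state = State.LOCKED;
    private final List<String> actions = new ArrayList<>();

    /** Every event of the model can be generated as a message to the IUT. */
    void fire(Event event) {
        switch (state) {
            case LOCKED:
                if (event == Event.COIN) { state = State.UNLOCKED; actions.add("unlock"); }
                else                     { actions.add("alarm"); }
                break;
            case UNLOCKED:
                if (event == Event.PUSH) { state = State.LOCKED; actions.add("lock"); }
                else                     { actions.add("refund"); }
                break;
        }
    }

    /** Executable means to determine which state has been reached. */
    State state() { return state; }

    /** Recorded actions, so a test can check that a required action was produced. */
    List<String> actions() { return actions; }

    public static void main(String[] args) {
        TurnstileModel t = new TurnstileModel();
        t.fire(Event.COIN);
        t.fire(Event.PUSH);
        System.out.println(t.state() + " " + t.actions()); // LOCKED [unlock, lock]
    }
}
```

Because every (state, event) pair has a defined outcome, test cases can be enumerated systematically from the model, which is what separates it from a cartoon.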

