TRANSCRIPT
Continuous Testing
(adapted from lecture notes of the CSCI 3060U - Software Quality Assurance unit, J.S. Bradbury, J.R. Cordy, 2018)
Nuno Pombo, Qualidade de Software, 2019/20
Outline
•Today we look at the role of testing in software maintenance, and the need for continuous testing methods
•We'll look at:
• software maintenance and evolution
• corrective, adaptive, perfective and preventive maintenance
• continuous testing methods
• maintaining functionality, failure and operational test suites
• regression testing
• testability, controllability, observability
Continuous Testing
Evolution
•Maintenance is the phase of development in which the software is in day-to-day production use by real users
•For successful software, this is almost all of its lifetime, and the
software evolves in response to observed failures and new
requirements
•Usual estimate is that over 85% of the total software effort is in
maintenance
•Four kinds of software maintenance: corrective, adaptive,
perfective and preventive
Software Maintenance
Kinds of Errors
•Corrective maintenance is concerned with fixing reported failures (errors) observed in the software
•These can themselves come in three varieties:
• Coding errors
• typically easy and inexpensive to correct since they are confined within one unit
• Design errors
• more expensive since they may involve changes to several units
• Requirements errors
• most expensive since they often involve extensive system redesign (re-architecting) to correct
•Often a short-term fix is implemented first until a more permanent change can be made later
Corrective Maintenance
New Environments
•Adaptive maintenance:
• Involves changing the software to work in some new
environment such as a different machine/hardware or
operating system
• involves changing one part of a software system due to
changes in another component or subsystem
•Characterized by no change in functionality, just a secondary
change due to other parts of the system evolving or a new
environment.
Adaptive Maintenance
Improving Performance and more…
•Perfective maintenance includes changes made to improve the
performance, maintainability, etc. of the program
•It does not always occur as the result of failures or faults in the
source code.
• e.g., improving architecture to enhance performance
• e.g., improving documentation or code formatting to
increase readability
Perfective Maintenance
Changes to Prevent Failure
•Preventive maintenance involves changes that result from faults uncovered by a programmer, who addresses the fault so that it does not become a failure
• e.g., better exception handling in a Java program
Preventive Maintenance
Maintaining Quality
•As time goes on, the software is often maintained by
programmers not involved in the initial design and
development
• These programmers tend to be more focused on the
changes than the whole product
•For this reason, testing has an even more important role in
quality assurance in the maintenance phase than it does in
initial development and delivery
• it helps to make sure that changes have not broken
anything
Maintenance Tests
Testing as a Maintenance Activity
•Thus testing is not a one-time thing - we’re never “done”
testing
•As software is maintained, if we are to maintain consistent
quality, we must continue testing – both of old existing
functionality, and of new introduced functionality
•For this reason, XP calls for continuous testing
• test every version, every day
•At a minimum, we must re-test thoroughly after every set of
changes, before the changed software is released
Continuous Testing Methods
Tests Are Part of the Product
•Most projects maintain test suites, sets of tests to be run on every release of the software
•Maintained in parallel with the software - often with at least as much effort as the software itself
• as we have already seen, automation is essential to make this practical
Kinds of Test Suites
•Three (related) kinds of continuous tests are normally performed and maintained continuously in software maintenance
• Functionality (or acceptance) testing, failure testing, operational testing
Test Suites
Functionality Suites
•We have already seen functionality and acceptance testing suites (you've
built one!)
•When used continuously over the evolution of the software, we maintain the
functionality tests by:
• Every time a new feature is added, new tests specifically aimed at
testing that feature are permanently added to the test suite
(Recall that in XP, we must have these new tests since they form the
specification for the new software capabilities)
• Every time a feature is changed or extended, we change or extend
the corresponding functionality tests to correspond
Continuous Functionality Testing
Failure Suites
•Failure tests are suites of examples that have been observed to cause a failure of the software in the past
•To be effective, failure tests must be maintained over the evolution of the software, by:
• Before correcting any observed failure, characterize the failure by creating a "failure test" that causes it
• This becomes the specification of the fix - the changed software must at a minimum correct the error for the test
• The failure test must cause the error in the old software and not cause the error in the new software
•We keep all such tests in a failure test suite to be re-run on all future versions of the software, to ensure that the failure does not reappear due to a future change
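The workflow above can be sketched as a minimal failure test. The function and the bug report here are hypothetical, chosen only to illustrate the "reproduce, fix, keep the test" cycle:

```python
# Hypothetical bug report: parse_price("1,50") crashed with ValueError
# because the old version only accepted '.' as the decimal separator.
# Step 1: write a failure test that reproduces the observed failure.
# Step 2: fix the code so the test passes; keep the test forever in
#         the failure suite so the bug can never silently reappear.

def parse_price(text: str) -> float:
    """Parse a price string; the fix accepts ',' as a decimal separator."""
    return float(text.replace(",", "."))

def test_failure_comma_decimal():
    # Failed (ValueError) on the old version; must pass on every new one.
    assert parse_price("1,50") == 1.5

test_failure_comma_decimal()
```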
Failure Testing
There’s No Substitute for the Real Thing
•Operational test suites must be created early in the production life of the software by sampling actual production runs
•Must be updated to add new real operational tests each time significant new or changed features are added to the software
•These tests form a sort of sanity check on the software… to make
sure that when we are about to release a new version, it will not
only still run our artificial tests but will also still handle real
customer input (could be embarrassing otherwise)
Operational Testing
Comprehensive Continuous Testing
•Regression testing refers to an automated continuous testing strategy, whose purpose is to make sure that the software does not "regress" - that is, become less effective than it has been in the past
•Regression test suites are normally comprehensive, including (at least) the three components:
• Functionality tests, to make sure that we still meet the basic requirements
• Failure tests, to make sure that we haven't recreated a past failure
• Operational tests, to make sure that we can still process real production runs
•Each of these is maintained, either together or separately, as previously described
Regression Testing
Purpose
•Ensure that existing functionality and behaviour is not broken by changes in new versions
•Ensure that intended changes to functionality and behaviour are actually observed
•Catch accidental or unintentional changes in functionality and behaviour before deployment, reducing costs
Regression Testing
Method
•Maintain a regression set of test inputs designed to exhibit
existing functionality and behaviour
•Choose a set of observable artifacts of computation that
demonstrate desired aspects of functionality and behaviour (not
just output!)
•Maintain a history of the observable artifacts for each version
of the software
•Compare observable artifacts of each new version of software to the previous version to ensure that differences are intentional
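The method above can be sketched as a tiny comparison step. All names here (`run_tests`, `regress`, the toy versions `v1`/`v2`) are hypothetical; the "observable artifacts" are reduced to plain strings for the sketch:

```python
import difflib

# Minimal sketch of the regression comparison step: run the regression
# set through each version, collect the observable artifacts, and diff
# them. A real harness would capture output files, traces, and timing,
# not just a single return string.

def run_tests(program, test_inputs):
    """Run each test input through the program and collect its artifact."""
    return [program(t) for t in test_inputs]

def regress(old_artifacts, new_artifacts):
    """Return per-test differences between versions; empty = no regression."""
    diffs = []
    for i, (old, new) in enumerate(zip(old_artifacts, new_artifacts)):
        if old != new:
            diffs.append((i, list(difflib.unified_diff([old], [new], lineterm=""))))
    return diffs

# Hypothetical versions: v2 changes behaviour for negative input.
v1 = lambda x: f"result={abs(x)}"
v2 = lambda x: f"result={x}"
tests = [3, -4]
diffs = regress(run_tests(v1, tests), run_tests(v2, tests))
# Only the -4 case differs; that difference must now be examined by a
# human to decide whether it was intentional.
```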
Regression Testing
Regression Series
•It’s really called regression testing because we incrementally
compare the results (functionality and behaviour) of the tests
for each new version of the software only to the previous
version
•And that one was compared to the one before it, and so on,
forming a regression series based on the original software
• It’s a sort of induction proof that we still have the behaviour
we want to maintain
Original → New Vers 1 → New Vers 2 → New Vers 3
Regression Testing
Another Regression Series
•It’s also called regression testing because in order to keep the total
number of tests to be run at a practical level, we replace old tests with
new ones to “cover” the same cases but to include testing of new or
changed functionality
•This sequence of replaced tests covering previous tests also forms a
(more complex) regression series of test cases based on the original
test set, where old tests are retired from the set as new tests are
added to “cover” them
•The reasoning that the tests have not lost anything is also an
induction: new tests cover retired old tests, which in turn cover
previous older tests, and so on, back to the original validated test set
Regression Testing
Establishing a Baseline
•Begin with the original functionality test suite, plus early failure
tests (if any), plus first operational tests
•Validate that these tests all run correctly
•Choose the set of observable artifacts to be tracked - these
should characterize the functionality and behaviour we want to
maintain across versions (more on this later)
•Run these first tests and save the observable artifacts in an
easy to compare form (more on this later also)
Establishing a Regression Set
Adding and Retiring Tests
•Whenever functionality is added or changed in the software,
add and validate new tests for the new or changed functionality,
and retire the tests for the replaced old functionality
•Some practitioners retire failure tests after a fixed number of
new versions do not exhibit the failure, as a way to keep the
number of failure tests from growing too large
•Operational tests must also be maintained, and retired or
replaced when they no longer reflect current functionality
Maintaining a Regression Set
[Chart: number of tests vs. time - as software ages, the test set grows out of control, unless we retire old tests covered by new ones]
Maintaining a Regression Set
Observable Artifacts
• What are examples of observable artifacts?
Choosing Observable Artifacts
Observable Artifacts
•Observable artifacts include at least the direct outputs of the
software, but also other indicators of behaviour
•Because many programs have multiple kinds, streams or files
of output, we normally include all of them together in the
observable artifacts
•Because subtle unintended changes in behaviour may not be
immediately visible in direct test output, we normally turn on all
debugging, tracing and instrumenting flags the software may
have when running regression tests, in order to have more
detail in observable artifacts
Choosing Observable Artifacts
Observable Artifacts
•Because performance is part of the user-visible behaviour of
software, we normally measure time and space performance
when running regression tests, and add these to the observable
artifacts in order to observe unintended changes in
performance
•Most systems provide some kind of external performance measuring tool, such as the Unix "time" command, which can be used to give us this information
•In order to allow easy differencing, we normally translate all
observable artifacts to text in the stored test results
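A minimal sketch of capturing time and space as text artifacts, using Python's standard `time` and `tracemalloc` modules (the `timed_run` helper and the artifact format are assumptions, not part of the lecture):

```python
import time
import tracemalloc

# Sketch: measure wall-clock time and peak memory for one test run and
# render them as a text line, so they can be appended to the stored
# observable artifacts. Exact numbers vary between runs, so a real
# harness would round or normalize them before differencing.

def timed_run(fn, *args):
    tracemalloc.start()
    t0 = time.perf_counter()
    result = fn(*args)
    elapsed = time.perf_counter() - t0
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    artifact = f"time={elapsed:.1f}s peak_mem={peak // 1024}KiB"
    return result, artifact

result, perf_line = timed_run(sum, range(1000))
```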
Choosing Observable Artifacts
Combining Artifacts
•To allow easy differencing and archival, the entire set of
observable artifacts resulting from running all of the tests in the
entire set of regression tests is often combined into a single text
file
•This file includes the direct and indirect output, tracing and
debugging information, time and space statistics and all other
observable artifacts resulting from running each test, all
concatenated together in a fixed order into one text file
•This file forms a kind of behavioural signature for the version of
the software, storing every observable characteristic of its
behaviour on the test set in one file
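The combination step above can be sketched as follows; the test names and artifact contents are hypothetical, and the only essential property is the fixed concatenation order:

```python
# Sketch: concatenate all per-test observable artifacts, in a fixed
# order, into a single text "signature" for the software version.

def build_signature(artifacts: dict) -> str:
    parts = []
    for name in sorted(artifacts):   # fixed order makes diffs meaningful
        parts.append(f"=== {name} ===")
        parts.append(artifacts[name].rstrip())
    return "\n".join(parts) + "\n"

signature = build_signature({
    "t01_output": "hello world",
    "t01_trace":  "enter main\nexit main",
    "t01_perf":   "time=0.0s",
})
# Store this file per version; the next version's signature is diffed
# against it.
```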
Maintaining Observable Artifacts
Normalizing Signatures
•To allow easy differencing, it is important that irrelevant or intentional differences between versions be factored out
•Since the signature file is all text, this can be automated using editor scripts to normalize signature files to reduce or eliminate non-behavioural or intended differences
•Example:
If the previous version of the software did all output in upper case and the new version (intentionally) outputs mixed case instead, the new signature can be normalized to upper case before differencing
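The upper-case example can be sketched as a small normalization script. The case-folding rule comes from the example above; the timestamp rule is an assumed extra illustration of factoring out an irrelevant difference:

```python
import re

# Sketch of signature normalization before differencing.

def normalize(signature: str) -> str:
    s = signature.upper()  # old version output was all upper case
    # Assumed extra rule: timestamps differ between runs but are not
    # behaviour, so replace them with a fixed token.
    s = re.sub(r"\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}", "<TIMESTAMP>", s)
    return s

old_sig = "TOTAL: 3 ITEMS\nRUN AT 2019-11-02 10:15:00"
new_sig = "Total: 3 items\nRun at 2019-11-03 09:00:12"
# After normalization the two signatures are identical, so the diff is
# empty and the intentional case change does not raise a false alarm.
```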
Differencing Artifacts
Establishing the Baseline
•The baseline is the signature file of the version used to establish regression testing (the "original" version)
•The baseline signature must be carefully examined line by line by hand to ensure that every artifact is as it should be (a lot of work)
•Once established, only differences need be examined for future versions
Differencing Artifacts
The Regression Test Harness
•The test harness is the implementation of a procedure for automating the running of tests, the collection of observable artifacts, and the differencing of versions for regression testing a product
•Should be developed such that it adapts automatically to
addition or deletion of test cases or individual tests
•Again, requires care in planning and implementation, but once
established requires very little work
Differencing Artifacts
Advantages
•Previous functionality never accidentally lost
•Previously fixed bugs never reappear in production
•Virtually all accidental bugs are caught before deployment
•Virtually no unintentional changes in behaviour slip into
production
•Users observe very high level of quality
Regression Testing
Disadvantages
•Regression set must be maintained with a high degree of
discipline and care - at least as carefully as the software itself
•Establishing the baseline and regression testing harness
requires significant effort - but it pays off in ease of use later
Bottom Line
•Every high-quality software shop does it, because the difference in confidence and observed quality is worth it!
Regression Testing
Testability
▪ The degree to which a system or component facilitates the
establishment of test criteria and the performance of tests
to determine whether those criteria have been met.
▪ The likelihood of exposing a fault through tests.
“assuming that a given software artifact contains a
fault, how likely is it that testing will reveal that fault”
• Testability is dominated by two practical problems
▪ How to provide the test values to the software
▪ How to observe the results of test execution
Measuring Testability?
▪ If a program has a fault, how difficult will it be to find a test that causes a failure?
▪ Testability = |failure-causing inputs| / |all inputs| × 100%
IMPRACTICAL TO MEASURE
▪ Testability can be approximated with the RIP model and mutation
▪ Given a location X in a program P
▪ R = % inputs from some distribution that reach X
▪ I = % inputs that cause a fault to infect (average over N faults)
▪ P = % infected states that propagate to output
▪ Sensitivity (X) = R * I * P
▪ Testability (P) = F (Sensitivity (X)), for all X in P
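The RIP sensitivity estimate above can be worked through numerically. The percentages below are made up purely for illustration:

```python
# Sketch of the RIP sensitivity estimate for a single location X.
# R, I and P are each expressed as fractions in [0, 1].

def sensitivity(reach: float, infect: float, propagate: float) -> float:
    """Sensitivity(X) = R * I * P."""
    return reach * infect * propagate

# Example (assumed numbers): 50% of inputs reach X, 20% of those
# infect the program state, and 10% of infected states propagate to
# the output. The product is 0.01, i.e. only about 1 input in 100
# exposes a fault at X - low sensitivity means low testability.
s = sensitivity(0.5, 0.2, 0.1)
```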
Controllability & Observability
▪ Controllability: Ability to affect the software behaviour (in
particular, replicate that behaviour).
“How easy it is to provide a program with the needed
inputs, in terms of values, operations, and behaviours.”
▪ Observability: Ability to observe software behaviour
"How easy it is to observe the behaviour of a program in terms of its outputs, effects on the environment, and other hardware and software components."
• Low observability and/or controllability imply low testability.
• Software maintenance, consisting of corrective, adaptive,
perfective and preventive steps, is the longest phase of
software development
• Continuous testing is essential to maintain quality during
software maintenance
• Regression testing combines functionality, failure and
operational testing in an automated continuous testing
framework
Summary