TRANSCRIPT
Continuous Testing
(adapted from lecture notes of the CSCI 3060U - Software Quality Assurance unit, J.S. Bradbury, J.R. Cordy, 2018)
Nuno Pombo, Qualidade de Software, 2019/20
Outline
•Today we look at the role of testing in software maintenance, and the need for continuous testing methods
•We'll look at:
• software maintenance and evolution
• corrective, adaptive, perfective and preventive maintenance
• continuous testing methods
• maintaining functionality, failure and operational test suites
• regression testing
• testability, controllability, observability
Continuous Testing
Evolution
•Maintenance is the phase of development in which the software is in day-to-day production use by real users
•For successful software, this is almost all of its lifetime, and the
software evolves in response to observed failures and new
requirements
•Usual estimate is that over 85% of the total software effort is in
maintenance
•Four kinds of software maintenance: corrective, adaptive,
perfective and preventive
Software Maintenance
Kinds of Errors
•Corrective maintenance is concerned with fixing reported failures (errors) observed in the software
•These can themselves come in three varieties:
• Coding errors
• typically easy and inexpensive to correct since they are confined within one unit
• Design errors
• more expensive since they may involve changes to several units
• Requirements errors
• most expensive since they often involve extensive system redesign (re-architecting) to correct
•Often a short-term fix is implemented first until a more permanent change can be made later
Corrective Maintenance
New Environments
•Adaptive maintenance:
• Involves changing the software to work in some new
environment such as a different machine/hardware or
operating system
• involves changing one part of a software system due to
changes in another component or subsystem
•Characterized by no change in functionality, just a secondary
change due to other parts of the system evolving or a new
environment.
Adaptive Maintenance
Improving Performance and more…
•Perfective maintenance includes changes made to improve the
performance, maintainability, etc. of the program
•It does not always occur as the result of failures or faults in the
source code.
• e.g., improving architecture to enhance performance
• e.g., improving documentation or code formatting to
increase readability
Perfective Maintenance
Changes to Prevent Failure
•Preventive maintenance involves changes that result from faults uncovered by a programmer, who addresses the fault so that it does not become a failure
• e.g., better exception handling in a Java program
Preventive Maintenance
Maintaining Quality
•As time goes on, the software is often maintained by
programmers not involved in the initial design and
development
• These programmers tend to be more focused on the
changes than the whole product
•For this reason, testing has an even more important role in
quality assurance in the maintenance phase than it does in
initial development and delivery
• it helps to make sure that changes have not broken
anything
Maintenance Tests
Testing as a Maintenance Activity
•Thus testing is not a one-time thing - we’re never “done”
testing
•As software is maintained, if we are to maintain consistent
quality, we must continue testing – both of old existing
functionality, and of new introduced functionality
•For this reason, XP calls for continuous testing
• test every version, every day
•At a minimum, we must re-test thoroughly after every set of
changes, before the changed software is released
Continuous Testing Methods
Tests Are Part of the Product
•Most projects maintain test suites, sets of tests to be run on every release of the software
•Maintained in parallel with the software - often with at least as much effort as the software itself
• as we have already seen, automation is essential to make this practical
Kinds of Test Suites
•Three (related) kinds of continuous tests are normally performed and maintained continuously in software maintenance
• Functionality (or acceptance) testing, failure testing, operational testing
Test Suites
Functionality Suites
•We have already seen functionality and acceptance testing suites (you've
built one!)
•When used continuously over the evolution of the software, we maintain the
functionality tests by:
• Every time a new feature is added, new tests specifically aimed at
testing that feature are permanently added to the test suite
(Recall that in XP, we must have these new tests since they form the
specification for the new software capabilities)
• Every time a feature is changed or extended, we change or extend
the corresponding functionality tests to correspond
Continuous Functionality Testing
Failure Suites
•Failure tests are suites of examples that have been observed to cause a failure of the software in the past
•To be effective, failure tests must be maintained over the evolution of the software, by:
• Before correcting any observed failure, characterize the failure by creating a "failure test" that causes it
• This becomes the specification of the fix - the changed software must at a minimum correct the error for the test
• The failure test must cause the error in the old software and not cause the error in the new software
•We keep all such tests in a failure test suite to be re-run on all future versions of the software, to ensure that the failure does not reappear due to a future change
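The workflow above can be sketched as a minimal failure test. The function and the bug report here are hypothetical, chosen only to illustrate the "reproduce, fix, keep the test" cycle:

```python
# Hypothetical bug report: parse_price("1,50") crashed with ValueError
# because the old version only accepted '.' as the decimal separator.
# Step 1: write a failure test that reproduces the observed failure.
# Step 2: fix the code so the test passes; keep the test forever in
#         the failure suite so the bug can never silently reappear.

def parse_price(text: str) -> float:
    """Parse a price string; the fix accepts ',' as a decimal separator."""
    return float(text.replace(",", "."))

def test_failure_comma_decimal():
    # Failed (ValueError) on the old version; must pass on every new one.
    assert parse_price("1,50") == 1.5

test_failure_comma_decimal()
```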
Failure Testing
There’s No Substitute for the Real Thing
•Operational test suites must be created early in the production life of the software by sampling actual production runs
•Must be updated to add new real operational tests each time significant new or changed features are added to the software
•These tests form a sort of sanity check on the software… to make
sure that when we are about to release a new version, it will not
only still run our artificial tests but will also still handle real
customer input (could be embarrassing otherwise)
Operational Testing
Comprehensive Continuous Testing
•Regression testing refers to an automated continuous testing strategy, whose purpose is to make sure that the software does not "regress" - that is, become less effective than it has been in the past
•Regression test suites are normally comprehensive, including (at least) the three components:
• Functionality tests, to make sure that we still meet the basic requirements
• Failure tests, to make sure that we haven't recreated a past failure
• Operational tests, to make sure that we can still process real production runs
•Each of these is maintained, either together or separately, as previously described
Regression Testing
Purpose
•Ensure that existing functionality and behaviour is not broken by changes in new versions
•Ensure that intended changes to functionality and behaviour are actually observed
•Catch accidental or unintentional changes in functionality and behaviour before deployment, reducing costs
Regression Testing
Method
•Maintain a regression set of test inputs designed to exhibit
existing functionality and behaviour
•Choose a set of observable artifacts of computation that
demonstrate desired aspects of functionality and behaviour (not
just output!)
•Maintain a history of the observable artifacts for each version
of the software
•Compare observable artifacts of each new version of software to the previous version to ensure that differences are intentional
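The method above can be sketched as a tiny comparison step. All names here (`run_tests`, `regress`, the toy versions `v1`/`v2`) are hypothetical; the "observable artifacts" are reduced to plain strings for the sketch:

```python
import difflib

# Minimal sketch of the regression comparison step: run the regression
# set through each version, collect the observable artifacts, and diff
# them. A real harness would capture output files, traces, and timing,
# not just a single return string.

def run_tests(program, test_inputs):
    """Run each test input through the program and collect its artifact."""
    return [program(t) for t in test_inputs]

def regress(old_artifacts, new_artifacts):
    """Return per-test differences between versions; empty = no regression."""
    diffs = []
    for i, (old, new) in enumerate(zip(old_artifacts, new_artifacts)):
        if old != new:
            diffs.append((i, list(difflib.unified_diff([old], [new], lineterm=""))))
    return diffs

# Hypothetical versions: v2 changes behaviour for negative input.
v1 = lambda x: f"result={abs(x)}"
v2 = lambda x: f"result={x}"
tests = [3, -4]
diffs = regress(run_tests(v1, tests), run_tests(v2, tests))
# Only the -4 case differs; that difference must now be examined by a
# human to decide whether it was intentional.
```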
Regression Testing
Regression Series
•It’s really called regression testing because we incrementally
compare the results (functionality and behaviour) of the tests
for each new version of the software only to the previous
version
•And that one was compared to the one before it, and so on,
forming a regression series based on the original software
• It’s a sort of induction proof that we still have the behaviour
we want to maintain
Original → New Vers 1 → New Vers 2 → New Vers 3
Regression Testing
Another Regression Series
•It’s also called regression testing because in order to keep the total
number of tests to be run at a practical level, we replace old tests with
new ones to “cover” the same cases but to include testing of new or
changed functionality
•This sequence of replaced tests covering previous tests also forms a
(more complex) regression series of test cases based on the original
test set, where old tests are retired from the set as new tests are
added to “cover” them
•The reasoning that the tests have not lost anything is also an
induction: new tests cover retired old tests, which in turn cover
previous older tests, and so on, back to the original validated test set
Regression Testing
Establishing a Baseline
•Begin with the original functionality test suite, plus early failure
tests (if any), plus first operational tests
•Validate that these tests all run correctly
•Choose the set of observable artifacts to be tracked - these
should characterize the functionality and behaviour we want to
maintain across versions (more on this later)
•Run these first tests and save the observable artifacts in an
easy to compare form (more on this later also)
Establishing a Regression Set
Adding and Retiring Tests
•Whenever functionality is added or changed in the software,
add and validate new tests for the new or changed functionality,
and retire the tests for the replaced old functionality
•Some practitioners retire failure tests after a fixed number of
new versions do not exhibit the failure, as a way to keep the
number of failure tests from growing too large
•Operational tests must also be maintained, and retired or
replaced when they no longer reflect current functionality
Maintaining a Regression Set
[Chart: number of tests vs. time - as software ages, the test set grows out of control, unless we retire old tests covered by new ones]
Maintaining a Regression Set
Observable Artifacts
• What are examples of observable artifacts?
Choosing Observable Artifacts
Observable Artifacts
•Observable artifacts include at least the direct outputs of the
software, but also other indicators of behaviour
•Because many programs have multiple kinds, streams or files
of output, we normally include all of them together in the
observable artifacts
•Because subtle unintended changes in behaviour may not be
immediately visible in direct test output, we normally turn on all
debugging, tracing and instrumenting flags the software may
have when running regression tests, in order to have more
detail in observable artifacts
Choosing Observable Artifacts
Observable Artifacts
•Because performance is part of the user-visible behaviour of
software, we normally measure time and space performance
when running regression tests, and add these to the observable
artifacts in order to observe unintended changes in
performance
•Most systems provide some kind of external performance measuring tool, such as the Unix "time" command, which can be used to give us this information
•In order to allow easy differencing, we normally translate all
observable artifacts to text in the stored test results
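A minimal sketch of capturing time and space as text artifacts, using Python's standard `time` and `tracemalloc` modules (the `timed_run` helper and the artifact format are assumptions, not part of the lecture):

```python
import time
import tracemalloc

# Sketch: measure wall-clock time and peak memory for one test run and
# render them as a text line, so they can be appended to the stored
# observable artifacts. Exact numbers vary between runs, so a real
# harness would round or normalize them before differencing.

def timed_run(fn, *args):
    tracemalloc.start()
    t0 = time.perf_counter()
    result = fn(*args)
    elapsed = time.perf_counter() - t0
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    artifact = f"time={elapsed:.1f}s peak_mem={peak // 1024}KiB"
    return result, artifact

result, perf_line = timed_run(sum, range(1000))
```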
Choosing Observable Artifacts
Combining Artifacts
•To allow easy differencing and archival, the entire set of
observable artifacts resulting from running all of the tests in the
entire set of regression tests is often combined into a single text
file
•This file includes the direct and indirect output, tracing and
debugging information, time and space statistics and all other
observable artifacts resulting from running each test, all
concatenated together in a fixed order into one text file
•This file forms a kind of behavioural signature for the version of
the software, storing every observable characteristic of its
behaviour on the test set in one file
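The combination step above can be sketched as follows; the test names and artifact contents are hypothetical, and the only essential property is the fixed concatenation order:

```python
# Sketch: concatenate all per-test observable artifacts, in a fixed
# order, into a single text "signature" for the software version.

def build_signature(artifacts: dict) -> str:
    parts = []
    for name in sorted(artifacts):   # fixed order makes diffs meaningful
        parts.append(f"=== {name} ===")
        parts.append(artifacts[name].rstrip())
    return "\n".join(parts) + "\n"

signature = build_signature({
    "t01_output": "hello world",
    "t01_trace":  "enter main\nexit main",
    "t01_perf":   "time=0.0s",
})
# Store this file per version; the next version's signature is diffed
# against it.
```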
Maintaining Observable Artifacts
Normalizing Signatures
•To allow easy differencing, it is important that irrelevant or intentional differences between versions be factored out
•Since the signature file is all text, this can be automated using editor scripts to normalize signature files to reduce or eliminate non-behavioural or intended differences
•Example:
If the previous version of the software did all output in upper case and the new version (intentionally) outputs mixed case instead, the new signature can be normalized to upper case before differencing
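The upper-case example can be sketched as a small normalization script. The case-folding rule comes from the example above; the timestamp rule is an assumed extra illustration of factoring out an irrelevant difference:

```python
import re

# Sketch of signature normalization before differencing.

def normalize(signature: str) -> str:
    s = signature.upper()  # old version output was all upper case
    # Assumed extra rule: timestamps differ between runs but are not
    # behaviour, so replace them with a fixed token.
    s = re.sub(r"\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}", "<TIMESTAMP>", s)
    return s

old_sig = "TOTAL: 3 ITEMS\nRUN AT 2019-11-02 10:15:00"
new_sig = "Total: 3 items\nRun at 2019-11-03 09:00:12"
# After normalization the two signatures are identical, so the diff is
# empty and the intentional case change does not raise a false alarm.
```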
Differencing Artifacts
Establishing the Baseline
•The baseline is the signature file of the version used to establish regression testing (the "original" version)
•The baseline signature must be carefully examined line by line by hand to ensure that every artifact is as it should be (a lot of work)
•Once established, only differences need be examined for future versions
Differencing Artifacts
The Regression Test Harness
•The test harness is the implementation of a procedure for automating the running of tests, the collection of observable artifacts, and the differencing of versions for regression testing a product
•Should be developed such that it adapts automatically to
addition or deletion of test cases or individual tests
•Again, requires care in planning and implementation, but once
established requires very little work
Differencing Artifacts
Advantages
•Previous functionality never accidentally lost
•Previously fixed bugs never reappear in production
•Virtually all accidental bugs are caught before deployment
•Virtually no unintentional changes in behaviour slip into
production
•Users observe very high level of quality
Regression Testing
Disadvantages
•Regression set must be maintained with a high degree of
discipline and care - at least as carefully as the software itself
•Establishing the baseline and regression testing harness
requires significant effort - but it pays off in ease of use later
Bottom Line
•Every high-quality software shop does it, because the difference in confidence and observed quality is worth it!
Regression Testing
Testability
▪ The degree to which a system or component facilitates the
establishment of test criteria and the performance of tests
to determine whether those criteria have been met.
▪ The likelihood of exposing a fault through tests.
“assuming that a given software artifact contains a
fault, how likely is it that testing will reveal that fault”
• Testability is dominated by two practical problems
▪ How to provide the test values to the software
▪ How to observe the results of test execution
Measuring Testability?
▪ If a program has a fault, how difficult will it be to find a test that causes a failure?
▪ Testability = |failure-causing inputs| / |all inputs| × 100%
IMPRACTICAL TO MEASURE
▪ Testability can be approximated with the RIP model and mutation
▪ Given a location X in a program P
▪ R = % inputs from some distribution that reach X
▪ I = % inputs that cause a fault to infect (average over N faults)
▪ P = % infected states that propagate to output
▪ Sensitivity (X) = R * I * P
▪ Testability (P) = F (Sensitivity (X)), for all X in P
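The RIP sensitivity estimate above can be worked through numerically. The percentages below are made up purely for illustration:

```python
# Sketch of the RIP sensitivity estimate for a single location X.
# R, I and P are each expressed as fractions in [0, 1].

def sensitivity(reach: float, infect: float, propagate: float) -> float:
    """Sensitivity(X) = R * I * P."""
    return reach * infect * propagate

# Example (assumed numbers): 50% of inputs reach X, 20% of those
# infect the program state, and 10% of infected states propagate to
# the output. The product is 0.01, i.e. only about 1 input in 100
# exposes a fault at X - low sensitivity means low testability.
s = sensitivity(0.5, 0.2, 0.1)
```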
Controllability & Observability
▪ Controllability: Ability to affect the software behaviour (in
particular, replicate that behaviour).
“How easy it is to provide a program with the needed
inputs, in terms of values, operations, and behaviours.”
▪ Observability: Ability to observe software behaviour
"How easy it is to observe the behaviour of a program in terms of its outputs, effects on the environment, and other hardware and software components."
• Low observability and/or controllability imply low testability.
• Software maintenance, consisting of corrective, adaptive,
perfective and preventive steps, is the longest phase of
software development
• Continuous testing is essential to maintain quality during
software maintenance
• Regression testing combines functionality, failure and
operational testing in an automated continuous testing
framework
Summary