
Traditional Software Testing is a Failure!

Linda Shafer, University of Texas at Austin
[email protected]

Don Shafer, Chief Technology Officer, Athens Group, Inc.
www.athensgroup.com
[email protected]

Copyright © 2002 Linda and Don Shafer


Seminar Dedication

Enrico Fermi referred to these brilliant Hungarian scientists as “the Martians,” based on speculation that a spaceship from Mars dropped them all off in Budapest in the early 1900s.


And let us not forget….

1994: John Harsányi (b. 5/29/1920, Budapest, Hungary; d. 2000), “for his pioneering analysis of equilibrium in the theory of non-cooperative games.” He shared the prize with John Nash of A Beautiful Mind fame.


Special Thanks to:

Rex Black, RBCS, Inc.
31520 Beck Road, Bulverde, TX 78163 USA
Phone: +1 (830) 438-4830; Fax: +1 (830) 438-4831
[email protected]

Gary Cobb’s testing presentations for SWPM, as excerpted from Software Engineering: A Practitioner’s Approach, by Roger Pressman, McGraw-Hill, Fourth Edition, 1997; Chapter 16, “Software Testing Techniques,” and Chapter 22, “Object-Oriented Testing”

UT SQI SWPM Software Testing Sessions from 1991 to 2002

… for all the test guidance they provided in the past and for the direct use of some material from the University of Texas at Austin Software Project Management Certification Program.


Traditional Development Life Cycle Model

(Diagram: process steps separated by review gates, with three prototypes built along the way.)

Process steps and gates: Requirements Definition → Review → High Level Design → Review → Detail Design → Review → System Construction → Review → Verification & Validation → Review → System Delivery → Review → Post-Implementation Review

Prototypes: Prototype 1, Prototype 2, Prototype 3

Project Management Support Processes: Risk Reduction, Training, Planning, Configuration Management, Estimating, Metrics, Quality Assurance


Software Failures – NIMBY!

It took the European Space Agency 10 years and $7 billion to produce Ariane 5, a giant rocket capable of hurling a pair of three-ton satellites into orbit with each launch and intended to give Europe overwhelming supremacy in the commercial space business. All it took to explode that rocket less than a minute into its maiden voyage last June, scattering fiery rubble across the mangrove swamps of French Guiana, was a small computer program trying to stuff a 64-bit number into a 16-bit space. http://www.around.com/ariane.html 1996 James Gleick
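The failure mode Gleick describes is easy to reproduce. Here is a minimal Python sketch of the same idea; the flight software was Ada, and the variable name and value below are illustrative assumptions, not actual Ariane telemetry:

import struct

# Illustrative value only: anything above the signed 16-bit maximum
# of 32,767 cannot be represented in the target field.
horizontal_bias = 40_000

try:
    # "<h" asks struct to pack a little-endian signed 16-bit integer.
    packed = struct.pack("<h", horizontal_bias)
except struct.error as exc:
    # In Python the failed conversion raises an exception we can catch;
    # aboard Ariane 5 the analogous Ada Operand Error went unhandled
    # and shut down the inertial reference system.
    print(f"conversion failed: {exc}")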

Columbia and other space shuttles have a history of computer glitches that have been linked to control systems, including left-wing steering controls, but NASA officials say it is too early to determine whether those glitches could have played any role in Saturday’s shuttle disaster. … NASA designed the shuttle to be controlled almost totally by onboard computer hardware and software systems. It found that direct manual intervention was impractical for handling the shuttle during ascent, orbit or re-entry because of the required precision of reaction times, systems complexity and size of the vehicle. According to a 1991 report by the General Accounting Office, the investigative arm of Congress, “sequencing of certain shuttle events must occur within milliseconds of the desired times, as operations 10 to 400 milliseconds early or late could cause loss of crew, loss of vehicle, or mission failure.” http://www.computerworld.com/softwaretopics/software/story/0,10801,78135,00.html

After the hydraulic line on their V-22 Osprey aircraft ruptured, the four Marines on board had 30 seconds to fix the problem or perish. Following established procedures, when the reset button on the Osprey’s primary flight control system lit up, one of the pilots — either Lt. Col. Michael Murphy or Lt. Col. Keith Sweaney — pushed it. Nothing happened. But as the button was pushed eight to 10 times in 20 seconds, a software failure caused the tilt-rotor aircraft to swerve out of control, stall and then crash near Camp Lejeune, N.C. … The crash was caused by “a hydraulic system failure compounded by a computer software anomaly,” he said. Berndt released the results of a Corps legal investigation into the crash at an April 5 Pentagon press briefing. http://www.fcw.com/fcw/articles/2001/0409/news-osprey-04-09-01.asp


Software Failures – Offshore

Material Handling Drive Off under Dynamic Positioning

Computer science also differs from physics in that it is not actually a science. It does not study natural objects. Neither is it, as you might think, mathematics; although it does use mathematical reasoning pretty extensively. Rather, computer science is like engineering - it is all about getting something to do something, rather than just dealing with abstractions as in pre-Smith geology.

Richard Feynman, Feynman Lectures on Computation, 1996, p. xiii


Material Handling Under Software Control


Drive Off

Drive-off is caused by a fault in the thrusters and/or their control system and a fault in the position reference system. As for the drift-off incidents, the failure frequencies are derived from the DPVOA database, which consists of mainly DP Class 2 vessels. For the drive-off incident, these figures are representative also for the DP Class 3 vessels. The major difference between DP Class 2 and 3 is that only the latter is designed to withstand fire and flooding in any one compartment. These two incident types do not affect the drive-off frequency significantly due to the fact that drive-off is caused by logical failures in the thrusters, their control system and/or the position reference system. Hence, the DP Class 2 figures are used directly also for the DP Class 3 vessel.

The frequency of fault in the thrusters and/or their control system can be obtained from data in Ref. /A-4/. The control system is here meant to include all relevant software and hardware. The statistics show that the software failure is about four times as frequent as the hardware failure and slightly more frequent than the pure thruster failure.

KONGSBERG OFFSHORE A.S. DEMO 2000 - LIGHT WELL INTERVENTION HSE EVALUATION REPORT NO. 00-4002, REVISION NO. 00


IMCA* Software Incidents as a Percentage of the Total

(Chart: incidents per year, 1990 through 2000 plus the average, split into software and other incidents, on a 0–100% scale.)

*International Marine Contractors Association


Class 1: “Serious loss of position with serious actual or potential consequences.”


General Testing Remarks

Testing:

is a critical element of software quality assurance
represents the ultimate review of specification, design, and coding
may use up to 40% of the development resources
has the primary goal of thinking of as many ways as possible to bring the newly developed system down
means demolishing the software product that has just been built
requires learning to live with the humanness of practitioners
requires an admission of failure in order to be good at it


Organizing for Software Testing

The software developer is always responsible for testing the modules
An independent test group removes the conflict of interest (COI) of expecting the developer to perform the other tests
Customer-witnessed testing

(Diagram, the testing spiral: system engineering maps to system test, software requirements to validation test, software design to integration test, and the code itself to unit test.)


Cost Impact of Software Defects

(Charts: model output of defect-related cost by phase (REQ, PD, DD+CUT, I&T), broken down by activity: requirements analysis, product design, programming, test planning, V&V, project office, CM/QA, manuals; and a staffing-over-months profile annotated with the PDR, CDR/TRR, and FCA/PCA milestones.)

Benefits of formal reviews:

early discovery of software defects
design activities introduce between 50% and 65% of all defects of the development process
formal reviews can find up to 75% of the design flaws

(Pie chart, share by phase: design 1%, coding 8%, testing 16%, release 73%.)


Adjusting the Software Development Life Cycle

Benefits: cost, schedule, product quality, ease of management, customer loyalty, confidence among developers, capitalized reuse

(Chart: percent effort by phase (REQ, PD, DD, CUT, IT, O&M) for OO versus conventional development, and the cost differential, today versus the future.)


Questions People Ask About Testing

Should testing instill guilt?
Is testing really destructive?
Why aren’t there methods of eliminating bugs at the time they are being injected?
Why isn’t a successful test one that inputs data correctly, gets all the right answers and outputs them correctly?
Why do test cases need to be designed?
Why can’t you test most programs completely?
Why can’t the software developer simply take responsibility for finding/fixing all his/her errors?
How many testers does it take to change a light bulb?


Testing Objectives: Glenford J. Myers (IBM, 1979)

1. Testing is a process of executing a program with the intent of finding an error

2. A good test case is one that has a high probability of finding an as yet undiscovered error

3. A successful test is one that uncovers an as yet undiscovered error

Test Information Flow

(Diagram: the software configuration and test configuration feed testing, which produces test results; evaluation compares them with expected results, yielding errors that drive debugging and corrections, plus error-rate data that feeds a reliability model to produce a predicted reliability.)


Testing Principles

All tests should be traceable to customer requirements
Tests should be planned long before testing begins
The Pareto principle applies to software testing: 80% of all errors will likely be in 20% of the modules (see the sketch below)
Testing should begin “in the small” and progress toward testing “in the large”
Exhaustive testing is not possible
Testing should be conducted by an independent third party
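As a minimal sketch of that Pareto check, the following Python snippet takes a hypothetical per-module defect count (the module names and counts are invented for illustration) and reports how few modules account for 80% of the defects:

# Hypothetical defect counts per module, e.g. exported from a bug tracker.
defects_per_module = {
    "parser": 120, "scheduler": 95, "ui": 14,
    "auth": 9, "reporting": 7, "net": 5,
}

total = sum(defects_per_module.values())
running, hot_modules = 0, []
for module, count in sorted(defects_per_module.items(),
                            key=lambda item: item[1], reverse=True):
    running += count
    hot_modules.append(module)
    if running >= 0.8 * total:  # stop once 80% of defects are covered
        break

share = len(hot_modules) / len(defects_per_module)
print(f"{share:.0%} of modules hold 80% of the defects: {hot_modules}")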

“The test team should always know how many errors still remain in the code - for when the manager comes by.”


About Software Testing

Software testing is a multi-step strategy: a series of test case design methods that help ensure effective error detection.

Testing is not a safety net or a replacement for software quality: SQA typically conducts independent audits of a product’s compliance with standards, and SQA assures that the test team follows the policies and procedures.

Testing is not 100% effective in uncovering defects in software: a very good testing effort will ship no more than 2.5 defects/KSLOC (thousand source lines of code), i.e., about 250 residual defects in a 100 KSLOC product.


Test Planning

Inspections throughout the life cycle: requirements inspections, software architecture inspections, design inspections, code inspections, test inspections

Glenford Myers’ Sandwich Method

Test Specification Outline:

I. Scope of testing
II. Test Plan
  A. Test phases and builds
  B. Schedule
  C. Overhead software
  D. Environment and resources
III. Test procedure n
  A. Order of integration
    1. Purpose
    2. Modules to be tested
  B. Unit tests for modules in build
    1. Description of tests for module m
    2. Overhead software description
    3. Expected results
  C. Test environment
    1. Special tools or techniques
    2. Scaffolding software description
  D. Test case data
  E. Expected results for build n
IV. Actual test results
V. References
VI. Appendices


Sample Review Checklist

Have major test phases properly been identified and sequenced?
Has traceability to validation criteria/requirements been established as part of software requirements analysis?
Are major functions demonstrated early?
Is the test plan consistent with the overall project plan?
Has a test schedule been explicitly defined?
Are test resources and tools identified and available?
Has a test record-keeping mechanism been established?
Have test drivers and stubs been identified, and has work to develop them been scheduled?
Has stress testing for software been specified?


Sample Review Checklist

Software Test Procedure

Have both white box and black box tests been specified?
Have all the independent logic paths been tested?
Have test cases been identified and listed with their expected results?
Is error handling to be tested?
Are boundary values to be tested?
Are timing and performance to be tested?
Has an acceptable variation from the expected results been specified?


Bach’s 1994 Testability Checklist

Testability, how easily a computer program can be tested, depends on the following:

Operability
Observability
Controllability
Decomposability
Simplicity
Stability
Understandability


Some of the Art of Test Planning

Test time execution estimation
How long to get the bugs out?
Realism, not optimism
Rules of thumb
Lightweight test plan


How Can We Predict Test Execution Time?

When will you be done executing the tests?

Part of the answer is when you’ll have run all the planned tests once. That depends on:
the total estimated test time (the sum for all planned tests)
the total person-hours of tester time available per week
the time actually spent testing by each tester

The other part of the answer is when you’ll have found the important bugs and confirmed the fixes:
estimate the total bugs to find, the bug find rate, the bug fix rate, and the closure period (time from find to close) for bugs
historical data really helps; formal defect removal models are even more accurate

A sketch of the first calculation follows below.
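A minimal Python sketch of that first calculation; every number is an assumed placeholder to be replaced with your own estimates:

total_test_hours = 480.0         # sum of estimated execution time, all planned tests
testers = 3                      # people available
test_hours_per_tester_week = 20  # productive testing hours per tester per week

weeks_for_one_pass = total_test_hours / (testers * test_hours_per_tester_week)
print(f"one pass through all planned tests: about {weeks_for_one_pass:.1f} weeks")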

Copyright © 1996-2002 Rex Black


How Long to Run the Tests?

It depends a lot on how you run tests:
scripted vs. exploratory
regression testing strategy (repeat tests, or just run them once?)

What I often do:
plan for consistent test cycles (tests run per test release) and passes (running each test once)
realize that buggy deliverables and uninstallable builds slow test execution… and plan accordingly
try to understand the amount of confirmation testing, as a large number of bugs leads to lots of confirmation testing
check the number of cycles against the bug prediction

Testers spend less than 100% of their time testing (e-mail, meetings, reviewing bugs and tests, etc.):
I plan six hours of testing in a nine-to-ten-hour day (contractor)
four hours of testing in an eight-hour day is common (employee)

A sketch of the cycle arithmetic follows below.
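A minimal sketch of that cycle arithmetic; the four-hour figure echoes the slide, and everything else is an assumption for illustration:

import math

tests_planned = 300
hours_per_test = 1.5             # assumed average execution time per test
testers = 3
productive_hours_per_day = 4.0   # employee figure from the slide
cycle_days = 5                   # one test release (cycle) per week, assumed

tests_per_cycle = testers * productive_hours_per_day * cycle_days / hours_per_test
cycles_per_pass = math.ceil(tests_planned / tests_per_cycle)
print(f"{tests_per_cycle:.0f} tests per cycle, so one pass needs {cycles_per_pass} cycles")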

Copyright © 1996-2002 Rex Black


How Long to Get the Bugs Out?

Using historical data and a test cycle model, you can develop a simple model for bugs.

Predicting the total number of bugs is hard:
it is subject to many factors and nonlinear effects
common techniques: bugs per developer-day, per KSLOC, or per function point (FP)
project size is usually the easiest factor to estimate

Bug injection rates, fix rates, and closure periods are beyond the test team’s control:
you can predict, but document your assumptions
the more historical data you have for similar (size, staff, technology, etc.) projects, the more accurate and confident you can be

A sketch of such a model follows below.
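A minimal Python sketch of such a model; the injection, find, fix, and closure figures below are pure assumptions standing in for the historical data the slide recommends:

ksloc = 50            # estimated project size
bugs_per_ksloc = 5.0  # assumed bugs reaching the test team
total_bugs = ksloc * bugs_per_ksloc

find_rate = 40        # bugs found per week (assumed)
fix_rate = 35         # bugs fixed per week (assumed)
closure_weeks = 1     # average find-to-close lag (assumed)

week = found = fixed = 0
while fixed < total_bugs:
    week += 1
    found = min(total_bugs, found + find_rate)  # finding is capped by the total
    fixed = min(found, fixed + fix_rate)        # can only fix what is found
print(f"~{total_bugs:.0f} predicted bugs closed around week {week + closure_weeks}")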

Copyright © 1996-2002 Rex Black


(Chart: predicted bug find and fix curves over the test period.)

This chart was created in about one hour, using historical data from a couple of projects and some simplified models of bug find/fix rates. The more data you have, the more accurate your model can be, and the better you can predict its accuracy.

Copyright © 1996-2002 Rex Black


Objective: Realism, Not Optimism

Effort, dependencies, and resources accurately forecast
Equal likelihood for each task to be early as to be late
Best-case and worst-case scenarios known
Plan allows for corrections to early estimates
Risks identified and mitigated, especially:
gaps in skills
risky technology
logistical issues and other tight critical paths
excessive optimism

“[Optimism is the] false assumption that… all will go well, i.e., that each task will take only as long as it ‘ought’ to take.”

--Fred Brooks, in The Mythical Man-Month, 1975.

Copyright © 1996-2002 Rex Black


How About those “Rules of Thumb”?

Various rules of thumb exist for effort, team size, etc. (e.g., developer-to-tester ratios, Jones’ Estimating Software Costs). How accurate are these rules?

Jones’ test development rule: about 1 hour per test case
Across four projects, I averaged 9 hours per case: almost an order of magnitude difference
Imagine if I’d used Jones’ rule on the IA project
Even on my own projects, there was lots of variation

However, historical data from previous projects with similar teams, technologies, and techniques is useful for estimation. You can adjust data from dissimilar projects if you understand the influential factors.

(Chart: person-hours per test case developed varied widely across four projects. Factors included test precision, number of conditions per case, automation, custom tool development, test data needs, and testware re-use.)

A sketch of the rule-of-thumb check appears below.
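The gap the slide describes is easy to quantify. A minimal sketch, using the two per-case figures quoted above; the number of cases is assumed:

planned_cases = 200                        # assumed number of test cases to develop
rule_of_thumb_hours = planned_cases * 1.0  # Jones: about 1 hour per case
historical_hours = planned_cases * 9.0     # the author's four-project average

print(f"rule of thumb: {rule_of_thumb_hours:.0f} person-hours")
print(f"own history:   {historical_hours:.0f} person-hours "
      f"({historical_hours / rule_of_thumb_hours:.0f}x the rule)")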

Copyright © 1996-2002 Rex Black


Refining the Estimate

For each key deliverable, look at size and complexity:
small, medium, or large; and simple, moderate, or complex
do you have data from past projects that can help you confirm the effort and time to develop deliverables of similar size and complexity?

Iterate until it works:
the estimate must fit the overall project schedule…
…but not be disconnected from reality
check “bottom-up” (work-breakdown-structure) estimates against “top-down” rules of thumb, industry benchmarks, and models relating metrics like FP or KSLOC to test size (see the sketch below)

Estimates will change (by as much as 200%) as the project proceeds (knowledge is gained, requirements evolve).
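A minimal sketch of that bottom-up vs. top-down cross-check; the task list, sizes, and benchmark figure are all invented for illustration:

# Bottom-up: sum the work-breakdown-structure estimates.
wbs_hours = {"test design": 120, "environment setup": 40,
             "test execution": 300, "reporting": 60}
bottom_up = sum(wbs_hours.values())

# Top-down: apply an assumed benchmark relating size to test effort.
ksloc = 50
benchmark_hours_per_ksloc = 12  # placeholder figure, not an industry standard
top_down = ksloc * benchmark_hours_per_ksloc

ratio = bottom_up / top_down
print(f"bottom-up {bottom_up} h vs. top-down {top_down} h (ratio {ratio:.2f})")
if not 0.5 <= ratio <= 2.0:
    print("estimates disagree by more than 2x; revisit the assumptions")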

Copyright © 1996-2002 Rex Black


A Lightweight Test Plan Template

Overview
Bounds: scope, definitions, setting
Quality Risks
Schedule of Milestones
Transitions: entry criteria, continuation criteria, exit criteria
Test Configurations and Environments
Test Development
Test Execution: key participants, test case and bug tracking, bug isolation and classification, release management, test cycles, test hours
Risks and Contingencies
Change History
Referenced Documents
Frequently Asked Questions

You can find this template, a case study using it, and three other test plan templates at www.rexblackconsulting.com

Copyright © 1996-2002 Rex Black


How well did you test???

Questions???


Linda Shafer Bio:

Linda Shafer has been working in the software industry since 1965, beginning with NASA in the early days of the space program. Her experience includes the roles of programmer, designer, analyst, project leader, manager, and SQA/SQE. She has worked for large and small companies, including IBM, Control Data Corporation, Los Alamos National Laboratory, Computer Task Group, Sterling Information Group, and Motorola. She has also taught for and/or worked in IT shops at the University of Houston, the University of Texas at Austin, the College of William and Mary, the Office of the Attorney General (Texas), and Motorola University. Ms. Shafer’s publications include 25 refereed articles and three books. She currently works for the Software Quality Institute and co-authored an SQI Software Engineering Series book published by Prentice Hall in 2002: Quality Software Project Management. She is on the International Press Committee of the IEEE and an author in the Software Engineering Series books for the IEEE. Her MBA is from the University of New Mexico.


Don Shafer Bio:

Don Shafer is a co-founder, corporate director, and Chief Technology Officer of Athens Group, Inc. Incorporated in June 1998, Athens Group is an employee-owned consulting firm integrating technology strategy and software solutions. Prior to Athens Group, Shafer led groups developing and marketing hardware and software products for Motorola, AMD, and Crystal Semiconductor. He was responsible for managing a $129 million-a-year PC product group that produced award-winning audio components. From the development of low-level software drivers for yet-to-be-released Microsoft operating systems to the selection and monitoring of Taiwan semiconductor fabrication facilities, Shafer has led key product and process efforts. In the past three years he has led Athens engineers in developing industry-standard semiconductor fab equipment software interfaces, definitions of 300mm equipment integration tools, advanced process control state machine data collectors, and embedded system control software agents. His latest patents are on joint work done with Agilent Technologies in state-based machine control. He earned a BS degree from the USAF Academy and an MBA from the University of Denver. Shafer’s work experience includes positions held at Boeing and Los Alamos National Laboratory. He is currently an adjunct professor in graduate software engineering at Southwest Texas State University; his faculty web site is http://www.cs.swt.edu/~donshafer/. With two other colleagues, in 2002 he wrote Quality Software Project Management for Prentice Hall, now used in both industry and academia. Currently he is working on an SCM book for the IEEE Software Engineering Series.