1 CS 501 Spring 2008

CS 501: Software Engineering

Lecture 21

Reliability 3


Administration

Final presentations

Sign up for your presentations now.

Weekly progress reports

Remember to send your progress reports to your TA.


Some Notable Bugs

Even commercial systems may have horrific bugs

• Built-in function in Fortran compiler (e^0 = 0)

• Japanese microcode for Honeywell DPS virtual memory

• The microfilm plotter with the missing byte (1:1023)

• The Sun 3 page fault that IBM paid to fix

• Left handed rotation in the graphics package

• The preload system with the memory leak

Good people work around problems. The best people track them down and fix them!


The Heisenbug


Fault Tolerance

General Approach:

• Failure detection

• Damage assessment

• Fault recovery

• Fault repair


Fault Tolerance

Basic Techniques:

• Timers and timeout in networked systems

• After error continue with next transaction (e.g., drop packet)

• User break options (e.g., force quit, cancel)

• Error correcting codes in data (e.g., RAID)

• Bad block tables on disk drives

• Forward and backward pointers in databases

Report all errors for quality control
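
As a minimal sketch of "after error continue with next transaction" combined with "report all errors", the loop below tolerates a bad item but records it. The names `process_all` and `parse_packet` are hypothetical and chosen only for illustration.

```python
def process_all(transactions, handler):
    """Apply handler to each transaction; on error, record it and move on."""
    results, errors = [], []
    for index, item in enumerate(transactions):
        try:
            results.append(handler(item))
        except Exception as exc:
            # Tolerate the fault, but never hide it: keep it for quality control.
            errors.append((index, repr(exc)))
    return results, errors


def parse_packet(raw):
    # Hypothetical handler: a packet is valid only if it is a decimal integer.
    return int(raw)


results, errors = process_all(["1", "x", "3"], parse_packet)
```

The bad packet is dropped, the other two are processed, and `errors` holds one entry that can be reported later.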


Fault Tolerance

Backward Recovery:

• Record system state at specific events (checkpoints). After failure, recreate state at last checkpoint.

• Backup of files

• Combine checkpoints with system log (audit trail of transactions) that allows transactions from last checkpoint to be repeated automatically.

• Test the restore software!
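
The combination of a checkpoint and a transaction log can be sketched as follows. This is an in-memory toy under stated assumptions: the class `Store` and its methods are hypothetical, and a real system would keep both the checkpoint and the log on durable storage.

```python
import copy


class Store:
    """Hypothetical key/value store with a checkpoint and a transaction log."""

    def __init__(self):
        self.balances = {}
        self._checkpoint = {}
        self._log = []  # transactions applied since the last checkpoint

    def apply(self, name, amount):
        self._log.append((name, amount))  # log first, then change state
        self.balances[name] = self.balances.get(name, 0) + amount

    def checkpoint(self):
        self._checkpoint = copy.deepcopy(self.balances)
        self._log.clear()

    def recover(self):
        """Recreate the state at the last checkpoint, then replay the log."""
        self.balances = copy.deepcopy(self._checkpoint)
        for name, amount in self._log:
            self.balances[name] = self.balances.get(name, 0) + amount


store = Store()
store.apply("a", 10)
store.checkpoint()
store.apply("a", 5)
store.apply("b", 7)
store.balances = {}  # simulate a failure that loses the live state
store.recover()
```

After `recover()`, the state again reflects all three transactions, which is also the kind of check a test of the restore software would make.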


Fault Tolerance

Google and Hadoop File Systems

• Clusters of commodity computers (1,000+ computers, 1,000+ TB)

"Component failures are the norm rather than the exception.... We have seen problems caused by application bugs, operating system bugs, human errors, and the failures of disks, memory, connectors, networking, and power supplies."

• Data is stored in large chunks (64 MB).

• Each chunk is replicated, typically with three copies.

• If component fails, new replicas are created automatically.

Ghemawat, et al., The Google File System. 19th ACM Symposium on Operating Systems Principles, October 2003
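
The idea of creating new replicas automatically can be illustrated with a toy function. This is not the placement algorithm of the Google File System, which weighs factors this sketch ignores; `re_replicate`, the server names, and the data layout are all assumptions made for the example.

```python
REPLICAS = 3  # typical number of copies quoted on the slide


def re_replicate(placement, live_servers):
    """Return a placement in which every chunk has REPLICAS live copies.

    placement: dict mapping chunk id -> set of servers holding a copy.
    live_servers: set of servers that are currently up.
    """
    repaired = {}
    for chunk, holders in placement.items():
        alive = {s for s in holders if s in live_servers}
        # Sorted only to make the choice deterministic in this toy example.
        candidates = [s for s in sorted(live_servers) if s not in alive]
        needed = max(0, REPLICAS - len(alive))
        repaired[chunk] = alive | set(candidates[:needed])
    return repaired


placement = {"chunk-1": {"s1", "s2", "s3"}, "chunk-2": {"s2", "s3", "s4"}}
live = {"s1", "s3", "s4", "s5"}  # server s2 has failed
repaired = re_replicate(placement, live)
```

Both chunks lose their copy on `s2` and each gains one new copy on a live server.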


Fault Tolerance

N-version programming

• Execute independent implementations in parallel, compare results, accept the most probable.

• Used when extreme reliability is required with no opportunity to repair (e.g., spacecraft).

• Difficulty is to ensure that the implementations are independent (e.g., separate power supplies, sensors, algorithms).
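
A minimal sketch of the compare-and-accept step, assuming "most probable" means a strict majority. The versions here run one after another rather than in parallel, and all function names are hypothetical.

```python
from collections import Counter


def vote(implementations, x):
    """Run every implementation on x and return the strict-majority result."""
    outputs = []
    for impl in implementations:
        try:
            outputs.append(impl(x))
        except Exception:
            pass  # a crashed version simply casts no vote
    if not outputs:
        raise RuntimeError("all versions failed")
    value, count = Counter(outputs).most_common(1)[0]
    if count <= len(implementations) // 2:
        raise RuntimeError("no majority among versions")
    return value


def square_a(n):
    return n * n


def square_b(n):
    return n ** 2


def square_faulty(n):
    return n + n  # deliberately wrong; agrees with the others only for 0 and 2


answer = vote([square_a, square_b, square_faulty], 3)
```

The faulty version is outvoted. The sketch also shows the weakness named above: if two versions shared the same mistake, the vote would accept the wrong answer.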


Software Engineering for Real Time

The special characteristics of real time computing require extra attention to good software engineering principles:

• Requirements analysis and specification

• Special techniques (e.g., locks on data, semaphores)

• Development of tools

• Modular design

• Exhaustive testing

Heroic programming will fail!


Software Engineering for Real Time

Testing and debugging need special tools and environments

• Debuggers, etc., cannot be used to test real time performance

• Simulation of environment may be needed to test interfaces -- e.g., adjustable clock speed

• General purpose tools may not be available


Validation and Verification

Validation: Are we building the right product?

Verification: Are we building the product right?

In practice, it is sometimes difficult to distinguish between the two.

That's not a bug. That's a feature!


The Testing Process

Unit, System and Acceptance Testing are major parts of a software project

• Testing requires time on the schedule.

• It may require substantial investment in test data, equipment, and test software.

• Good testing requires good people!

• Documentation, including management and client reports, is an important part of testing.

What is the definition of "done"?


Test Design

Testing can never prove that a system is correct.

It can only show either (a) that a system is correct in a special case, or (b) that it has a fault.

• The objective of testing is to find faults.

• Testing is never comprehensive.

• Testing is expensive.


Testing Strategies

• Bottom-up testing. Each unit is tested with its own test environment.

• Top-down testing. Large components are tested with dummy stubs.

  - user interfaces
  - work-flow
  - client and management demonstrations

• Stress testing. Tests the system at and beyond its limits.

  - real-time systems
  - transaction processing
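
For top-down testing, a dummy stub might look like the following sketch. The higher-level function `checkout` is tested before the real lower-level component exists; `PaymentGatewayStub` and every other name here is hypothetical.

```python
class PaymentGatewayStub:
    """Dummy stand-in for a lower-level component that is not yet built."""

    def __init__(self):
        self.calls = []

    def charge(self, customer, amount):
        self.calls.append((customer, amount))  # record the call for inspection
        return "OK"  # canned answer, no real payment logic


def checkout(cart, customer, gateway):
    """Higher-level component under test."""
    total = sum(price for _, price in cart)
    if total == 0:
        return "EMPTY"
    return gateway.charge(customer, total)


stub = PaymentGatewayStub()
status = checkout([("book", 20), ("pen", 5)], "alice", stub)
```

Because the stub records its calls, the test can check both the result of `checkout` and what it asked the lower layer to do.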


Methods of Testing

Closed box testing

Testing is carried out by people who do not know the internals of what they are testing.

Example. IBM educational demonstration that was not foolproof

Open box testing

Testing is carried out by people who know the internals of what they are testing.

Example. Tick marks on the graphing package


Stages of Testing

Testing is most effective if divided into stages

Unit testing
  - unit test

System testing
  - integration test
  - function test
  - performance test
  - installation test

Acceptance testing


Testing: Unit Testing

• Tests on small sections of a system, e.g., a single class

• Emphasis is on accuracy of actual code against specification

• Test data is chosen by developer(s) based on their understanding of specification and knowledge of the unit

• Can be at various levels of granularity

• Open box or closed box: by the developer(s) of the unit or by special testers

If unit testing is not thorough, system testing becomes almost impossible. If you are working on a project that is behind schedule, do not rush the unit testing.
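
A unit test in this sense could look like the sketch below, written with Python's standard `unittest` module. The unit `median` and its specification (middle value, mean of the two middle values for an even count, error on empty input) are assumptions made for the example.

```python
import unittest


def median(values):
    """Unit under test: middle value of a non-empty sequence of numbers."""
    if not values:
        raise ValueError("median of empty sequence")
    ordered = sorted(values)
    mid = len(ordered) // 2
    if len(ordered) % 2:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


class MedianTest(unittest.TestCase):
    def test_odd_count(self):
        self.assertEqual(median([3, 1, 2]), 2)

    def test_even_count(self):
        self.assertEqual(median([4, 1, 3, 2]), 2.5)

    def test_empty_is_rejected(self):
        with self.assertRaises(ValueError):
            median([])


# Run the tests programmatically so the example does not exit the interpreter.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(MedianTest)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

The test data comes from the specification (odd, even, empty) and from knowledge of the unit (the even case exercises a separate return path).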


Testing: System and Sub-System Testing

• Tests on components or complete system, combining units that have already been thoroughly tested

• Emphasis on integration and interfaces

• Trial data that is typical of the actual data, and/or stresses the boundaries of the system, e.g., failures, restart

• Carried out systematically, adding components until the entire system is assembled

• Open or closed box: by development team or by special testers

System testing is finished fastest if each component is completely debugged before assembling the next


Testing: Acceptance Testing

• Closed box: by the client

• The entire system is tested as a whole

• The emphasis is on whether the system meets the requirements

• Uses real data in realistic situations, with actual users, administrators, and operators

The acceptance test must be successfully completed before the new system can go live or replace a legacy system.

Completion of the acceptance test may be a contractual requirement before the system is paid for.


Variants of Acceptance Testing

Alpha Testing: Clients operate the system in a realistic but non-production environment

Beta Testing: Clients operate the system in a carefully monitored production environment

Parallel Testing: Clients operate new system alongside old production system with same data and compare results
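
The comparison step of parallel testing can be sketched as a small harness that feeds the same data to both systems and lists every disagreement. `legacy_tax` and `new_tax` are hypothetical stand-ins for the old and new systems, working in integer cents.

```python
def legacy_tax(amount_cents):
    # Hypothetical old system: 8% tax, truncated to whole cents.
    return amount_cents * 8 // 100


def new_tax(amount_cents):
    # Hypothetical new system: 8% tax, rounded to the nearest cent.
    return (amount_cents * 8 + 50) // 100


def parallel_run(old, new, inputs):
    """Return (input, old result, new result) for every disagreement."""
    return [(x, old(x), new(x)) for x in inputs if old(x) != new(x)]


mismatches = parallel_run(legacy_tax, new_tax, [100, 250, 1999])
```

A disagreement does not say which system is wrong. It flags a case that someone must examine against the requirements.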


Test Cases

Test cases are specific tests that are chosen because they are likely to find faults.

Test cases are chosen to balance expense against chance of finding serious faults.

• Cases chosen by the development team are effective in testing known vulnerable areas.

• Cases chosen by experienced outsiders and clients will be effective in finding gaps left by the developers.

• Cases chosen by inexperienced users will find other faults.


Test Case Selection: Coverage of Inputs

Objective is to test all classes of input

• Classes of data -- major categories of transaction and data inputs.

Cornell example: (undergraduate, graduate, transfer, ...) by (college, school, program, ...) by (standing) by (...)

• Ranges of data -- typical values, extremes

• Invalid data

• Reversals, reloads, restarts after failure
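
Choosing cases by class, range, and invalid data can be sketched as a table of inputs and expected outcomes. The rule in `classify_credits` (0 to 11 credits is part-time, 12 to 22 is full-time) is invented for the example and is not a real registration policy.

```python
def classify_credits(credits):
    """Hypothetical rule: 0-11 credits is part-time, 12-22 is full-time."""
    if not isinstance(credits, int) or isinstance(credits, bool):
        raise TypeError("credits must be an integer")
    if credits < 0 or credits > 22:
        raise ValueError("credits out of range")
    return "part-time" if credits < 12 else "full-time"


# One row per input class: typical values, extremes, and invalid data.
CASES = [
    (6, "part-time"),    # typical value
    (17, "full-time"),   # typical value
    (0, "part-time"),    # extreme: lowest valid value
    (11, "part-time"),   # extreme: just below the class boundary
    (12, "full-time"),   # extreme: just above the class boundary
    (22, "full-time"),   # extreme: highest valid value
    (-1, ValueError),    # invalid: below range
    (23, ValueError),    # invalid: above range
    ("12", TypeError),   # invalid: wrong type
]


def run_cases(func, cases):
    """Return (input, expected, actual) for every case that fails."""
    failures = []
    for value, expected in cases:
        try:
            actual = func(value)
        except Exception as exc:
            actual = type(exc)
        if actual != expected:
            failures.append((value, expected, actual))
    return failures


failures = run_cases(classify_credits, CASES)
```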


Test Case Selection: Program

Objective is to test all functions of each computer program

• Paths through the computer programs

Program flow graph

Check that every path is executed at least once

• Dynamic program analyzers

Count number of times each path is executed

Highlight or color source code

Cannot be used with time critical software


Test Strategies: Program

(a) Statement analysis

(b) Branch testing

If every statement and every branch is tested, is the program correct?
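
One way to explore the question is a small counterexample. In this sketch the hypothetical function `safe_ratio` contains a deliberate fault, yet two passing tests execute every statement and both branches.

```python
def safe_ratio(a, b):
    """Intended behavior: return a / b, or 0.0 when b is zero."""
    if b > 0:  # deliberate fault: the condition should be b != 0
        return a / b
    return 0.0


# These two tests cover every statement and both outcomes of the branch.
assert safe_ratio(6, 3) == 2.0
assert safe_ratio(1, 0) == 0.0

# A negative divisor still gives the wrong answer (intended: -2.0).
wrong = safe_ratio(6, -3)
```

Coverage shows which code was executed. It does not show that the condition guarding the code is the right one.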


Statistical Testing

• Determine the operational profile of the software

• Select or generate a profile of test data

• Apply test data to system, record failure patterns

• Compute statistical values of metrics under test conditions
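
The four steps above can be sketched as follows. The operational profile, the toy system, and the metric (fraction of failed transactions) are all assumptions made for illustration.

```python
import random


def estimate_failure_rate(system, profile, trials, seed=0):
    """Draw transactions according to the profile and count failures."""
    rng = random.Random(seed)  # fixed seed so the run can be repeated
    kinds = list(profile)
    weights = [profile[kind] for kind in kinds]
    failures = 0
    for _ in range(trials):
        kind = rng.choices(kinds, weights=weights)[0]
        try:
            system(kind)
        except Exception:
            failures += 1
    return failures / trials


def toy_system(kind):
    # Hypothetical system under test with one broken transaction type.
    if kind == "refund":
        raise RuntimeError("refund path is broken")


# Assumed operational profile: relative frequency of each transaction type.
profile = {"purchase": 0.90, "query": 0.09, "refund": 0.01}
rate = estimate_failure_rate(toy_system, profile, trials=10_000)
```

The estimate should land near the 1% share of refunds in the profile. If the real share of refunds differs from the assumed one, the estimate is off by the same factor.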


Statistical Testing

Advantages:

• Can test with very large numbers of transactions

• Can test with extreme cases (high loads, restarts, disruptions)

• Can repeat after system modifications

Disadvantages:

• Uncertainty in operational profile (unlikely inputs)

• Expensive

• Can never prove high reliability


Regression Testing

Regression Testing is one of the key techniques of Software Engineering

When software is modified, regression testing provides confidence that the modifications behave as intended and do not adversely affect the behavior of unmodified code.

• Basic technique is to repeat entire testing process after every change, however small.


Regression Testing: Program Testing

1. Collect a suite of test cases, each with its expected behavior.

2. Create scripts to run all test cases and compare with expected behavior. (Scripts may be automatic or have human interaction.)

3. When a change is made to the system, however small (e.g., a bug is fixed), add a new test case that illustrates the change (e.g., a test case that revealed the bug).

4. Before releasing the changed code, rerun the entire test suite.
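
Steps 1 to 4 can be sketched as a suite of cases plus a script that reruns all of them. The function `slugify`, the suite contents, and the regressed variant are hypothetical.

```python
def slugify(title):
    """Code under test: lower-case words joined by hyphens."""
    return "-".join(title.lower().split())


# Step 1: a suite of test cases, each with its expected behavior.
REGRESSION_SUITE = [
    ("Hello World", "hello-world"),
    ("  padded  ", "padded"),
    # Step 3: added when a (hypothetical) whitespace bug was fixed.
    ("Tabs\tand  spaces", "tabs-and-spaces"),
]


def run_suite(func, suite):
    """Step 2: run every case and return those whose result is unexpected."""
    return [(arg, expected, func(arg))
            for arg, expected in suite
            if func(arg) != expected]


def slugify_regressed(title):
    # A later "small" change that splits on single spaces only.
    return "-".join(title.lower().split(" "))


# Step 4: rerun the entire suite before releasing either version.
current_failures = run_suite(slugify, REGRESSION_SUITE)
regressed_failures = run_suite(slugify_regressed, REGRESSION_SUITE)
```

The current version passes every case. The regressed version still passes the first case, so only a rerun of the whole suite reveals the damage.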


Documentation of Testing

Testing should be documented for thoroughness, visibility, and maintenance

(a) Test plan

(b) Test specification and evaluation

(c) Test suite and description

(d) Test analysis report


A Note on User Interface Testing

User interfaces need two categories of testing.

• During the design phase, user interface testing is carried out with trial users to ensure that the design is usable. Design testing is also used to develop graphical elements and to validate the requirements.

• During the implementation phase, the user interface goes through the standard steps of unit and system testing to check the reliability of the implementation.

Acceptance testing is then carried out with users on the complete system.


How we’re user testing:

- One-on-one, 30-45 min user tests with staff levels

- Specific tasks to complete

- No prior demonstration or training

- Pre-planned questions designed to stimulate feedback

- Emphasis on testing system, not the stakeholder!

- Standardized tasks / questions among all testers

A CS 501 Project: Methodology

The next few slides are from a CS 501 presentation (second milestone)


How we’re user testing:

Types of questions we asked:

- Which labels, keywords were confusing?

- What was the hardest task?

- What did you like, that should not be changed?

- If you were us, what would you change?

- How does this system compare to your paper-based system?

- How useful do you find the new report layout? (admin)

- Do you have any other comments or questions about the system? (open ended)

A CS 501 Project: Methodology


What we’ve found: Issue #1, Search Form Confusion!

A CS 501 Project: Results



What we’ve found: Issue #2, Inconspicuous Edit/Confirmations!



What we’ve found: Issue #3, Confirmation Terms



What we’ve found: Issue #4, Entry Semantics


Results, Addressing

What we’ve found: #5, Search Results Disambiguation & Semantics


Fixing Bugs

Isolate the bug
  - Intermittent --> repeatable
  - Complex example --> simple example

Understand the bug
  - Root cause
  - Dependencies
  - Structural interactions

Fix the bug
  - Design changes
  - Documentation changes
  - Code changes


Moving the Bugs Around

Fixing bugs is an error-prone process!

• When you fix a bug, fix its environment

• Bug fixes need static and dynamic testing

• Repeat all tests that have the slightest relevance (regression testing)

Bugs have a habit of returning!

• When a bug is fixed, add the failure case to the test suite for the future.


Maintenance

Most production programs are maintained by people other than the programmers who originally wrote them.

(a) What factors make a program easy for somebody else to maintain?

(b) What factors make a program hard for somebody else to maintain?
