EIT-061 Software Quality Engineering



Note: These notes are designed for class notes and semester preparation purposes, taking references from various books and the Internet.

SOFTWARE QUALITY ENGINEERING


IT VI SEM

By: Pradeep Sharma

GCET


SOFTWARE QUALITY ENGINEERING: Syllabus

UNIT-I Introduction: Defining Software Quality, Software Quality Attributes and Specification, Cost of Quality, Defects, Faults, Failures, Defect Rate and Reliability, Defect Prevention, Reduction, and Containment, Overview of Different Types of Software Review, Introduction to Measurement and Inspection Process, Documents and Metrics.

UNIT-II Software Quality Metrics: Product Quality Metrics: Defect Density, Customer Problems Metric, Customer Satisfaction Metrics, Function Points; In-Process Quality Metrics: Defect Arrival Pattern, Phase-Based Defect Removal Pattern, Defect Removal Effectiveness; Metrics for Software Maintenance: Backlog Management Index, Fix Response Time, Fix Quality, Software Quality Indicators.

UNIT-III Software Quality Management and Models: Modeling Process, Software Reliability Models: The Rayleigh Model, Exponential Distribution and Software Reliability Growth Models, Software Reliability Allocation Models, Criteria for Model Evaluation; Software Quality Assessment Models: Hierarchical Model of Software Quality Assessment.

UNIT-IV Software Quality Assurance: Quality Planning and Control, Quality Improvement Process, Evolution of Software Quality Assurance (SQA), Major SQA Activities, Major SQA Issues, Zero Defect Software, SQA Techniques, Statistical Quality Assurance, Total Quality Management, Quality Standards and Processes.

UNIT-V Software Verification, Validation & Testing: Verification and Validation, Evolutionary Nature of Verification and Validation, Impracticality of Testing all Data and Paths, Proof of Correctness, Software Testing, Functional, Structural and Error-Oriented Analysis & Testing, Static and Dynamic Testing Tools, Characteristics of Modern Testing Tools.


Unit-I: Introduction to Software Quality

SOFTWARE QUALITY: The totality of features and characteristics of a software product that bear on its ability to satisfy stated or implied needs.

or: Software quality is conformance to explicitly stated functional and performance requirements, documented development standards, and implicit characteristics.

or: Software quality is defined as the quality that ensures customer satisfaction by delivering on all customer expectations for performance, standards, and ease of operation.

Important points:
- Software requirements are the foundation from which quality is measured.
- Specified standards define development criteria that guide the manner in which the software is engineered.
- If the software meets only the explicit requirements and does not meet the implicit requirements, the software quality is suspect.

Software Quality Characteristics: A set of attributes of a software product by which its quality is described and evaluated. A software quality characteristic may be refined into multiple levels of sub-characteristics.

Software quality may be evaluated by the following characteristics: Functionality, Reliability, Usability, Efficiency, Maintainability, Portability.

SOFTWARE QUALITY CHARACTERISTICS/ATTRIBUTES:

Functionality: A set of attributes that bear on the existence of a set of functions and their specified properties. The functions are those that satisfy stated or implied needs.

Reliability: A set of attributes that bear on the capability of software to maintain its level of performance under stated conditions for a stated period of time.


Usability: A set of attributes that bear on the effort needed for use, and on the individual assessment of such use, by a stated or implied set of users.

Efficiency: A set of attributes that bear on the relationship between the level of performance of the software and the amount of resources used, under stated conditions.

Maintainability: A set of attributes that bear on the effort needed to make specified modifications.

Portability: A set of attributes that bear on the ability of software to be transferred from one environment to another.

Quality Sub-characteristics:

Functionality: Suitability, Accuracy, Interoperability, Compliance, Security

Reliability: Maturity, Fault-tolerance, Recoverability

Usability: Understandability, Learnability, Operability

Efficiency: Time behavior, Resource behavior


Software Quality Attribute Trade-offs: Designers need to analyze trade-offs between multiple conflicting attributes to satisfy user requirements. The ultimate goal is the ability to quantitatively evaluate and trade off multiple quality attributes to arrive at a better overall system. We should not look for a single, universal metric, but rather for quantification of individual attributes and for trade-offs between these different metrics, starting with a description of the software architecture.

Generic Taxonomy for Quality Attributes: Attributes will be thought of as properties of the service delivered by the system to its users. The service delivered by a system is its behavior as it is perceived by its user(s); a user is another system (physical or human) which interacts with the former [Laprie 92]. We think of the service as being initiated by some event, which is a stimulus to the system signaling the need for the service. The stimulus can originate either within the system or external to the system.

For each quality attribute (performance, dependability, security and safety) we use a taxonomy (see Figure 2-2) that identifies:

Concerns — the parameters by which the attributes of a system are judged, specified and measured. Requirements are expressed in terms of concerns.

Attribute-specific factors — properties of the system (such as policies and mechanisms built into the system) and its environment that have an impact on the concerns. Depending on the attribute, the attribute-specific factors are internal or external properties affecting the concerns. Factors might not be independent and might have cause/effect relationships. Factors and their relationships would be included in the system’s architecture:

• Performance factors — the aspects of the system that contribute to performance. These include the demands from the environment and the system responses to these demands.

• Dependability impairments — the aspects of the system that contribute to (lack of) dependability. There is a causal chain between faults inside the system and failures observed in the environment. Faults cause errors; an error is a system state that might lead to failure if not corrected.


• Security factors — the aspects of the system that contribute to security. These include system/environment interface features and internal features such as kernelization.

• Safety impairments — the aspects of the system that contribute to (lack of) safety. Hazards are conditions or system states that can lead to a mishap or accident. Mishaps are unplanned events with undesirable consequences.

Methods — how we address the concerns: analysis and synthesis processes during the development of the system, and procedures and training for users and operators. Methods can be for analysis and/or synthesis, procedures and/or training, or procedures used at development or execution time.

COST OF QUALITY:

Quality costs are the costs associated with preventing, finding, and correcting defective work. These costs are huge, running at 20% - 40% of sales. Many of these costs can be significantly reduced or completely avoided. One of the key functions of a Quality Engineer is the reduction of the total cost of quality associated with a product.

Quality cost may be divided into costs associated with prevention, appraisal and failure.

• Prevention Costs: Costs of activities that are specifically designed to prevent poor quality. Examples of “poor quality” include coding errors, design errors, mistakes in the user manuals, as well as badly documented or unmaintainable complex code.

Note that most of the prevention costs don’t fit within the Testing Group’s budget. This money is spent by the programming, design, and marketing staffs.

• Appraisal Costs: Costs of activities designed to find quality problems, such as code inspections and any type of testing. Design reviews are part prevention and part appraisal. To the degree that you’re looking for errors in the proposed design itself when you do the review, you’re doing an appraisal. To the degree that you are looking for ways to strengthen the design, you are doing prevention.

• Failure Costs: Costs that result from poor quality, such as the cost of fixing bugs and the cost of dealing with customer complaints.


• Internal Failure Costs: Failure costs that arise before your company supplies its product to the customer. These costs go beyond the obvious costs of finding and fixing bugs. Many of the internal failure costs are borne by groups outside of Product Development.

For example, if your company sells thousands of copies of the same program, you will probably print several thousand copies of a multi-color box that contains and describes the program. You (your company) will often be able to get a much better deal by booking press time with the printer in advance. However, if you don’t get the artwork to the printer on time, you might have to pay for some or all of that wasted press time anyway, and then you may have to pay additional printing fees and rush charges to get the printing done on the new schedule. This can be an added expense of many thousands of dollars.

Sometimes the programming group will treat user interface errors as low priority, leaving them until the end to fix. This can be a big mistake. The marketing staff (or packaging production staff) need pictures of the product’s screen long before the program is finished, in order to get the artwork for the box into the printer on time. User interface bugs – the ones that will be fixed later – can make it hard for these staff members to take (or mock up) accurate screen shots. Delays caused by these minor design flaws, or by bugs that block a packaging staff member from creating or printing special reports, can cause the company to miss its printer deadline.

• External Failure Costs: Failure costs that arise after your company supplies the product to the customer, such as customer service costs, or the cost of patching a released product and distributing the patch.

• Total Cost of Quality: The sum of all the costs (Prevention + Appraisal + Internal Failure + External Failure).
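As a minimal worked illustration of the total-cost-of-quality breakdown above, the following Python sketch (with invented figures) totals the four categories and reports each as a share of the total:

# Hypothetical cost-of-quality figures for one release (all values in dollars).
costs = {
    "prevention": 40_000,        # training, standards, tooling
    "appraisal": 65_000,         # inspections, reviews, testing
    "internal_failure": 90_000,  # rework and retesting before release
    "external_failure": 120_000, # support calls and patches after release
}

total_cost_of_quality = sum(costs.values())
print(f"Total cost of quality: ${total_cost_of_quality:,}")
for category, value in costs.items():
    print(f"  {category:>16}: ${value:>9,} ({value / total_cost_of_quality:.0%})")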

DEFECTS, FAULTS, FAILURES, DEFECT RATE AND RELIABILITY, DEFECT PREVENTION

• ERROR – Human action that results in software containing a fault.

• FAULT – A defect in code that can be the cause of one or more failures (synonymous with “bug”).

• FAILURE – The inability of a system or system component to perform a required function within specified limits. A departure of program operation from program requirements.

Software errors, software faults and software failures:
● A bug/defect/fault is the consequence of a human error
● results in non-conformance to requirements
● manifests as a failure in running software

Quality Defect Discovery: A central research problem in software maintenance is still the inability to change software easily and quickly. To improve the quality of their products, organizations often use quality assurance techniques to tackle defects that reduce software quality. The techniques for the discovery of quality defects are based upon several research fields.


• Software inspections, and especially code inspections, are concerned with the process of manually inspecting software products in order to find potential ambiguities as well as functional and non-functional problems. While the specific evaluation of code fragments is probably more precise than automated techniques, the effort for the inspection is higher, the completeness of an inspection regarding the whole system is smaller, and the number of quality defects searched for is smaller.

• Software Testing and debugging is concerned with the discovery of defects regarding the functionality and reliability as defined in a specification or unit test case in static and dynamic environments.

• Software product metrics are used in software analysis to measure the complexity, cohesion, coupling, or other characteristics of the software product, which are further analyzed and interpreted to estimate the effort for development or to evaluate the quality of the software product. Tools for software analysis in existence today are used to monitor dynamic or static aspects of software systems in order to manually identify potential problems in the architecture or find sources for negative effects on the quality.

DEFECT PREVENTION, REDUCTION, AND CONTAINMENT

By understanding the previous definitions of defect, error, fault and failure, defects can be dealt with in three categories, namely:

Defect prevention through error removal
Defect reduction through fault detection and removal
Defect containment through failure prevention

Defect prevention through error removal: Defects arising from error sources can be removed by one or a combination of the following ways: Train and educate the developers.


Use of formal methods like formal specification and formal verification. Formal specification is concerned with producing consistent requirements specifications, constraints and designs, so that it reduces the chances of accidental fault injection. With formal verification, the correctness of the software system is proved; axiomatic correctness is one such method.

Defect prevention based on tools, technologies, processes and standards. Most companies use object-oriented methodology, which supports the information hiding principle and reduces interface interactions, thus reducing interface or interaction problems. Likewise, following a managed process, ensuring appropriate process selection and conformance, and enforcing selected product and development standards also prevent defect recurrence to a large extent.

Prevention of defects is possible by analyzing the root causes of the defects. Root cause analysis can take two forms, namely logical analysis and statistical analysis. Logical analysis is a human-intensive analysis which requires expert knowledge of the product, process, development and environment. It examines the logical relation between faults (effects) and errors (causes).

Defect reduction through fault detection and removal

Large companies use extensive mechanisms to remove as many faults as possible under project constraints. Inspection is a direct fault detection and removal technique, while testing is the observation of failures followed by fault removal. Inspections can range from informal reviews to formal inspections. The testing phase can be subdivided into testing of the product code before shipment and the post-release phase of the product. It includes all kinds of testing, from unit testing to beta testing.

Defect containment through failure prevention

In this approach, the causal relationship between faults and resulting failures is broken, thereby preventing failures while allowing faults to remain. Techniques like recovery blocks, N-version programming, safety assurance and failure containment are used. With the use of recovery blocks, failures are detected but the underlying faults are not removed, even though off-line activities can be carried out to identify and remove the faults in case of repeated failures. N-version programming is most applicable when timely decisions or performance is critical, such as in many real-time control systems. Faults in different versions are assumed to be independent, which implies that it is rare for the same fault to be triggered by the same input and cause the same failure among different versions. For some safety-critical systems, the aim is to prevent accidents, where an accident is a failure with severe consequences. In addition to the above quality assurance activities, specific techniques are used based on hazards or the logical preconditions for accidents, such as hazard elimination, hazard reduction, hazard control and damage control.
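As an illustration of the recovery-block idea mentioned above, here is a minimal Python sketch (the routines and the acceptance test are hypothetical, not taken from any particular system): the primary routine runs first, its result is checked by an acceptance test, and an alternate routine is tried if the check fails, so the failure is contained even though the underlying fault is not removed.

import math

def acceptance_test(result):
    # Acceptance test: the computed value must be a finite, non-negative number.
    return result is not None and math.isfinite(result) and result >= 0.0

def primary_sqrt(x):
    # Primary routine (hypothetical): fast path with a latent fault for negative input.
    return math.sqrt(x)

def alternate_sqrt(x):
    # Alternate routine (hypothetical): slower, defensive fallback.
    return math.sqrt(abs(x))

def recovery_block(x):
    # Try the primary, check its result, and fall back to the alternate on failure.
    try:
        result = primary_sqrt(x)
        if acceptance_test(result):
            return result
    except ValueError:
        pass  # the primary failed outright; treat it like a failed acceptance test
    return alternate_sqrt(x)

print(recovery_block(9.0))   # primary succeeds: 3.0
print(recovery_block(-9.0))  # primary fails, the alternate contains the failure: 3.0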

Benefits of defect prevention

The existence of defect prevention strategies not only reflects a high level of test discipline maturity but also represents the most cost-beneficial expenditure associated with the entire test effort. Detection of errors early in the development life cycle helps to prevent the migration of errors from requirements specifications to design, and from design into code.


Thus test strategies can be classified into two different categories, i.e. defect prevention technologies and defect detection technologies. Defect prevention provides the greatest cost and schedule savings over the duration of the application development effort. It significantly reduces the number of defects, brings down the cost of rework, and makes the software easier to maintain, port and reuse. It also makes the system more reliable and reduces the time and resources the organization needs to develop high-quality systems. Defects can be traced back to the life-cycle stage in which they were injected, based on which preventive measures are identified, which in turn increases productivity. A defect prevention measure is also a mechanism for propagating the knowledge of lessons learned between projects.

Limitations: There is a need to develop and apply software in new and diverse domains where specific domain knowledge is lacking. On several occasions, appropriate quality requirements might not be specified in the first place. Conducting inspections is labor intensive and requires highly skilled people. Sometimes full-blown quality measurements may not have been identified at design time.

SOFTWARE REVIEW

Software reviews are a filter to the software engineering process. Reviews are applied at various points during software development and serve to uncover defects that can be removed.

A software review is a way of using a group of people to:
- point out needed improvements in the product of a single person or team;
- confirm those parts of the product in which improvement is not desired;
- achieve technical work of more uniform quality than can be achieved without reviews, in order to make technical work more manageable.

Note: Read the types of review from the faculty class notes.

SOFTWARE REVIEWS, INSPECTIONS AND WALKTHROUGHS

A quarter-century ago, Michael Fagan of IBM developed the software inspection technique, a method for finding defects through manual examination of software work products by a group of the author's peers. Many organizations have achieved dramatic results from inspections, including IBM, Raytheon, Motorola, Hewlett Packard, and Bull HN. However, other organizations have had difficulty getting any kind of software review process going. Considering that effective technical reviews are one of the most powerful software quality practices available, all software groups should become skilled in their application.

Software Inspection. “A formal evaluation technique in which software requirements, design, or code are examined in detail by a person or group other than the author to detect faults, violations of development standards, and other problems” [IEEE94]. A quality improvement process for written material that consists of two dominant components:


The two dominant components are product (document) improvement and process improvement (of document production and inspection). Software inspection is the most formal, commonly used form of peer review. The key features of an inspection are the use of checklists to facilitate error detection and defined roles for participants.

The focus of the software inspection is on identifying problems, not on resolving them. Suggested software inspection participant roles:
o Moderator – responsible for leading the inspection.
o Reader – leads the inspection team through the logic.
o Recorder – documents the problems found during the inspection.
o Reviewers – identify and describe possible problems and defects.
o Author – contributes his or her understanding.

Software Walkthrough: In the most usual form of the term, a walkthrough is a step-by-step simulation of the execution of a procedure, as when walking through code line by line with an imagined set of inputs. The term has been extended to the review of material that is not procedural, such as data descriptions, reference manuals, specifications, etc.

THE RULES OF A WALKTHROUGH The rules governing a walkthrough are:

• Provide adequate time

• Use multiple sessions when necessary

• Prepare a set of test cases

• Provide a copy of the program being tested to each team member

• Provide other support materials

Other support materials may be required to efficiently conduct a Walkthrough. These include:

• A list of questions prepared by each team member after reading the program or unit under review

• Flow charts

• Data dictionaries, lists of variables, classes, etc.

The code walkthrough, like the inspection, is a set of procedures and error-detection techniques for group code reading. It shares much in common with the inspection process, but the procedures are slightly different, and a different error-detection technique is employed.

Role of participants:


Like the inspection, the walkthrough is an uninterrupted meeting of one to two hours in duration. The walkthrough team consists of three to five people. One of these people plays a role similar to that of the moderator in the inspection process, another person plays the role of a secretary (a person who records all errors found), and a third person plays the role of a "tester". Suggestions as to who the three to five people should be vary. Of course, the programmer is one among these people. Suggestions for the other participants include (1) a highly experienced programmer, (2) a programming-language expert, (3) a new programmer (to give a fresh, unbiased outlook), (4) the person who will eventually maintain the program, (5) someone from a different project, and (6) someone from the same programming team as the programmer.

Other Reviews

Over-the-shoulder reviews: This is the most common and informal type of code review. An “over-the-shoulder” review is just that – a developer standing over the author’s workstation while the author walks the reviewer through a set of code changes.


Formal inspections: A “formal” review refers to a heavy-process review with three to six participants meeting together in one room with print-outs and/or a projector. Someone is the “moderator” or “controller” and acts as the organizer, keeps everyone on task, controls the pace of the review, and acts as arbiter of disputes. Everyone reads through the materials beforehand to properly prepare for the meeting.


E-mail pass-around reviews: E-mail pass-around reviews are the second-most common form of informal code review and the technique preferred by most open-source projects. Here, whole files or changes are packaged up by the author and sent to reviewers via e-mail. Reviewers examine the files, ask questions, discuss with the author and other developers, and suggest changes. The hardest part of the e-mail pass-around is finding and collecting the files under review. On the author’s end, he has to figure out how to gather the files together. For example, if this is a review of changes being proposed for check-in to version control, the author has to identify all the files added, deleted, and modified, copy them somewhere, then download the previous versions of those files (so reviewers can see what was changed), and organize the files so the reviewers know which files should be compared with which others.


On the reviewing end, reviewers have to extract those files from the e-mail and generate the differences between each pair.

INTRODUCTION TO INSPECTION AND MEASUREMENT PROCESS

Most software quality standards and frameworks, such as ISO 9001/9000-3, the Capability Maturity Model, ANSI/IEEE Std. 730-1989 and ESA PSS-05-0 1991, require or recommend measurement of software quality. Unfortunately, there is a large gap between the requirement that quality measurement should be carried out and the guidelines on how to carry out the measurements.


For example, the software quality standard ISO 9000-3 (which is the guideline for the use of ISO 9001 on software production) states in Section 6.4.1 that: “There are currently no universally accepted measures of software quality. ... The supplier of software products should collect and act on quantitative measures of the quality of these software products.”

Here, ISO 9000-3 requires quality measurement and at the same time admits that there are no (universally) accepted quality measures. In order not to have contradicting requirements, ISO 9000-3 (and other similar frameworks) may have made one or more of the following assumptions:
A1: Although no universal software quality measures exist, there are meaningful quality measures for particular environments.
A2: Widely accepted quality measures will emerge when software quality measurement research becomes more mature.
A3: Software quality indicators (so-called quality factors) can be measured and used to predict or indirectly measure software quality.

Def. Empirical Relational system: <E,{R1..Rn}>, where E is a set of entities and R1..Rn the set of empirical relations defined on E with respect to a given attribute (for example, the attribute quality). The empirical relational system is a model of the part of the “real world” we are interested in, i.e. a perspective on our knowledge about the phenomenon to be measured. The model should ensure agreement about the empirical relations and enable measurement. For example, to state that program A has more modules than program B we need a model of programs that enable us to agree upon what a module is and how to identify it.

Def. Formal (numerical) Relational system: <N,{S1..Sn}>, where N is a set of numerals or symbols, and S1..Sn the set of numerical relations defined on N.

Def. Measure: M is a measure for <E,{R1..Rn}> with respect to a given attribute iff:
1. M: E → N
2. Ri(e1, e2, ... ek) ⇔ Si(M(e1), M(e2), ... M(ek)), for all i.

Condition 1 says that a measure is a mapping from entities to numbers or symbols. Condition 2, the representation condition, requires equivalence between the empirical and the formal relations.

Def. Admissible transformation: Let M be a measure, E a set of entities, N a set of numerals and F a transformation (mapping) from M(E) to N, i.e. F: M(E) → N. F is an admissible transformation iff F(M(E)) is a measure. In other words, an admissible transformation preserves the equivalence between the empirical relations and the formal relations. The definition of admissible transformations enables a classification of scales. A common classification of scales is the following:

Nominal scale: Admissible transformations are one-to-one mappings. The only empirical relation possible is related to “equality”. Separating programs into “Fortran-programs” and “Cobol-programs” leads to a nominal scale.


Ordinal scale: Admissible transformations are strictly increasing functions. The empirical relations possible are related to “equality” and “order”. Assigning the values “high quality”, “medium quality” and “low quality” to software leads to an ordinal scale.

Interval scale: Admissible transformations are of the type F(x) = ax + b, a > 0. The empirical relations possible are related to “equality”, “order” and “difference”. The temperature scale (in degrees Celsius) is an interval scale.

Ratio scale: Admissible transformations are of type F(x) = ax, a > 0. The empirical relations possible are “equality”, “order”, “difference” and “relative difference”. The length of programs measured in lines of code forms a ratio scale.

We believe that software quality should at least be measurable on an ordinal scale. Otherwise, we would not be able to state that software A is “better” than software B with respect to quality, i.e. our intuition of what software quality is would be strongly violated. In measurement theory terminology this means that we should require that the empirical relational system includes an accepted understanding of the relation “higher quality than”.

The appropriateness of statistical and mathematical methods for the measured values is determined by the type of scale, i.e. by the admissible transformations on the scale values. For example, addition of values is meaningful for values on an interval or ratio scale, but not on an ordinal or nominal scale. This means, for example, that “units of software quality” must be meaningful before it makes sense to calculate the mean software quality. There are diverging opinions on how rigidly the scale prescriptions should be interpreted.
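To make the scale argument concrete, the following Python sketch (with made-up ordinal quality ratings) encodes the same ordinal data under two different strictly increasing, and therefore admissible, transformations; order-based summaries such as the median agree, while the means do not correspond to any consistent rating, which is why a "mean quality" is not meaningful on an ordinal scale.

import statistics

# Made-up ordinal quality ratings for five software modules.
ratings = ["low", "medium", "high", "medium", "high"]

# Two admissible encodings for an ordinal scale: both are strictly increasing,
# so both preserve the empirical "higher quality than" ordering.
encoding_a = {"low": 1, "medium": 2, "high": 3}
encoding_b = {"low": 1, "medium": 10, "high": 100}

values_a = [encoding_a[r] for r in ratings]
values_b = [encoding_b[r] for r in ratings]

# The medians map back to "medium" under both encodings.
print(statistics.median(values_a), statistics.median(values_b))  # 2 10

# The means disagree and map to no consistent rating.
print(statistics.mean(values_a), statistics.mean(values_b))      # 2.2 44.2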

QUALITY INDICATORS CLASSIFICATION

All quality indicators should be classified within a package that includes analysis tools, metrics tools, measurement strategies and practices. This can help us accelerate the development process and fine-tune performance while implementing international standards compliance to target quality results for any system.

The first quality indicator, i.e. the SQA plan along with V&V, should include the SQA tasks: their management in relation to the development plan, with the integration of all software modules planned in every QA segment. That includes overall wellness, risk assessment and software performance analysis. To ensure effective implementation of quality process management, every project team should have a complete “Software Assessment Plan”, which is the second quality indicator, covering all new constraints of the changing environment and its reusability, along with other security plans and its configuration.


Every assessment plan should carry forward the testing mechanisms that observe the changing environment and configuration issues. The third quality indicator identified in software products is the “Quality Review Mechanism”, which supports reducing review effort and improving quality control, including continuous automated process inspection and audit reviews. The fourth quality indicator is all other “Supporting Methods and Techniques” addressing slicing, traceability, consistency, complexity and sensitivity issues, analyzed by different analytical tools that can assess any change or impact made or observed in the software.


Unit-II: Software Quality Metrics

SOFTWARE QUALITY METRICS

Software metrics can be classified into three categories: product metrics, process metrics, and project metrics. Product metrics describe the characteristics of the product such as size, complexity, design features, performance, and quality level. Process metrics can be used to improve software development and maintenance. Examples include the effectiveness of defect removal during development, the pattern of testing defect arrival, and the response time of the fix process. Project metrics describe the project characteristics and execution. Examples include the number of software developers, the staffing pattern over the life cycle of the software, cost, schedule, and productivity. Some metrics belong to multiple categories. For example, the in-process quality metrics of a project are both process metrics and project metrics.

Software quality metrics are a subset of software metrics that focus on the quality aspects of the product, process, and project. In general, software quality metrics are more closely associated with process and product metrics than with project metrics. Nonetheless, the project parameters such as the number of developers and their skill levels, the schedule, the size, and the organization structure certainly affect the quality of the product. Software quality metrics can be divided further into end-product quality metrics and in-process quality metrics. The essence of software quality engineering is to investigate the relationships among in-process metrics, project characteristics, and end-product quality, and, based on the findings, to engineer improvements in both process and product quality. Moreover, we should view quality from the entire software life-cycle perspective and, in this regard, we should include metrics that measure the quality level of the maintenance process as another category of software quality metrics.

PRODUCT QUALITY METRICS

The de facto definition of software quality consists of two levels: intrinsic product quality and customer satisfaction. The metrics we discuss here cover both levels:

• Mean time to failure

• Defect density

• Customer problems

• Customer satisfaction.

Intrinsic product quality is usually measured by the number of “bugs” (functional defects) in the software or by how long the software can run before encountering a “crash.” In operational definitions, the two metrics are defect density (rate) and mean time to failure (MTTF). The MTTF metric is most often used with safety-critical systems such as air traffic control systems, avionics, and weapons. For instance, the U.S. government mandates that its air traffic control system cannot be unavailable for more than three seconds per year. In civilian airliners, the probability of certain catastrophic failures must be no worse than 10^−9 per hour (Littlewood and Strigini, 1992). The defect density metric, in contrast, is used in many commercial software systems. The two metrics are correlated but are different enough to merit close attention.


First, one measures the time between failures, the other measures the defects relative to the software size (lines of code, function points, etc.). Second, although it is difficult to separate defects and failures in actual measurements and data tracking, failures and defects (or faults) have different meanings. According to the IEEE/ American National Standards Institute (ANSI) standard (982.2):

• An error is a human mistake that results in incorrect software.

• The resulting fault is an accidental condition that causes a unit of the system to fail to function as required.

• A defect is an anomaly in a product.

• A failure occurs when a functional unit of a software-related system can no longer perform its required function or cannot perform it within specified limits.

From these definitions, the difference between a fault and a defect is unclear. For practical purposes, there is no difference between the two terms. Indeed, in many development organizations the two terms are used synonymously. We also use the two terms interchangeably. Simply put, when an error occurs during the development process, a fault or a defect is injected in the software. In operational mode, failures are caused by faults or defects, or failures are materializations of faults. Sometimes a fault causes more than one failure situation and, on the other hand, some faults do not materialize until the software has been executed for a long time with some particular scenarios. Therefore, defect and failure do not have a one-to-one correspondence.

Third, the defects that cause higher failure rates are usually discovered and removed early. The probability of failure associated with a latent defect is called its size, or “bug size.” For special-purpose software systems such as the air traffic control systems or the space shuttle control systems, the operations profile and scenarios are better defined and, therefore, the time to failure metric is appropriate. For general-purpose computer systems or commercial-use software, for which there is no typical user profile of the software, the MTTF metric is more difficult to implement and may not be representative of all customers.

Fourth, gathering data about time between failures is very expensive. It requires recording the occurrence time of each software failure. It is sometimes quite difficult to record the time for all the failures observed during testing or operation. To be useful, time between failures data also requires a high degree of accuracy. This is perhaps the reason the MTTF metric is not widely used by commercial developers.

Finally, the defect rate metric (or the volume of defects) has another appeal to commercial software development organizations. The defect rate of a product or the expected number of defects over a certain time period is important for cost and resource estimates of the maintenance phase of the software life cycle. Regardless of their differences and similarities, MTTF and defect density are the two key metrics for intrinsic product quality. Accordingly, there are two main types of software reliability growth models — the time between failures models and the defect count (defect rate) models.

The Defect Density Metric

Although seemingly straightforward, comparing the defect rates of software products involves many issues. In this section we try to articulate the major points. To define a rate, we first have to operationalize the numerator and the denominator, and specify the time frame. The general concept of defect rate is the number of defects over the opportunities for error (OFE) during a specific time frame. We have just discussed the definitions of software defect and failure. Because failures are defects materialized, we can use the number of unique causes of observed failures to approximate the number of defects in the software.


The denominator is the size of the software, usually expressed in thousand lines of code (KLOC) or in the number of function points. In terms of time frames, various operational definitions are used for the life of product (LOP), ranging from one year to many years after the software product’s release to the general market. In our experience with operating systems, usually more than 95% of the defects are found within four years of the software’s release. For application software, most defects are normally found within two years of its release.
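As a small illustration of the defect density concept just described, here is a Python sketch (all figures invented) that computes defects per KLOC and defects per function point for a given time frame:

def defect_density(defects_found, size, per=1000):
    # Defects over opportunities for error, e.g. defects per KLOC when per=1000 LOC.
    return defects_found / (size / per)

# Hypothetical product data for the first year after release.
defects_reported = 120      # unique causes of observed failures (valid defects)
lines_of_code = 80_000      # product size in LOC
function_points = 600       # product size in function points

print(f"{defect_density(defects_reported, lines_of_code):.2f} defects per KLOC")
print(f"{defects_reported / function_points:.2f} defects per function point")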

Lines of Code

The lines of code (LOC) metric is anything but simple. The major problem comes from the ambiguity of the operational definition, the actual counting. In the early days of Assembler programming, in which one physical line was the same as one instruction, the LOC definition was clear. With the availability of high-level languages the one-to-one correspondence broke down. Differences between physical lines and instruction statements (or logical lines of code) and differences among languages contribute to the huge variations in counting LOCs. Even within the same language, the methods and algorithms used by different counting tools can cause significant differences in the final counts. Jones (1986) describes several variations:

• Count only executable lines.
• Count executable lines plus data definitions.
• Count executable lines, data definitions, and comments.
• Count executable lines, data definitions, comments, and job control language.
• Count lines as physical lines on an input screen.
• Count lines as terminated by logical delimiters.

To illustrate the variations in LOC count practices, let us look at a few examples by authors of software metrics. In Boehm’s well-known book Software Engineering Economics (1981), the LOC counting method counts lines as physical lines and includes executable lines, data definitions, and comments.

LOC is defined as follows:

A line of code is any line of program text that is not a comment or blank line, regardless of the number of statements or fragments of statements on the line. This specifically includes all lines containing program headers, declarations, and executable and non-executable statements. Thus their method is to count physical lines including prologues and data definitions (declarations) but not comments.
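A minimal Python sketch of a counter in the spirit of that physical-line definition (assuming '#'-style line comments; the conventions for other languages and for logical LOC would differ):

def count_physical_loc(source_text, comment_prefix="#"):
    # Count non-blank physical lines that are not comment-only lines.
    count = 0
    for line in source_text.splitlines():
        stripped = line.strip()
        if stripped and not stripped.startswith(comment_prefix):
            count += 1
    return count

sample = '''# a comment-only line (not counted)
x = 1              # executable line (counted)

def f(y):          # header line (counted)
    return x + y   # executable line (counted)
'''
print(count_physical_loc(sample))  # -> 3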

In Programming Productivity by Jones (1986), the source instruction (or logical lines of code) method is used. The method used by IBM Rochester is also to count source instructions including executable lines and data definitions but excluding comments and program prologues. The resultant differences in program size between counting physical lines and counting instruction statements are difficult to assess. It is not even known which method will result in a larger number. In some languages such as BASIC, PASCAL, and C, several instruction statements can be entered on one physical line. On the other hand, instruction statements and data declarations might span several physical lines, especially when the programming style aims for easy maintenance, which is not necessarily done by the original code owner. Languages that have a fixed column format such as FORTRAN may have the physical-lines-to-source-instructions ratio closest to one.


According to Jones (1992), the difference between counts of physical lines and counts including instruction statements can be as large as 500%; and the average difference is about 200%, with logical statements outnumbering physical lines. In contrast, for COBOL the difference is about 200% in the opposite direction, with physical lines outnumbering instruction statements. There are strengths and weaknesses of physical LOC and logical LOC (Jones, 2000). In general, logical statements are a somewhat more rational choice for quality data. When any data on size of program products and their quality are presented, the method for LOC counting should be described. At the minimum, in any publication of quality when LOC data is involved, the author should state whether the LOC counting method is based on physical LOC or logical LOC.

Note: The LOC discussions in this section are in the context of defect rate calculation. For productivity studies, the problems with using LOC are more severe. A basic problem is that the amount of LOC in a software program is negatively correlated with design efficiency. The purpose of software is to provide certain functionality for solving some specific problems or to perform certain tasks. Efficient design provides the functionality with lower implementation effort and fewer LOCs. Therefore, using LOC data to measure software productivity is like using the weight of an airplane to measure its speed and capability.

When a software product is released to the market for the first time, and when a certain LOC count method is specified, it is relatively easy to state its quality level (projected or actual). For example, statements such as the following can be made: “This product has a total of 50 KLOC; the latent defect rate for this product during the next four years is 2.0 defects per KLOC.” However, when enhancements are made and subsequent versions of the product are released, the situation becomes more complicated. One needs to measure the quality of the entire product as well as the portion of the product that is new. The latter is the measurement of true development quality—the defect rate of the new and changed code. Although the defect rate for the entire product will improve from release to release due to aging, the defect rate of the new and changed code will not improve unless there is real improvement in the development process. To calculate defect rate for the new and changed code, the following must be available:

• LOC count: The entire software product as well as the new and changed code of the release must be available.

• Defect tracking: Defects must be tracked to the release origin—the portion of the code that contains the defects and at what release the portion was added, changed, or enhanced. When calculating the defect rate of the entire product, all defects are used; when calculating the defect rate for the new and changed code, only defects of the release origin of the new and changed code are included.

These tasks are enabled by the practice of change flagging. Specifically, when a new function is added or an enhancement is made to an existing function, the new and changed lines of code are flagged with a specific identification (ID) number through the use of comments. The ID is linked to the requirements number, which is usually described briefly in the module’s prologue. Therefore, any changes in the program modules can be linked to a certain requirement. This linkage procedure is part of the software configuration management mechanism and is usually practiced by organizations that have an established process.


If the change-flagging IDs and requirements IDs are further linked to the release number of the product, the LOC counting tools can use the linkages to count the new and changed code in new releases. The change-flagging practice is also important to the developers who deal with problem determination and maintenance.


When a defect is reported and the fault zone is determined, the developer can determine in which function or enhancement, pertaining to which requirements and at which release origin, the defect was injected.

The new and changed LOC counts can also be obtained via the delta-library method. By comparing program modules in the original library with the new versions in the current release library, the LOC count tools can determine the amount of new and changed code for the new release. This method does not involve change flagging. However, change flagging remains very important for maintenance. In many software development environments, tools for automatic change flagging are also available.

Example: Lines of Code Defect Rates

At IBM Rochester, lines of code data is based on instruction statements (logical LOC) and includes executable code and data definitions but excludes comments.

LOC counts are obtained for the total product and for the new and changed code of the new release. Because the LOC count is based on source instructions, the two size metrics are called shipped source instructions (SSI) and new and changed source instructions (CSI), respectively. The relationship between the SSI count and the CSI count can be expressed with the following formula:

SSI (current release) = SSI (previous release)
  + CSI (new and changed code instructions for current release)
  − deleted code (usually very small)
  − changed code (to avoid double counting in both SSI and CSI)

Defects after the release of the product are tracked. Defects can be field defects, which are found by customers, or internal defects, which are found internally. The several post release defect rate metrics per thousand SSI (KSSI) or per thousand CSI (KCSI) are:

(1) Total defects per KSSI (a measure of code quality of the total product)
(2) Field defects per KSSI (a measure of defect rate in the field)
(3) Release-origin defects (field and internal) per KCSI (a measure of development quality)
(4) Release-origin field defects per KCSI (a measure of development quality per defects found by customers)

Metric (1) measures the total release code quality, and metric (3) measures the quality of the new and changed code. For the initial release, where the entire product is new, the two metrics are the same. Thereafter, metric (1) is affected by aging and by the improvement (or deterioration) of metric (3). Metrics (1) and (3) are process measures; their field counterparts, metrics (2) and (4), represent the customer’s perspective. Given an estimated defect rate (per KCSI or KSSI), software developers can minimize the impact to customers by finding and fixing the defects before customers encounter them.
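The bookkeeping above can be sketched in a few lines of Python (with hypothetical counts, not IBM Rochester data), applying the SSI update formula and the four per-KSSI/per-KCSI defect rate metrics:

def update_ssi(previous_ssi, csi, deleted=0, changed=0):
    # SSI (current) = SSI (previous) + CSI - deleted code - changed code
    return previous_ssi + csi - deleted - changed

# Hypothetical release data (all counts in source instructions).
previous_ssi = 500_000
csi = 80_000
ssi = update_ssi(previous_ssi, csi, deleted=2_000, changed=8_000)

total_defects = 900             # all defects found after release
field_defects = 300             # the subset reported by customers
release_origin_defects = 240    # defects traced to the new and changed code
release_origin_field_defects = 90

print("Total defects per KSSI:", total_defects / (ssi / 1000))
print("Field defects per KSSI:", field_defects / (ssi / 1000))
print("Release-origin defects per KCSI:", release_origin_defects / (csi / 1000))
print("Release-origin field defects per KCSI:", release_origin_field_defects / (csi / 1000))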

CUSTOMER’S PERSPECTIVE

The defect rate metrics measure code quality per unit. They are useful for driving quality improvement from the development team’s point of view. Good practice in software quality engineering, however, also needs to consider the customer’s perspective. Assume that we are to set the defect rate goal for release-to-release improvement of one product. From the customer’s point of view, the defect rate is not as relevant as the total number of defects that might affect their business. Therefore, a good defect rate target should lead to a release-to-release reduction in the total number of defects, regardless of size. If a new release is larger than its predecessors, it means the defect rate goal for the new and changed code has to be significantly better than that of the previous release in order to reduce the total number of defects.


Consider the following hypothetical example:

Initial Release of Product Y
KCSI = KSSI = 50 KLOC
Defects/KCSI = 2.0
Total number of defects = 2.0 × 50 = 100

Second Release
KCSI = 20
KSSI = 50 + 20 (new and changed lines of code) − 4 (assuming 20% are changed lines of code) = 66
Defects/KCSI = 1.8 (assuming 10% improvement over the first release)
Total number of additional defects = 1.8 × 20 = 36

Third Release
KCSI = 30
KSSI = 66 + 30 (new and changed lines of code) − 6 (assuming the same % (20%) of changed lines of code) = 90
Targeted number of additional defects (no more than previous release) = 36
Defect rate target for the new and changed lines of code: 36/30 = 1.2 defects/KCSI or lower

From the initial release to the second release the defect rate improved by 10%. However, customers experienced a 64% reduction [(100 − 36)/100] in the number of defects because the second release is smaller. The size factor works against the third release because it is much larger than the second release: its defect rate has to be one third better (1.2 vs. 1.8) than that of the second release for the number of new defects not to exceed that of the second release. Of course, sometimes the difference between the two defect rate targets is very large and the new defect rate target is deemed not achievable. In those situations, other actions should be planned to improve the quality of the base code or to reduce the volume of post-release field defects (i.e., by finding them internally).
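A short Python sketch of the same target-setting arithmetic (using the hypothetical release sizes above) makes the size effect explicit: to keep the number of new defects from growing, the defect rate target must shrink as the release gets larger.

def defect_rate_target(previous_release_defects, new_release_kcsi):
    # Target rate so that the new release adds no more defects than the previous one.
    return previous_release_defects / new_release_kcsi

second_release_defects = 1.8 * 20   # 36 defects
third_release_kcsi = 30
target = defect_rate_target(second_release_defects, third_release_kcsi)
print(f"Defect rate target for the third release: {target:.1f} defects/KCSI or lower")  # 1.2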

Function Points

Counting lines of code is but one way to measure size. Another one is the function point. Both are surrogate indicators of the opportunities for error (OFE) in the defect density metrics. In recent years the function point has been gaining acceptance in application development in terms of both productivity (e.g., function points per person-year) and quality (e.g., defects per function point).

Definition: A function can be defined as a collection of executable statements that performs a certain task, together with declarations of the formal parameters and local variables manipulated by those statements (Conte et al., 1986). The ultimate measure of software productivity is the number of functions a development team can produce given a certain amount of resource, regardless of the size of the software in lines of code.


The defect rate metric, ideally, is indexed to the number of functions software provides. If defects per unit of functions are low, then the software should have better quality even though the defects per KLOC value could be higher—when the functions were implemented by fewer lines of code. However, measuring functions is theoretically promising but realistically very difficult.

The function point metric, originated by Albrecht and his colleagues at IBM in the mid-1970s, however, is something of a misnomer because the technique does not measure functions explicitly (Albrecht, 1979). It does address some of the problems associated with LOC counts in size and productivity measures, especially the differences in LOC counts that result because different levels of languages are used. It is a weighted total of five major components that comprise an application:

• Number of external inputs (e.g., transaction types) × 4
• Number of external outputs (e.g., report types) × 5
• Number of logical internal files (files as the user might conceive them, not physical files) × 10
• Number of external interface files (files accessed by the application but not maintained by it) × 7
• Number of external inquiries (types of online inquiries supported) × 4

These are the average weighting factors. There are also low and high weighting factors, depending on the complexity assessment of the application in terms of the five components (Kemerer and Porter, 1992; Sprouls, 1990):

• External input: low complexity, 3; high complexity, 6
• External output: low complexity, 4; high complexity, 7
• Logical internal file: low complexity, 7; high complexity, 15
• External interface file: low complexity, 5; high complexity, 10
• External inquiry: low complexity, 3; high complexity, 6

The complexity classification of each component is based on a set of standards that define complexity in terms of objective guidelines. For instance, for the external output component, if the number of data element types is 20 or more and the number of file types referenced is 2 or more, then complexity is high. If the number of data element types is 5 or fewer and the number of file types referenced is 2 or 3, then complexity is low. With the weighting factors, the first step is to calculate the function counts (FCs) based on the following formula:

FC = Σ (i = 1 to 5) Σ (j = 1 to 3) wij × xij

where wij are the weighting factors of the five components by complexity level (low, average, high) and xij are the numbers of each component in the application.

The second step involves a scale from 0 to 5 to assess the impact of 14 general system characteristics in terms of their likely effect on the application. The 14 characteristics are:
1. Data communications
2. Distributed functions


3. Performance
4. Heavily used configuration
5. Transaction rate
6. Online data entry
7. End-user efficiency
8. Online update
9. Complex processing
10. Reusability
11. Installation ease
12. Operational ease
13. Multiple sites
14. Facilitation of change

The scores (ranging from 0 to 5) for these characteristics are then summed, based on the following formula, to arrive at the value adjustment factor (VAF):

VAF = 0.65 + 0.01 × Σ (i = 1 to 14) ci

where ci is the score for general system characteristic i. Finally, the number of function points is obtained by multiplying the function counts and the value adjustment factor:

FP = FC × VAF

This equation is a simplified description of the calculation of function points.
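A compact Python sketch of this simplified calculation, using the average weighting factors listed above and made-up component counts and characteristic scores:

# Average weighting factors for the five function point components.
WEIGHTS = {
    "external_inputs": 4,
    "external_outputs": 5,
    "logical_internal_files": 10,
    "external_interface_files": 7,
    "external_inquiries": 4,
}

def function_points(component_counts, gsc_scores):
    # Simplified FP calculation: FP = FC x VAF, with VAF = 0.65 + 0.01 * sum(scores).
    fc = sum(WEIGHTS[name] * count for name, count in component_counts.items())
    vaf = 0.65 + 0.01 * sum(gsc_scores)
    return fc * vaf

# Hypothetical application: component counts and the 14 general system characteristic scores (0-5).
counts = {
    "external_inputs": 20,
    "external_outputs": 15,
    "logical_internal_files": 10,
    "external_interface_files": 4,
    "external_inquiries": 12,
}
scores = [3, 2, 4, 3, 5, 4, 3, 4, 2, 1, 2, 3, 1, 3]

print(f"Function points: {function_points(counts, scores):.1f}")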

Example: Function Point Defect Rates

In 2000, based on a large body of empirical studies, Jones published the book Software Assessments, Benchmarks, and Best Practices. All metrics used throughout the book are based on function points. According to his study (1997), the average number of software defects in the U.S. is approximately 5 per function point during the entire software life cycle. This number represents the total number of defects found and measured from early software requirements throughout the life cycle of the software, including the defects reported by users in the field. Jones also estimates the defect removal efficiency of software organizations by level of the Capability Maturity Model (CMM) developed by the Software Engineering Institute (SEI). By applying the defect removal efficiency to the overall defect rate per function point, the following defect rates for the delivered software were estimated. The time frames for these defect rates were not specified, but it appears that these defect rates are for the maintenance life of the software. The estimated defect rates per function point are as follows:
SEI CMM Level 1: 0.75
SEI CMM Level 2: 0.44
SEI CMM Level 3: 0.27
SEI CMM Level 4: 0.14
SEI CMM Level 5: 0.05
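A tiny Python sketch applying these published per-function-point rates to a hypothetical product size, to estimate the expected number of delivered defects by CMM level:

# Estimated delivered defect rates per function point by SEI CMM level (Jones, 2000).
DEFECTS_PER_FP = {1: 0.75, 2: 0.44, 3: 0.27, 4: 0.14, 5: 0.05}

product_size_fp = 1_000  # hypothetical product size in function points
for level, rate in DEFECTS_PER_FP.items():
    print(f"SEI CMM Level {level}: about {rate * product_size_fp:.0f} delivered defects expected")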


CUSTOMER PROBLEMS METRIC

Another product quality metric used by major developers in the software industry measures the problems customers encounter when using the product. For the defect rate metric, the numerator is the number of valid defects. However, from the customers’ standpoint, all problems they encounter while using the software product, not just the valid defects, are problems with the software. Problems that are not valid defects may be usability problems, unclear documentation or information, duplicates of valid defects (defects that were reported by other customers and fixes were available but the current customers did not know of them), or even user errors. These so-called non-defect-oriented problems, together with the defect problems, constitute the total problem space of the software from the customers’ perspective.

The problems metric is usually expressed in terms of problems per user month (PUM):

PUM = Total problems that customers reported (true defects and non-defect-oriented problems) for a time period ÷ Total number of license-months of the software during the period

where

Number of license-months = Number of installed licenses of the software × Number of months in the calculation period

PUM is usually calculated for each month after the software is released to the market, and also for monthly averages by year. Note that the denominator is the number of license-months instead of thousand lines of code or function point, and the numerator is all problems customers encountered. Basically, this metric relates problems to usage. Approaches to achieve a low PUM include:

• Improve the development process and reduce the product defects.

• Reduce the non-defect-oriented problems by improving all aspects of the products (such as usability, documentation), customer education, and support.

• Increase the sales (the number of installed licenses) of the product.


The first two approaches reduce the numerator of the PUM metric, and the third increases the denominator. The result of any of these courses of action will be that the PUM metric has a lower value. All three approaches make good sense for quality improvement and business goals for any organization. The PUM metric, therefore, is a good metric. The only minor drawback is that when the business is in excellent condition and the number of software licenses is rapidly increasing, the PUM metric will look extraordinarily good (low value) and, hence, the need to continue to reduce the number of customers’ problems (the numerator of the metric) may be undermined. Therefore, the total number of customer problems should also be monitored and aggressive year-to-year or release-to-release improvement goals set as the number of installed licenses increases. However, unlike valid code defects, customer problems are not totally under the control of the software development organization. Therefore, it may not be feasible to set a PUM goal that the total customer problems cannot increase from release to release, especially when the sales of the software are increasing.
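A minimal sketch of the PUM calculation, using made-up monthly problem counts and license figures (all numbers below are hypothetical):

```python
# Problems per user month (PUM) sketch with hypothetical data.
problems_reported = 320          # all customer-reported problems in the period
installed_licenses = 4_000       # installed licenses in the field during the period
months_in_period = 3

license_months = installed_licenses * months_in_period
pum = problems_reported / license_months
print(f"PUM = {pum:.4f} problems per license-month")  # 320 / 12000 ~= 0.0267
```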

The key points of the defect rate metric and the customer problems metric are briefly summarized in Table 4.1. The two metrics represent two perspectives of product quality. For each metric the numerator and denominator match each other well:

Defects relate to source instructions or the number of function points, and problems relate to usage of the product. If the numerator and denominator are mixed up, poor metrics will result. Such metrics could be counterproductive to an organization’s quality improvement effort because they will cause confusion and wasted resources.

The customer problems metric can be regarded as an intermediate measurement between defects measurement and customer satisfaction. To reduce customer problems, one has to reduce the functional defects in the products and, in addition, improve other factors (usability, documentation, problem rediscovery, etc.). To improve customer satisfaction, one has to reduce defects and overall problems and, in addition, manage factors of broader scope such as timing and availability of the product, company image, services, total customer solutions, and so forth. From the software quality standpoint, the relationship of the scopes of the three metrics can be represented by the Venn diagram in Figure 4.1.


CUSTOMER SATISFACTION METRICS

Customer satisfaction is often measured by customer survey data via the five-point scale:

• Very satisfied
• Satisfied
• Neutral
• Dissatisfied
• Very dissatisfied

Satisfaction with the overall quality of the product and its specific dimensions is usually obtained through various methods of customer surveys. For example, the specific parameters of customer satisfaction in software monitored by IBM include the CUPRIMDSO categories (capability, functionality, usability, performance, reliability, installability, maintainability, documentation/information, service, and overall); for Hewlett-Packard they are FURPS (functionality, usability, reliability, performance, and service).

Based on the five-point-scale data, several metrics with slight variations can be constructed and used, depending on the purpose of analysis. For example:


(1) Percent of completely satisfied customers
(2) Percent of satisfied customers (satisfied and completely satisfied)
(3) Percent of dissatisfied customers (dissatisfied and completely dissatisfied)
(4) Percent of nonsatisfied (neutral, dissatisfied, and completely dissatisfied)

Usually the second metric, percent satisfaction, is used. In practices that focus on reducing the percentage of nonsatisfaction, much like reducing product defects, metric (4) is used. In addition to forming percentages for various satisfaction or dissatisfaction categories, the weighted index approach can be used. For instance, some companies use the net satisfaction index (NSI) to facilitate comparisons across products. The NSI has the following weighting factors:

Completely satisfied = 100%
Satisfied = 75%
Neutral = 50%
Dissatisfied = 25%
Completely dissatisfied = 0%

NSI ranges from 0% (all customers are completely dissatisfied) to 100% (all customers are completely satisfied). If all customers are satisfied (but not completely satisfied), NSI will have a value of 75%. This weighting approach, however, may be masking the satisfaction profile of one’s customer set. For example, if half of the customers are completely satisfied and half are neutral, NSI’s value is also 75%, which is equivalent to the scenario that all customers are satisfied. If satisfaction is a good indicator of product loyalty, then half completely satisfied and half neutral is certainly less positive than all satisfied. Furthermore, we are not sure of the rationale behind giving a 25% weight to those who are dissatisfied. Therefore, this example of NSI is not a good metric; it is inferior to the simple approach of calculating percentage of specific categories. If the entire satisfaction profile is desired, one can simply show the percent distribution of all categories via a histogram. A weighted index is for data summary when multiple indicators are too cumbersome to be shown. For example, if customers’ purchase decisions can be expressed as a function of their satisfaction with specific dimensions of a product, then a purchase decision index could be useful. In contrast, if simple indicators can do the job, then the weighted index approach should be avoided.
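To make the comparison concrete, the sketch below computes the percent-satisfied metric and the NSI from a hypothetical survey distribution; it reproduces the example in the text where half of the respondents are completely satisfied and half are neutral, so the NSI (75%) hides a much weaker satisfaction profile than the simple percentage reveals.

```python
# Customer satisfaction metrics from a five-point survey (hypothetical counts).
NSI_WEIGHTS = {
    "completely_satisfied":    1.00,
    "satisfied":               0.75,
    "neutral":                 0.50,
    "dissatisfied":            0.25,
    "completely_dissatisfied": 0.00,
}

def percent_satisfied(counts):
    """Percent of customers who are satisfied or completely satisfied."""
    total = sum(counts.values())
    return 100.0 * (counts["completely_satisfied"] + counts["satisfied"]) / total

def net_satisfaction_index(counts):
    """Weighted NSI on a 0..100 scale."""
    total = sum(counts.values())
    return 100.0 * sum(NSI_WEIGHTS[k] * n for k, n in counts.items()) / total

# Half completely satisfied, half neutral: NSI = 75%, yet only 50% are "satisfied or better".
counts = {"completely_satisfied": 50, "satisfied": 0, "neutral": 50,
          "dissatisfied": 0, "completely_dissatisfied": 0}
print(percent_satisfied(counts))        # 50.0
print(net_satisfaction_index(counts))   # 75.0
```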

IN-PROCESS QUALITY METRICS

Because our goal is to understand the programming process and to learn to engineer quality into the process, in-process quality metrics play an important role. In-process quality metrics are less formally defined than end-product metrics, and their practices vary greatly among software developers. On the one hand, in-process quality metrics simply means tracking defect arrival during formal machine testing for some organizations. On the other hand, some software organizations with well-established software metrics programs cover various parameters in each phase of the development cycle. In this section we briefly discuss several metrics that are basic to sound in-process quality management. In later chapters on modeling we will examine some of them in greater detail and discuss others within the context of models.

Defect Density During Machine Testing


Defect rate during formal machine testing (testing after code is integrated into the system library) is usually positively correlated with the defect rate in the field. Higher defect rates found during testing are an indicator that the software has experienced higher error injection during its development process, unless the higher testing defect rate is due to an extraordinary testing effort, for example, additional testing or a new testing approach that was deemed more effective in detecting defects. The rationale for the positive correlation is simple: software defect density never follows a uniform distribution. If a piece of code or a product has a higher defect rate during testing, it is either the result of more effective testing or the result of more latent defects in the code. Myers (1979) discusses a counterintuitive principle: the more defects found during testing, the more defects will be found later. That principle is another expression of the positive correlation between defect rates during testing and in the field, or between defect rates in successive phases of testing.

This simple metric of defects per KLOC or function point is therefore a good indicator of quality while the software is still being tested. It is especially useful for monitoring subsequent releases of a product in the same development organization, so release-to-release comparisons are not contaminated by extraneous factors. The development team or the project manager can use the following scenarios to judge the release quality (a small comparison sketch in code follows the scenarios):

• If the defect rate during testing is the same or lower than that of the previous release (or a similar product), then ask: Did the testing for the current release deteriorate?
  o If the answer is no, the quality perspective is positive.
  o If the answer is yes, you need to do extra testing (e.g., add test cases to increase coverage, blitz test, customer testing, stress testing, etc.).
• If the defect rate during testing is substantially higher than that of the previous release (or a similar product), then ask: Did we plan for and actually improve testing effectiveness?
  o If the answer is no, the quality perspective is negative. Ironically, the only remedial approach that can be taken at this stage of the life cycle is to do more testing, which will yield even higher defect rates.
  o If the answer is yes, then the quality perspective is the same or positive.
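A small comparison sketch under the scenarios above; the defect counts and KLOC figures are hypothetical:

```python
# Release-to-release comparison of test defect density (hypothetical data).
def test_defect_density(defects_found, kloc):
    """Defects found during formal machine testing per KLOC."""
    return defects_found / kloc

previous = test_defect_density(defects_found=480, kloc=600)   # prior release
current  = test_defect_density(defects_found=350, kloc=550)   # current release

print(f"previous = {previous:.2f}, current = {current:.2f} defects/KLOC")
if current <= previous:
    print("Same or lower: positive only if testing did not deteriorate.")
else:
    print("Higher: positive only if testing effectiveness was deliberately improved.")
```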

Defect Arrival Pattern during Machine Testing

Overall defect density during testing is a summary indicator. The pattern of defect arrivals (or, for that matter, times between failures) gives more information. Even with the same overall defect rate during testing, different patterns of defect arrivals indicate different quality levels in the field. Figure 4.2 shows two contrasting patterns for both the defect arrival rate and the cumulative defect rate. Data were plotted from 44 weeks before code-freeze until the week prior to code-freeze. The second pattern, represented by the charts on the right side, obviously indicates that testing started late, the test suite was not sufficient, and the testing ended prematurely. The objective is always to look for defect arrivals that stabilize at a very low level, or times between failures that are far apart, before ending the testing effort and releasing the software to the field. Such declining patterns of defect arrival during testing are indeed the basic assumption of many software reliability models. The time unit for observing the arrival pattern is usually weeks and occasionally months. For reliability models that require execution time data, the time interval is in units of CPU time. When we talk about the defect arrival pattern during testing, there are actually three slightly different metrics, which should be looked at simultaneously (a small tracking sketch follows the list below):


• The defect arrivals (defects reported) during the testing phase by time interval (e.g., week). These are the raw number of arrivals, not all of which are valid defects.
• The pattern of valid defect arrivals—when problem determination is done on the reported problems. This is the true defect pattern.
• The pattern of defect backlog over time. This metric is needed because development organizations cannot investigate and fix all reported problems immediately. This metric is a workload statement as well as a quality statement. If the defect backlog is large at the end of the development cycle and a lot of fixes have yet to be integrated into the system, the stability of the system (and hence its quality) will be affected. Retesting (regression testing) is needed to ensure that targeted product quality levels are reached.
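The following sketch tracks the three metrics side by side for a few hypothetical test weeks (reported arrivals, valid defects after problem determination, and a simplified running backlog of unresolved valid defects); all the weekly numbers are made up.

```python
# Weekly tracking of reported arrivals, valid defects, and backlog (hypothetical data).
weeks = [
    # (reported arrivals, valid defects, defects fixed this week)
    (120, 90, 60),
    (100, 80, 70),
    (70,  55, 65),
    (40,  30, 45),
]

backlog = 0
for week, (reported, valid, fixed) in enumerate(weeks, start=1):
    backlog = max(backlog + valid - fixed, 0)   # unresolved valid defects carried forward
    print(f"week {week}: reported={reported:3d}  valid={valid:3d}  "
          f"fixed={fixed:3d}  backlog={backlog:3d}")
```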

Phase-Based Defect Removal Pattern

The phase-based defect removal pattern is an extension of the test defect density metric. In addition to testing, it requires the tracking of defects at all phases of the development cycle, including the design reviews, code inspections, and formal verifications before testing. Because a large percentage of programming defects is related to design problems, conducting formal reviews or functional verifications to enhance the defect removal capability of the process at the front end reduces error injection. The pattern of phase-based defect removal reflects the overall defect removal ability of the development process. With regard to the metrics for the design and coding phases, in addition to defect rates, many development organizations use metrics such as inspection coverage and inspection effort for in-process quality management. Some companies even set up “model values” and “control boundaries” for various in-process quality indicators.

Defect Removal Effectiveness

Defect removal effectiveness (or efficiency, as used by some writers) can be defined as follows:

DRE = (Defects removed during a development phase ÷ Defects latent in the product at that phase) × 100%

Because the total number of latent defects in the product at any given phase is not known, the denominator of the metric can only be approximated. It is usually estimated by:

Defects latent in the product at a phase = Defects removed during the phase + Defects found later

The metric can be calculated for the entire development process, for the front end (before code integration), and for each phase. It is called early defect removal and phase effectiveness when used for the front end and for specific phases, respectively. The higher the value of the metric, the more effective the development process and the fewer the defects escape to the next phase or to the field. This metric is a key concept of the defect removal model for software development.
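A minimal sketch of phase effectiveness using the approximation above; the phase defect counts and field defect count are hypothetical:

```python
# Phase-based defect removal effectiveness (hypothetical defect counts).
# For a phase: DRE = removed_in_phase / (removed_in_phase + found_later) * 100
phases = {                      # defects removed in each phase, in order
    "design review":   120,
    "code inspection": 180,
    "unit test":       150,
    "system test":     100,
}
field_defects = 30              # defects found by customers after release

total = sum(phases.values()) + field_defects
removed_so_far = 0
for phase, removed in phases.items():
    latent_at_phase = total - removed_so_far        # defects present when the phase began
    dre = 100.0 * removed / latent_at_phase
    print(f"{phase:15s}: removed {removed:3d} of ~{latent_at_phase:3d} latent -> {dre:4.1f}%")
    removed_so_far += removed

overall = 100.0 * sum(phases.values()) / total
print(f"overall effectiveness before release: {overall:.1f}%")
```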


METRICS FOR SOFTWARE MAINTENANCE

When development of a software product is complete and it is released to the market, it enters the maintenance phase of its life cycle. During this phase the defect arrivals by time interval and customer problem calls (which may or may not be defects) by time interval are the de facto metrics. However, the number of defect or problem arrivals is largely determined by the development process before the maintenance phase. Not much can be done to alter the quality of the product during this phase. Therefore, these two de facto metrics, although important, do not reflect the quality of software maintenance. What can be done during the maintenance phase is to fix the defects as soon as possible and with excellent fix quality. Such actions, although still not able to improve the defect rate of the product, can improve customer satisfaction to a large extent. The following metrics are therefore very important:

a) Fix backlog and backlog management index
b) Fix response time and fix responsiveness
c) Percent delinquent fixes
d) Fix quality

Fix Backlog and Backlog Management Index

Fix backlog is a workload statement for software maintenance. It is related to both the rate of defect arrivals and the rate at which fixes for reported problems become available. It is a simple count of reported problems that remain at the end of each month or each week. Using it in the format of a trend chart, this metric can provide meaningful information for managing the maintenance process. Another metric to manage the backlog of open, unresolved, problems is the backlog management index (BMI).

BMI = (Number of problems closed during the month ÷ Number of problem arrivals during the month) × 100%

Because BMI is a ratio of the number of closed, or solved, problems to the number of problem arrivals during the month, if BMI is larger than 100, the backlog is reduced; if BMI is less than 100, the backlog has increased. With enough data points, the techniques of control charting can be used to calculate the backlog management capability of the maintenance process. More investigation and analysis should be triggered when the value of BMI exceeds the control limits. Of course, the goal is always to strive for a BMI larger than 100. A BMI trend chart or control chart should be examined together with trend charts of defect arrivals, defects fixed (closed), and the number of problems in the backlog. Figure 4.5 is a trend chart by month of the numbers of opened and closed problems of a software product, and a pseudo-control chart for the BMI. The latest release of the product became available to customers in the month of the first data points on the two charts. This explains the rise and fall of the problem arrivals and closures. The mean BMI was 102.9%, indicating that the capability of the fix process was functioning normally. All BMI values were within the upper (UCL) and lower (LCL) control limits—the backlog management process was in control. (Note: We call the BMI chart a pseudo-control chart because the BMI data are autocorrelated and therefore the assumption of independence for control charts is violated. Despite not being "real" control charts in statistical terms, pseudo-control charts such as the BMI chart are quite useful in software quality management. Chapter 5 provides more discussion and examples.)
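A minimal sketch of the BMI calculation over several hypothetical months, with the backlog tracked alongside it:

```python
# Backlog management index (BMI) per month (hypothetical opened/closed counts).
months = [
    ("Jan", 210, 190),   # (month, problems opened, problems closed)
    ("Feb", 180, 200),
    ("Mar", 150, 160),
]

backlog = 120                    # backlog carried in from the prior period
for month, opened, closed in months:
    bmi = 100.0 * closed / opened
    backlog += opened - closed
    print(f"{month}: BMI = {bmi:5.1f}%   backlog = {backlog}")
```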


A variation of the problem backlog index is the ratio of number of opened problems (problem backlog) to number of problem arrivals during the month. If the index is 1, that means the team maintains a backlog the same as the problem arrival rate. If the index is below 1, that means the team is fixing problems faster than the problem arrival rate. If the index is higher than 1, that means the team is losing ground in their problem-fixing capability relative to problem arrivals. Therefore, this variant index is also a statement of fix responsiveness.

Fix Response Time and Fix Responsiveness

For many software development organizations, guidelines are established on the time limit within which fixes should be available for reported defects. Usually the criteria are set in accordance with the severity of the problems. For critical situations in which the customers' businesses are at risk due to defects in the software product, software developers or the software change teams work around the clock to fix the problems. For less severe defects for which circumventions are available, the required fix response time is more relaxed. The fix response time metric is usually calculated as follows for all problems as well as by severity level:

Fix response time = Mean time of all problems from open to closed

If there are data points with extreme values, the median should be used instead of the mean. Such cases could occur for less severe problems for which customers are satisfied with the circumvention and do not demand a fix; the problem may therefore remain open for a long time in the tracking report. In general, a short fix response time leads to customer satisfaction. However, there is a subtle difference between fix responsiveness and short fix response time. From the customer's perspective, the use of averages may mask individual differences. The important elements of fix responsiveness are customer expectations, the agreed-to fix time, and the ability to meet one's commitment to the customer.
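A minimal sketch that computes mean and median fix response times by severity from hypothetical open/close dates:

```python
# Fix response time by severity (hypothetical problem records).
from datetime import date
from statistics import mean, median

problems = [
    # (severity, opened, closed)
    (1, date(2024, 1, 3),  date(2024, 1, 5)),
    (1, date(2024, 1, 10), date(2024, 1, 12)),
    (2, date(2024, 1, 4),  date(2024, 1, 20)),
    (3, date(2024, 1, 6),  date(2024, 3, 1)),   # low severity, left open a long time
]

by_severity = {}
for sev, opened, closed in problems:
    by_severity.setdefault(sev, []).append((closed - opened).days)

for sev, days in sorted(by_severity.items()):
    print(f"severity {sev}: mean = {mean(days):.1f} days, median = {median(days)} days")
```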

Percent Delinquent Fixes

The mean (or median) response time metric is a central tendency measure. A more sensitive metric is the percentage of delinquent fixes. For each fix, if the turnaround time greatly exceeds the required response time, then it is classified as delinquent:

Percent delinquent fixes = (Number of fixes that exceeded the response time criteria by severity level ÷ Total number of fixes delivered in a specified time) × 100%

This metric, however, is not a metric for real-time delinquency management because it covers closed problems only. Problems that are still open must be factored into the calculation for a real-time metric. Assuming the time unit is 1 week, we propose that the percent delinquent of problems in the active backlog be used. Active backlog refers to all opened problems for the week, which is the sum of the existing backlog at the beginning of the week and new problem arrivals during the week. In other words, it contains the total number of problems to be processed for the week—the total workload. The number of delinquent problems is checked at the end of the week. The figure below shows the real-time delinquency index diagrammatically.

It is important to note that the metric of percent delinquent fixes is a cohort metric. Its denominator refers to a cohort of problems (problems closed in a given period of time, or problems to be processed in a given week). The cohort concept is important because if it is operationalized as a cross-sectional measure, then invalid metrics will result. For example, we have seen practices in which at the end of each week the number of problems in backlog (problems still to be fixed) and the number of delinquent open problems were counted, and the percent delinquent problems was calculated. This cross-sectional counting approach neglects problems that were processed and closed before the end of the week, and will create a high delinquent index when significant improvement (reduction in problems backlog) is made.
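A minimal sketch of the proposed real-time measure against the week's active backlog; the weekly numbers are hypothetical:

```python
# Percent delinquent fixes against the week's active backlog (hypothetical numbers).
backlog_at_week_start = 80        # open problems carried into the week
new_arrivals_this_week = 25       # problems opened during the week
delinquent_at_week_end = 7        # open problems past their committed fix date

active_backlog = backlog_at_week_start + new_arrivals_this_week   # total workload
percent_delinquent = 100.0 * delinquent_at_week_end / active_backlog
print(f"percent delinquent (active backlog) = {percent_delinquent:.1f}%")  # ~6.7%
```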

Fix Quality

Fix quality or the number of defective fixes is another important quality metric for the maintenance phase. From the customer’s perspective, it is bad enough to encounter functional defects when running a business on the software. It is even worse if the fixes turn out to be defective. A fix is defective if it did not fix the reported problem, or if it fixed the original problem but injected a new defect. For mission-critical software, defective fixes are detrimental to customer satisfaction.

The metric of percent defective fixes is simply the percentage of all fixes in a time interval (e.g., 1 month) that are defective. A defective fix can be recorded in two ways: Record it in the month it was discovered or record it in the month the fix was delivered. The first is a customer measure, the second is a process measure. The difference between the two dates is the latent period of the defective fix. It is meaningful to keep track of the latency data and other information such as the number of customers who were affected by the defective fix. Usually the longer the latency, the more customers are affected because there is more time for customers to apply that defective fix to their software system.

There is an argument against using percentage for defective fixes. If the number of defects, and therefore the fixes, is large, then the small value of the percentage metric will show an optimistic picture, although the number of defective fixes could be quite large. This metric, therefore, should be a straight count of the number of defective fixes. The quality goal for the maintenance process, of course, is zero defective fixes without delinquency.


Unit-3
Software Quality Management and Models

MODELLING PROCESS:

To model software reliability, the following process or a similar procedure should be used.

1. Examine the data. Study the nature of the data (fault counts versus times between failures), the unit of analysis (CPU hour, calendar day, week, month, etc.), the data tracking system, data reliability, and any relevant aspects of the data. Plot the data points against time in the form of a scatter diagram, analyze the data informally, and gain an insight into the nature of the process being modeled.

2. Select a model or several models to fit the data based on an understanding of the test process, the data, and the assumptions of the models. The plot in step 1 can provide helpful information for model selection.

3. Estimate the parameters of the model. Different methods may be required depending on the nature of the data. The statistical techniques (e.g., the maximum likelihood method, the least-squares method, or some other method) and the software tools available for use should be considered.

4. Obtain the fitted model by substituting the estimates of the parameters into the chosen model. At this stage, you have a specified model for the data set.

5. Perform a goodness-of-fit test and assess the reasonableness of the model. If the model does not fit, a more reasonable model should be selected with regard to model assumptions and the nature of the data. For example, is the lack of fit due to a few data points that were affected by extraneous factors? Is the time unit too granular so that the noise of the data obscures the underlying trend?

6. Make reliability predictions based on the fitted model. Assess the reasonableness of the predictions based on other available information: actual performance of a similar product or of a previous release of the same product, subjective assessment by the development team, and so forth.

SOFTWARE RELIABILITY MODELS:

Software reliability models are used to assess a software product's reliability or to estimate the number of latent defects when it is available to the customers. Such an estimate is important for two reasons: (1) as an objective statement of the quality of the product and (2) for resource planning for the software maintenance phase. The criterion variable under study is the number of defects (or defect rate normalized to lines of code or function points) in specified time intervals (weeks, months, etc.), or the time between failures. Reliability models can be broadly classified into two categories: static models and dynamic models (Conte et al., 1986). A static model uses other attributes of the project or program modules to estimate the number of defects in the software. A dynamic model, usually based on statistical distributions, uses the current development defect patterns to estimate end-product reliability. A static model of software quality estimation has the following general form:

y = f(x1, x2, ..., xk) + e


where the dependent variable y is the defect rate or the number of defects, and the independent variables xi are the attributes of the product, the project, or the process through which the product is developed. They could be size, complexity, skill level, count of decisions, and other meaningful measurements. The error term is e (because the model does not completely explain the behavior of the dependent variable). Estimated coefficients of the independent variables in the formula are based on data from previous products. For the current product or project, the values of the independent variables are measured, then plugged into the formula to derive estimates of the dependent variable—the product defect rate or number of defects.

Static models are static in the sense that the estimated coefficients of their parameters are based on a number of previous projects. The product or project of interest is treated as an additional observation in the same population of previous projects. In contrast, the parameters of dynamic models are estimated based on multiple data points gathered to date from the product of interest; therefore, the resulting model is specific to the product for which the projection of reliability is attempted. Observation and experience show that static models are generally inferior to dynamic models when the unit of analysis is at the product level and the purpose is to estimate product-level reliability. Such modeling is better suited for hypothesis testing (to show that certain project attributes are related to better quality or reliability) than for estimating reliability. When the unit of analysis is much more granular, such as at the program module level, static models can be powerful—not for product-level reliability estimates, but for providing clues to software engineers on how to improve the quality of their design and implementation.

Dynamic software reliability models, in turn, can be classified into two categories: those that model the entire development process and those that model the back-end testing phase. The former is represented by the Rayleigh model. The latter is represented by the exponential model and other reliability growth models. A common denominator of dynamic models is that they are expressed as a function of time in development or its logical equivalent (such as development phase).

THE RAYLEIGH MODEL:

The Rayleigh model is a parametric model in the sense that it is based on a specific statistical distribution. When the parameters of the statistical distribution are estimated based on the data from a software project, projections about the defect rate of the project can be made based on the model. The Rayleigh model is implemented in several software products for quality assessment.

The Rayleigh model is a member of the family of the Weibull distribution. The Weibull distribution has been used for decades in various fields of engineering for reliability analysis, ranging from the fatigue life of deep-groove ball bearings to electron tube failures and the overflow incidence of rivers. It is one of the three known extreme-value distributions (Tobias, 1986). One of its marked characteristics is that the tail of its probability density function approaches zero asymptotically, but never reaches it. Its cumulative distribution function (CDF) and probability density function (PDF) are:

CDF: F(t) = 1 − exp[−(t/c)^m]
PDF: f(t) = (m/t) × (t/c)^m × exp[−(t/c)^m]


where m is the shape parameter, c is the scale parameter, and t is time. When applied to software, the PDF often means the defect density (rate) over time or the defect arrival pattern, and the CDF means the cumulative defect arrival pattern. Figure 1 shows several Weibull probability density curves with varying values for the shape parameter m. For reliability applications in an engineering field, the choice of a specific model is not arbitrary. The underlying assumptions must be considered and the model must be supported by empirical data. Of the Weibull family, the two models that have been applied in software reliability are the models with the shape parameter values m = 2 and m = 1.

The Rayleigh model is a special case of the Weibull distribution when m = 2. Its CDF and PDF are:

CDF: F(t) = 1 − exp[−(t/c)^2]
PDF: f(t) = (2t/c^2) × exp[−(t/c)^2]


The Rayleigh PDF first increases to a peak and then decreases at a decelerating rate. The c parameter is a function of tm, the time at which the curve reaches its peak. By taking the derivative of f(t) with respect to t, setting it to zero and solving the equation, tm can be obtained.

tm = c / √2

After tm is estimated, the shape of the entire curve can be determined. The area below the curve up to tm is 39.35% of the total area. The preceding formulas represent a standard distribution; specifically, the total area under the PDF curve is 1. In actual applications, a constant K is multiplied with the formulas (K is the total number of defects or the total cumulative defect rate). If we also substitute c = tm × √2 in the formula, we get:

f(t) = K × (t / tm^2) × exp[−t^2 / (2 × tm^2)]
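A minimal curve-fitting sketch, assuming weekly defect arrival data and that SciPy is available; the weekly counts below are made up, and K (total defect volume) and tm (peak week) are the parameters estimated from the fit.

```python
# Fit the Rayleigh defect-arrival curve f(t) = K * (t / tm**2) * exp(-t**2 / (2 * tm**2))
# to weekly defect data (hypothetical numbers) and project the total defect volume K.
import numpy as np
from scipy.optimize import curve_fit

def rayleigh_pdf(t, K, tm):
    return K * (t / tm**2) * np.exp(-t**2 / (2.0 * tm**2))

weeks = np.arange(1, 16)
defects = np.array([5, 12, 22, 30, 34, 33, 29, 24, 18, 14, 10, 7, 5, 3, 2])

(K_hat, tm_hat), _ = curve_fit(rayleigh_pdf, weeks, defects, p0=[200.0, 5.0])
observed = defects.sum()
print(f"estimated total defects K = {K_hat:.0f}, peak week tm = {tm_hat:.1f}")
print(f"rough estimate of latent defects remaining = {K_hat - observed:.0f}")
```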

Basic Assumptions

The first assumption is that the defect rate observed during the development process is positively correlated with the defect rate in the field, as illustrated in Figure 2. In other words, the higher the curve (more area under it), the higher the field defect rate (the GA phase in the figure), and vice versa. This is related to the concept of error injection.

The second assumption is that given the same error injection rate, if more defects are discovered and removed earlier, fewer will remain in later stages. As a result, the field quality will be better. This relationship is illustrated in Figure.3


THE EXPONENTIAL MODEL

The exponential model is another special case of the Weibull family, with the shape parameter m equal to 1. It is best used for statistical processes that decline monotonically to an asymptote. Its cumulative distribution function (CDF) and probability density function (PDF) are:

CDF: F(t) = 1 − exp(−t/c) = 1 − exp(−λt)
PDF: f(t) = (1/c) × exp(−t/c) = λ × exp(−λt)

where c is the scale parameter, t is time, and λ = 1/c. Applied to software reliability, λ is referred to as the error detection rate or instantaneous failure rate. In statistical terms it is also called the hazard rate. The exponential distribution is the simplest and most important distribution in reliability and survival studies. In software reliability the exponential distribution is one of the better known models and is often the basis of many other software reliability growth models.

Like the Rayleigh model, the exponential model is simple and quick to implement when powerful statistical software is available. Besides programming, the following should be taken into consideration when applying the exponential distribution for reliability projection or estimating the number of software defects.


A key assumption of the model is that the testing effort is homogeneous throughout the testing phase. To verify the assumption, indicators of the testing effort, such as the person-hours in testing for each time unit (e.g., day or week), test cases run, or the number of variations executed, are needed. If the testing effort is clearly not homogeneous, some sort of normalization has to be made. Otherwise, models other than the exponential distribution should be considered.
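A minimal sketch under that homogeneous-testing assumption: cumulative defects are modeled as K × (1 − exp(−λt)), and K and λ are estimated from hypothetical weekly data with SciPy.

```python
# Fit the exponential (m = 1) cumulative defect model D(t) = K * (1 - exp(-lam * t))
# to cumulative weekly defect counts (hypothetical numbers).
import numpy as np
from scipy.optimize import curve_fit

def exponential_cdf(t, K, lam):
    return K * (1.0 - np.exp(-lam * t))

weeks = np.arange(1, 13)
cumulative_defects = np.array([40, 75, 105, 130, 150, 167, 181, 192, 201, 208, 214, 219])

(K_hat, lam_hat), _ = curve_fit(exponential_cdf, weeks, cumulative_defects, p0=[250.0, 0.1])
print(f"projected total defects K = {K_hat:.0f}, detection rate lambda = {lam_hat:.3f}/week")
print(f"estimated defects remaining after week 12 = {K_hat - cumulative_defects[-1]:.0f}")
```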

RELIABILITY GROWTH MODELS

Software reliability growth models can be classified into two major classes, depending on the dependent variable of the model. For the time between failures models, the variable under study is the time between failures. This is the earliest class of models proposed for software reliability assessment.

For the fault count models, the criterion variable is the number of faults or failures (or a normalized rate) in a specified time interval. The time can be CPU execution time or calendar time such as hour, week, or month. As defects are detected and removed from the software, it is expected that the observed number of failures per unit time will decrease. The number of remaining defects or failures is the key parameter to be estimated from this class of models.

Three time between failures models are summarized here:

1. Jelinski-Moranda Model
2. Littlewood Models
3. Goel-Okumoto Imperfect Debugging Model

Jelinski-Moranda Model

The Jelinski-Moranda (J-M) model is one of the earliest models in software reliability research (Jelinski and Moranda, 1972). It is a time between failures model. It assumes N software faults at the start of testing, failures occur purely at random, and all faults contribute equally to cause a failure during testing. It also assumes the fix time is negligible and that the fix for each failure is perfect. Therefore, the software product's failure rate improves by the same amount at each fix. The hazard function (the instantaneous failure rate function) at time ti, the time between the (i - 1)st and ith failures, is given

Z(ti) = φ [N − (i − 1)]

where N is the number of software defects at the beginning of testing and φ is a proportionality constant. Note that the hazard function is constant between failures but decreases in steps of φ following the removal of each fault. Therefore, as each fault is removed, the time between failures is expected to be longer.
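A minimal sketch of the J-M hazard and the expected time to the next failure under its assumptions (the parameter values below are made up):

```python
# Jelinski-Moranda: hazard between the (i-1)st and ith failures is Z(t_i) = phi * (N - (i - 1));
# the expected time to the next failure is 1 / Z(t_i). Parameter values are hypothetical.
N = 100        # estimated faults present at the start of testing
phi = 0.002    # proportionality constant (per-fault failure rate)

for i in range(1, 6):
    hazard = phi * (N - (i - 1))
    expected_gap = 1.0 / hazard
    print(f"failure {i}: hazard = {hazard:.4f}, expected time to next failure = {expected_gap:.1f}")
```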


Littlewood Models

The Littlewood (LW) model is similar to the J-M model, except it assumes that different faults have different sizes, thereby contributing unequally to failures (Littlewood, 1981). Larger-sized faults tend to be detected and fixed earlier. As the number of errors is driven down with the progress in test, so is the average error size, causing a law of diminishing return in debugging. The introduction of the error size concept makes the model assumption more realistic. In real-life software operation, the assumption of equal failure rate by all faults can hardly be met, if at all.

Latent defects that reside in code paths that rarely get executed by customers' operational profiles may not be manifested for years. Littlewood also developed several other models such as the Littlewood non-homogeneous Poisson process (LNHPP) model (Miller, 1986). The LNHPP model is similar to the LW model except that it assumes a continuous change in instantaneous failure rate rather than discrete drops when fixes take place.

Goel-Okumoto Imperfect Debugging Model

The J-M model assumes that the fix time is negligible and that the fix for each failure is perfect. In other words, it assumes perfect debugging. In practice, this is not always the case. In the process of fixing a defect, new defects may be injected. Indeed, defect fix activities are known to be error-prone. During the testing stages, the percentage of defective fixes in large commercial software development organizations may range from 1% or 2% to more than 10%. Goel and Okumoto (1978) proposed an imperfect debugging model to overcome the limitation of the assumption. In this model the hazard function during the interval between the (i - 1)st and the ith failures is given

Z(ti) = [N − p(i − 1)] λ

where N is the number of faults at the start of testing, p is the probability of imperfect debugging, and λ is the failure rate per fault.

SOFTWARE RELIABILITY ALLOCATION MODELS

In any nontrivial software system, the reliability of the software cannot be determined exactly. Instead, statistical methods must be applied to create an estimate based on a sample of test cases. The goal is, given a fixed total number of test cases, to determine how to allocate these test cases among the partitions of the software so as to minimize the variance incurred by the maximum likelihood estimator of the overall software reliability. In contrast to fixed sampling models, where the proportion of test cases taken from each partition is determined before reliability testing begins, allocation decisions are made dynamically throughout the testing process. Results from the dynamic test allocation model can be compared with the optimal sampling model to demonstrate its strength with respect to the variance incurred when the overall system reliability is estimated by its maximum likelihood estimator, both theoretically and through Monte Carlo simulations.


CRITERIA FOR MODEL EVALUATION

For reliability models, in 1984 a group of experts (Iannino et al., 1984) devised a set of criteria for model assessment and comparison. The criteria are listed as follows, by order of importance as determined by the group:

Predictive validity: The capability of the model to predict failure behavior or the number of defects for a specified time period based on the current data in the model.

Capability: The ability of the model to estimate with satisfactory accuracy quantities needed by software managers, engineers, and users in planning and managing software development projects or controlling change in operational software systems.

Quality of assumptions: The likelihood that the model assumptions can be met, and the assumptions' plausibility from the viewpoint of logical consistency and software engineering experience.

Applicability: The model's degree of applicability across different software products (size, structure, functions, etc.).

Simplicity: A model should be simple in three aspects: (1) simple and inexpensive to collect data, (2) simple in concept, not requiring an extensive mathematical background for software development practitioners to comprehend, and (3) readily implemented by computer programs.

SOFTWARE QUALITY ASSESSMENT MODELS: HIERARCHICAL MODEL OF SOFTWARE QUALITY ASSESSMENT.

An improved hierarchical model has been proposed for the assessment of high-level design quality attributes in object-oriented designs. In this model, structural and behavioral design properties of classes, objects, and their relationships are evaluated using a suite of object-oriented design metrics. The model relates design properties such as encapsulation, modularity, coupling, and cohesion to high-level quality attributes such as reusability, flexibility, and complexity using empirical and anecdotal information. The relationships, or links, from design properties to quality attributes are weighted in accordance with their influence and importance.

The model is validated by comparing its results with empirical data and expert opinion on several large commercial object-oriented systems. A key attribute of the model is that it can easily be modified to include different relationships and weights, thus providing a practical quality assessment tool adaptable to a variety of demands.

To compare quality in different situations, both qualitatively and quantitatively, it is necessary to establish a model of quality. Many models have been suggested for quality; most are hierarchical in nature. A qualitative assessment is generally made, along with a more quantified assessment.


Two principal models of this type are those by Boehm (1978) and McCall (1977). A hierarchical model of software quality is based upon a set of quality criteria, each of which has a set of measures or metrics associated with it. The issues relating to the criteria of quality are:

• What criteria of quality should be employed?
• How do they inter-relate?
• How may the associated metrics be combined into a meaningful overall measure of quality?

THE HIERARCHICAL MODELS OF BOEHM AND MCCALL

McCall's Quality Model (1977)

One of the more renowned predecessors of today's quality models is the quality model presented by Jim McCall et al. [9-11] (also known as the General Electrics Model of 1977). This model, as well as other contemporary models, originates from the US military (it was developed for the US Air Force and promoted within the DoD) and is primarily aimed towards system developers and the system development process. In his quality model, McCall attempts to bridge the gap between users and developers by focusing on a number of software quality factors that reflect both the users' views and the developers' priorities. The McCall quality model has three major perspectives for defining and identifying the quality of a software product: product revision (ability to undergo changes), product transition (adaptability to new environments) and product operations (its operation characteristics).

Product revision includes maintainability (the effort required to locate and fix a fault in the program within its operating environment), flexibility (the ease of making changes required by changes in the operating environment) and testability (the ease of testing the program, to ensure that it is error-free and meets its specification). Product transition is all about portability (the effort required to transfer a program from one environment to another), reusability (the ease of reusing software in a different context) and interoperability (the effort required to couple the system to another system). Quality of product operations depends on correctness (the extent to which a program fulfils its specification), reliability (the system's ability not to fail), efficiency (further categorized into execution efficiency and storage efficiency, and generally meaning the use of resources, e.g., processor time and storage), integrity (the protection of the program from unauthorized access) and usability (the ease of use of the software).

The model furthermore details the three types of quality characteristics (major perspectives) in a hierarchy of factors, criteria and metrics:

• 11 Factors (To specify): They describe the external view of the software, as viewed by the users.
• 23 Quality criteria (To build): They describe the internal view of the software, as seen by the developer.
• Metrics (To control): They are defined and used to provide a scale and method for measurement.

Boehm's Quality Model (1978)

The second of the basic and founding predecessors of today's quality models is the quality model presented by Barry W. Boehm [12;13]. Boehm addresses the contemporary shortcomings of models that automatically and quantitatively evaluate the quality of software. In essence, his model attempts to qualitatively define software quality by a given set of attributes and metrics. Boehm's model is similar to the McCall quality model in that it also presents a hierarchical quality model structured around high-level characteristics, intermediate-level characteristics and primitive characteristics, each of which contributes to the overall quality level. The high-level characteristics represent basic high-level requirements of actual use to which evaluation of software quality could be put: the general utility of the software. The high-level characteristics address three main questions that a buyer of software has:


• As-is utility: How well (easily, reliably, efficiently) can I use it as-is?
• Maintainability: How easy is it to understand, modify and retest?
• Portability: Can I still use it if I change my environment?

The intermediate-level characteristics represent Boehm's 7 quality factors that together represent the qualities expected from a software system:

• Portability (General utility characteristics): Code possesses the characteristic portability to the extent that it can be operated easily and well on computer configurations other than its current one.
• Reliability (As-is utility characteristics): Code possesses the characteristic reliability to the extent that it can be expected to perform its intended functions satisfactorily.
• Efficiency (As-is utility characteristics): Code possesses the characteristic efficiency to the extent that it fulfils its purpose without waste of resources.
• Usability (As-is utility characteristics, Human Engineering): Code possesses the characteristic usability to the extent that it is reliable, efficient and human-engineered.
• Testability (Maintainability characteristics): Code possesses the characteristic testability to the extent that it facilitates the establishment of verification criteria and supports evaluation of its performance.
• Understandability (Maintainability characteristics): Code possesses the characteristic understandability to the extent that its purpose is clear to the inspector.
• Flexibility (Maintainability characteristics, Modifiability): Code possesses the characteristic modifiability to the extent that it facilitates the incorporation of changes, once the nature of the desired change has been determined.


Unit-4
Software Quality Assurance

INTRODUCTION

Software Quality Assurance (SQA) consists of a means of monitoring the software engineering processes and methods used to ensure quality. It does this by means of audits of the quality management system under which the software system is created. It is distinct from software quality control, which includes reviewing requirements documents and software testing. SQA encompasses the entire software development process, which includes processes such as software design, coding, source code control, code reviews, change management, configuration management, and release management. Whereas software quality control is a control of products, software quality assurance is a control of processes.

Software quality assurance is related to the practice of quality assurance in product manufacturing. There are, however, some notable differences between software and a manufactured product. These differences stem from the fact that the manufactured product is physical and can be seen whereas the software product is not visible. Therefore its function, benefit and costs are not as easily measured. What's more, when a manufactured product rolls off the assembly line, it is essentially a complete, finished product, whereas software is never finished. Software lives, grows, evolves, and metamorphoses, unlike its tangible counterparts. Therefore the processes and methods to manage, monitor, and measure its ongoing quality are as fluid and sometimes elusive as are the defects that they are meant to keep in check.

Software Quality Assurance is often treated as a synonym for testing, but it actually involves a great deal more than simply evaluating products produced by development or engineering. Essentially, it is meant to find the faults in the process that lead to low quality, so that the source of the problem can be dealt with.

QUALITY PLANNING AND CONTROL

Quality Control (QC) is a system of routine technical activities, to measure and control the quality of the inventory as it is being developed. The QC system is designed to:

• Provide routine and consistent checks to ensure data integrity, correctness, and completeness;
• Identify and address errors and omissions;
• Document and archive inventory material and record all QC activities.

QC activities include general methods such as accuracy checks on data acquisition and calculations and the use of approved standardized procedures for emission calculations, measurements, estimating uncertainties, archiving information and reporting. Higher tier QC activities include technical reviews of source categories, activity and emission factor data, and methods.

Quality Assurance (QA) activities include a planned system of review procedures conducted by personnel not directly involved in the inventory compilation/development process. Reviews, preferably by independent third parties, should be performed upon a finalized inventory following the


implementation of QC procedures. Reviews verify that data quality objectives were met, ensure that the inventory represents the best possible estimates of emissions and sinks given the current state of scientific knowledge and data available, and support the effectiveness of the QC programme.

Quality planning refers to the activities that establish the objectives and requirements for quality. An SQA Plan is detailed description of the project and its approach for testing. Going with the standards, an SQA Plan is divided into four sections:

• Software Quality Assurance Plan for Software Requirements;
• Software Quality Assurance Plan for Architectural Design;
• Software Quality Assurance Plan for Detailed Design and Production; and
• Software Quality Assurance Plan for Transfer.

In the first phase, the SQA team should write in detail the activities related to software requirements. In this stage, the team will be creating steps and stages on how they will analyze the software requirements. They could refer to additional documents to ensure the plan works out.

In the second stage of the SQA Plan, the SQAP for AD (Architectural Design), the team should analyze in detail the preparation of the development team for the detailed build-up. This stage is a rough representation of the program, but it still has to go through rigorous scrutiny before it reaches the next stage.

The third phase, which tackles the quality assurance plan for detailed design and the actual product, is probably the longest among the phases. The SQA team should write in detail the tools and approach they will be using to ensure that the produced application is written according to plan. The team should also start planning the transfer phase as well.

The last stage is the QA plan for transfer of the technology to operations. The SQA team should write their plan on how they will monitor the transfer of technology, such as training and support.

SQA Principles

The following are some of the most powerful principles that can be used for proper execution of software quality assurance:

• Feedback – In gist, the faster the feedback, the faster the application will move forward. An SQA principle that uses rapid feedback is assured of success. Time will always be the best friend and the most notorious enemy of any developer, and it is up to the SQA team to give feedback as soon as possible. If the team can get feedback on the application quickly, the chance of developing a better application faster improves.
• Focus on critical factors – This principle has several meanings. First, some factors of the software being developed are not as critical as others, so SQA should be focused on the more important matters. Second, SQA's measurement should never be universal in the sense that every factor in the application receives the same treatment. One example is the treatment of specific functions compared to the skin or color of the interface: clearly, the functions should receive more focus than a simple skin color.
• Multiple objectives – This is partly a challenge as well as a risk for the SQA team. At the start of SQA planning, the team should have more than one objective. This can be risky, but it is already common practice. What is emphasized here is that each objective should be tracked: as much as possible, a matrix should be built by the SQA team so that the actual actions that relate to each objective can be traced.
• Evolution – Reaching an objective is one thing, but every time something new happens it should be noted. Evolution is setting the benchmark in each development. Since the SQA team marks every time something new is done, evolution is monitored. The benefit of this principle is for future use: whenever a benchmark is not reached, the SQA team should be able to study its previous projects. Evolution should inform and educate the SQA team while working on the project.


• Quality control – By the name itself, quality control is the pillar of software quality assurance. Everything needs quality control, from start to finish, and there has to be an emphasis on where to start: the biggest and tightest quality control should be executed as early as possible. For example, when the SQA team receives the software requirements document (SR), the intensity of quality control should be greatest at the start. Quality control will still be executed until the end, but developers should take into account that anything that starts out badly rarely takes off. It is better to know what is wrong at first than to find it out later.
• Motivation – There is no substitute for having the right people who have the will to do their job at all times. When they have the right mindset and the willingness to do it, everything falls into place: work will be lighter, expertise will show, and creativity is almost assured when everyone has drive and passion for their line of work. Quality assurance is a very tedious task and will demand the most of a person who is not dedicated to the work.
• Process improvement – Every project of the SQA team should be a learning experience. Each project gives the chance to increase SQA experience, but there is more to it than that. Process improvement fosters the development of the actual treatment of the project. Every project has a unique situation that gives the SQA team a chance to experience something new, and this "new" will never be translated into something good if it is not documented for future reference. Learning should be based not only on individual experience but also on the company's ability to adapt to the new situation and use it for future reference.
• Persistence – There is no perfect application. The bigger applications get, the more errors there can be. The SQA team should be tenacious in looking for concerns in every aspect of the software development process. Even with all the obstacles, every part has to be scrutinized without hesitation.
• Different effects of SQA – SQA should go beyond software development. A regular SQA team will just report for work, look for errors and leave; instead, the SQA team should be role models in business protocols at all times. This way, SQA fosters quality not only in the application but also in the way of working. This may seem off topic, but when people dress and carry themselves for success, their work tends to reflect it.
• Result-focused – SQA should not only look at the process but ultimately at its effect on the clients and users. The SQA process should always look for results whenever a phase is set.

These are the principles that every SQA plan and team should foster. They encourage dedication towards work and patience, not necessarily for perfection but for maximum efficiency.

QUALITY IMPROVEMENT PROCESS

There are many methods for quality improvement, covering product improvement, process improvement, and people-based improvement. The following list describes quality management methods and techniques that incorporate and drive quality improvement:

• Continuous Improvement Process: The ongoing enhancement of work processes for the benefit of the customer and the organization; activities devoted to maintaining and improving work process performance through small and gradual improvements as well as radical innovations.

• Control Chart: A line graph that shows the variation occurring in a work process over time; it helps distinguish between common-cause variation and special-cause variation (a minimal sketch of how the limits are computed follows this list).

• Cost of Quality: A term used by many organizations to quantify the costs associated with producing quality products. Typical factors taken into account are prevention costs (training, work process analyses, design reviews, customer surveys), appraisal costs (inspection and testing), and failure costs (rework, scrap, customer complaints, returns).


• Cross Functional: Involving the cooperation of two or more departments within the organization (e.g., Marketing and Product Development).

• Deming Cycle (also known as Shewhart's Wheel): A model that describes the cyclical interaction of research, sales, design, and production as a continuous work flow, so that all functions are involved constantly in the effort to provide products and services that satisfy customers and contribute to improved quality.

• Department Improvement Team: Made up of all members of a department and usually chaired by the manager or supervisor, department improvement teams function as a vehicle for all employees to continuously participate in ongoing quality improvement activities.

• Executive Steering Committee (or Executive Improvement Team): Includes top executives and is chaired by the CEO; encourages and participates in a quality initiative by reviewing, approving, and implementing improvement activities.

• Fitness-For-Use: Juran's definition of quality suggesting that products and services need to serve customers' needs, instead of meeting internal requirements only.

• Improvement Steering Council (also known as Quality Steering Committee): A group of people with representation from all functions in the organization, usually drawn from management levels, chartered to develop and monitor a quality improvement process in their own functions. This group is often responsible for deciding which improvement projects or work processes will be addressed and in what priority.

• Juran Trilogy: The interrelationship of the three basic managerial processes used to manage quality: quality planning, quality control, and quality improvement.

• Just-In-Time (JIT): A method of production and inventory cost control based on delivery of parts and supplies at the precise time they are needed in a production process.

• PDCA Cycle: An adaptation of the Deming Cycle, which stresses that every improvement activity can best be accomplished by following the steps plan, do, check, act. (See Deming Cycle.)

• Process Improvement Team: Includes experienced employees from different departments who solve problems and improve work processes that cross functional lines. (Also known as Service Improvement Team, Quality Improvement Team, or Corrective Action Team.)

• Task Force: An ad hoc, cross-functional team formed to resolve a major problem as quickly as possible; usually includes subject matter experts temporarily relieved of their regular duties.

• Total Quality Management (TQM): A management approach advocating the involvement of all employees in the continuous improvement process, not just quality control specialists.

• Zero Defects: An approach to quality based on prevention of errors; often adopted as a standard for performance or as a definition of quality.
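Returning to the Control Chart entry above, a minimal sketch of how such limits can be computed is shown below. The weekly defect counts and the 3-sigma c-chart convention are assumptions for illustration only, not something prescribed by these notes.

# Minimal control-chart sketch over hypothetical weekly defect counts.
# Uses the common 3-sigma c-chart convention: limits = mean +/- 3*sqrt(mean).
import math

weekly_defects = [7, 5, 9, 6, 8, 22, 7, 6]   # illustrative data, not from the text

mean = sum(weekly_defects) / len(weekly_defects)
ucl = mean + 3 * math.sqrt(mean)              # upper control limit
lcl = max(0.0, mean - 3 * math.sqrt(mean))    # lower control limit (not below zero)

for week, count in enumerate(weekly_defects, start=1):
    flag = "special-cause?" if count > ucl or count < lcl else "common-cause"
    print(f"week {week}: {count:3d} defects -> {flag}")
print(f"mean={mean:.1f}  UCL={ucl:.1f}  LCL={lcl:.1f}")

Any count falling outside the limits is a candidate for special-cause investigation; points inside the limits are treated as common-cause variation.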

SQA ACTIVITIES

Following are the SQA activities:

1. Metric calculation
2. Monitoring and improving the process
3. Preparing an SQA plan for the project
4. Participating in the development of the project's software process description
5. Auditing designated software work products
6. Ensuring that deviations in software work and work products are documented and handled according to a documented procedure
7. Recording any noncompliance and reporting it to senior management


8. Formal technical reviews (including interviews)
9. Performance monitoring
10. Simulation
11. Feasibility study
12. Development testing
13. Documentation review
14. Qualification testing
15. Installation testing

SQA ISSUES

Identifying SQA Issues

Software Quality Assurance is a good practice that every large-scale business should employ. IT-related businesses rarely hesitate to use SQA to ensure that the application they release to their users, or sell to their customers, lives up to expectations. Yet, as with other development practices, there are reasons why some developers and companies do not use SQA.

SQA Cost

Cost is the main reason why small companies hesitate to make SQA part of their development plan. SQA is a separate function that works apart from the development team, so hiring an SQA group means another set of people to pay. Software development already takes months, and a small company will be under pressure to cut costs. This problem cannot simply be solved with more money and resources; developers have to accept that SQA may not be implemented when everyone has to cut costs, and only the SDLC and the testing team can compensate for what SQA would otherwise provide.

SQA Complexity

Years ago, an application could easily be built by a single developer and released for the whole world to enjoy. Today, a highly capable application would take a single developer years to build; by the time it could be integrated and deployed, it would already be outdated. Software today is so complicated that team after team works on a single application, and development still takes months. SQA has been around for years, and some older models no longer fit. To address this concern, the SQA team should make careful use of CASE tools. These tools can summarize the functions and ensure consistency in the evaluation of the application, including documentation of every stage of development.

SQA Integration

This is one of the common concerns with today's SQA models. Because application development has become so rapid, the standards applied today may not match those of five years ago. As a result, an application may be described as developed under SQA, yet the kind of SQA applied may not live up to what is now expected: a standardization model might certify an application built ten years ago as good even though it would not hold up today. Fortunately, this issue has been answered by the development of new standards, which ensure that an application is developed according to plan and to today's needs. Today's standards usually come with a set of tools and an understanding that applications are more complex than they used to be.


Demand for Faster Development

More than ever, the market for better software keeps growing, and everyone wants to develop and sell something new. Software has to be developed faster than ever to stay ahead, and that speed is matched by the increasing complexity of the applications. Rapid development is a problem not only for developers but also for SQA, which has to act quickly and decide whether or not the application is fit for release. If checking an application were still done manually, as it was ten years ago, everyone would have a very hard time approving it.

Today there are tools with strong testing capabilities that can work much faster than manual testers, in addition to the stability they bring. Formal verification can be achieved with such tools, since they have been sanctioned by standards, and the availability of tooling for formal verification has eased the pressure of this demand.

Defective Software

After months and months of development and manual coding, it is very frustrating when an application is rejected and has to be started all over again. SQA is, by design, somewhat perfectionist: anything bad is rejected. Given the need to develop applications rapidly, teams are under pressure to finish quickly, yet SQA demands both perfection and speed. This has been one of the biggest reasons why developers and small companies have chosen not to rely heavily on SQA, limiting themselves to code testing and the SDLC.

Today, that problem is addressed with the aid of libraries. Many libraries can be used to build an application with little custom coding, and well-known libraries can be dropped into an application so that it can be used and tested in very little time.

SQA has been applied to these libraries many times, and they have passed. Although no framework has been approved to bypass SQA, applications built on them come close to passing as soon as they are tested; what is left is mostly the documentation and proper commenting of each function.

These are the issues that have been holding back small companies. There are now solutions to each of these concerns, so small companies need not have second thoughts about using SQA to develop a good-looking, well-functioning application; apart from the monetary consideration, every issue surrounding SQA has been addressed. One of the biggest reasons SQA can now be used by any company is automation: CASE tools can easily gauge whether the application meets its software and user requirements. Although there are still things the SQA team must work out themselves, automation has helped applications become the best they can be. Testing tools have also matured, checking not only the code but also behavior under stress. With these concerns addressed, anyone can make use of SQA.

ZERO DEFECTS:

Zero-Defect Software Development (ZDSD) is a practice of developing software that is maintained in the highest quality state throughout the entire development process. "Defects" are aspects of the evolving software that would not be suitable for the final product as-is. This broad definition includes bugs as well as unwanted deviations from the desired final outcome.


The basic tenet of ZDSD is this: Maintain your product in what you believe to be a defect-free state throughout the development process. This sounds simple, but it is a rare practice. The most common approach is to delay major testing until the final QA phase of software development, where defects are often discovered for the first time. Most bugs are not detected or fixed until long after their introduction. The longer a defect remains, the harder it is to fix. On large software products, each stage of development that a defect survives will increase the cost of fixing the defect by ten to fifty times. A defect introduced in the design phase can cost hundreds of times more to fix in the testing phase than it would if fixed immediately after its introduction.
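A rough worked example of the cost growth described above, assuming a $100 cost to fix a defect immediately after introduction and the lower bound (10x) of the quoted per-stage multiplier; both numbers are illustrative, not data from the text.

# Illustrative only: compound an assumed base fix cost by an assumed per-stage multiplier.
base_cost = 100          # assumed cost to fix a defect right after it is introduced
multiplier = 10          # lower bound of the 10x-50x per-stage growth quoted above
phases = ["design", "coding", "integration", "system test"]

cost = base_cost
for phase in phases:
    print(f"defect still present at {phase}: ~${cost:,}")
    cost *= multiplier

Under these assumptions, a defect that survives from design to system test goes from roughly $100 to $100,000 to fix, which is the intuition behind finding defects as early as possible.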

By focusing on product quality throughout the development lifecycle, you will actually complete products faster than if you didn't pay attention to quality until the end of the project. The general rule of software quality is counter-intuitive: Improving quality actually reduces development time. This is because you eliminate all the time spent fixing bugs and reworking code, which can account for as much as 50% of development costs on a large project. The typical programmer writes between eight and twenty lines of code a day; the rest of the day is usually spent on debugging. ZDSD shortens schedules by eliminating most debugging time. Extensive studies done at NASA, IBM, and elsewhere have shown that better QA leads to shorter schedules. An IBM study concluded that software projects that make quality a top priority typically have the shortest schedules, the highest productivity, and even the best sales.

Here are the ten basic rules of ZDSD:

1. Test your product every day as you develop it, and fix defects as soon as you find them. Apply the daily build and smoke test. At the end of every day you work on your project, build the current version of your software, and test it for basic functionality. Microsoft enforces this policy religiously, using large teams to build each project on a daily basis. A programmer whose code breaks the build may be called in the middle of the night and must go back to work to fix the problem immediately. For independent game developers working on small projects, this is far easier. At the end of each day, test your program for at least ten minutes. Make a list of anything you would consider a "defect," and resolve to fix all defects before implementing any new features. Once you find a defect, fixing it becomes your number one priority, and you avoid writing any new code until the defect is 100% eliminated.
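A sketch of what a daily build-and-smoke-test driver might look like is given below. The build command and the three smoke checks are placeholders to be replaced with a project's real build and basic-functionality probes; nothing here is prescribed by the rule itself.

# Hypothetical daily build-and-smoke-test driver; the build command and the
# smoke checks below are placeholders, not part of the original text.
import subprocess
import sys

def build():
    # Replace with your real build command (make, msbuild, gradle, ...).
    try:
        return subprocess.run(["make", "all"]).returncode == 0
    except FileNotFoundError:
        print("No build tool found - substitute your own command here.")
        return False

def smoke_test():
    # A handful of fast, basic-functionality checks run at the end of each day.
    checks = {
        "program starts":       lambda: True,   # stand-in for launching the product
        "main menu reachable":  lambda: True,   # stand-in for a scripted UI probe
        "save/load round-trip": lambda: True,   # stand-in for a data round-trip check
    }
    failed = [name for name, check in checks.items() if not check()]
    for name in failed:
        print(f"SMOKE FAILURE: {name}")
    return not failed

if __name__ == "__main__":
    if not build():
        sys.exit("Build broken - fix it before writing any new code.")
    if not smoke_test():
        sys.exit("Smoke test failed - defect fixing is now the top priority.")
    print("Daily build and smoke test passed.")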

2. Review your code regularly. When most people think of QA, they think of testing, but testing is actually one of the least cost-effective strategies for finding bugs. The most rigorous testing will typically find less than 60% of all bugs in a program, and there are certain types of bugs that testing will rarely find. Studies conducted at many large software organizations have concluded that code inspections are far more cost-effective than testing. A NASA study found that code reading detected almost twice as many defects per hour as testing. Whenever you've added a few hundred lines of new code to your project, set aside an hour or two to read over your work and look for mistakes. One hour of code review is equivalent to two or more hours of methodical testing. As you gain experience, keep a list of the types of defects you find, and run down your list whenever reviewing new code. To find even more defects, have someone else read your code as well.

3. Rewrite poor-quality modules. When you discover an obscure new bug, do you ever pray, "Oh no! Please don't let it be in that module!" We all have monster modules of legacy code that were written when we weren't such seasoned programmers as we are today. Don't fear them; rewrite them. Often a better approach will only become clear when an inferior solution has already been implemented. This is certainly true for John Carmack, who coded dozens of different approaches when writing the Quake engine before discovering one that satisfied him. Defects will not be distributed evenly across your code. You will typically find that 20% of your routines are responsible for 80% of your errors.


In my programs it is normally the modules that interact with the hardware or with third-party drivers, especially DirectX, that are the most buggy. Raise your standards for those modules that seem to produce a never-ending supply of bugs, and take the time to rewrite them from scratch. You may find that other intermittent bugs disappear completely as a result.

4. Assume full responsibility for every bug. 95% of all software defects are caused by the programmer. Only 1% of defects are hardware errors, and the remaining 4% are caused by the compiler, the OS, or other software. Never dismiss a potential bug; find out the exact cause of any anomaly. When the Mars probe suffered serious software glitches during its mission, it was learned that the same glitch had occurred only once during testing on earth, but the engineers dismissed it as a temporary hardware hiccup. Unless your hardware drinks soda, it does not hiccup.

5. Handle change effectively. You will always think of great new features to add after you have started coding. Carefully consider how each change will impact your pre-existing code. Poor integration of unanticipated features is a major cause of defects.

6. Rewrite all prototyping code from scratch. Sometimes you may quickly prototype a new feature to see if it will be viable. Often this is done by sacrificing code quality in the name of rapid development. If you eventually decide to keep the feature, it is very tempting to simply tack on some basic error checking to the prototyping code. Don't fall into this trap. If you weren't writing the code originally with quality as a priority, scrap the prototyping code, and re-implement the feature from scratch. Rapidly prototyped features that slip into the final product are a major source of bugs because they are not subject to the same quality standards as the rest of the code.

7. Set QA objectives at the beginning of every project. Studies have shown that developers who set reasonable QA goals will usually achieve them. Decide in advance if your product must be fast, small, feature-rich, intuitive, scalable, etc. Then prioritize those objectives. When designing the interface code for an upcoming game, I decided that my top three priorities were to make it beginner-intuitive, fast, and fun, in that order. Consequently, my game's interface isn't as graphically rich as other games, but it is easier to use and faster than any other game of its type. Whenever you have to make a design decision, keep your objectives in mind. If you do not set clear QA goals, then you are doomed to accept the results of random chance.

8. Don't rush debugging work. Fully 50% of all bug fixes are done incorrectly the first time, often introducing new bugs in the process. Never experiment by simply changing "x-1" to "x+1" to see if that will do the trick. Take the time to understand the source of the bug. Long ago when I was a boy scout and had to put out a campfire, the Scoutmaster would sometimes test my thoroughness by asking me to put my hand in the ashes. I learned very quickly how to put out a fire so well that I had complete confidence it was 100% extinguished. When you find a defect, it means your code is on fire. As long as the defect remains, any new code you write will add fuel to that fire. Whenever you find a defect, drop everything to fix it, and don't move on until you are 100% confident that your fix is correct. If you don't take the time to do it right the first time, when will you find the time to do it over?

9. Treat the quality of your code at the same level of importance as the quality of your product. Rate your code on a scale of one to ten for overall quality. The first time I did this, I rated my 30,000-line project as a four. I rewrote the worst of the code until I reached an eight overall.


It was one of the best investments of time I ever made because I was then able to add new features at double my previous rate. The quality of your code is highly indicative of the quality of your product. You may find as I have that your best selling products also receive your highest ratings for code quality.

10. Learn from every bug; each one represents a mistake that you made. Learn why you made each mistake, and see if you can change something about your development practices to eliminate it. Over the years I have adopted many simple coding practices that allow me to avoid common bugs that used to plague me. There are many types of bugs that I now never encounter because my coding style makes it physically impossible for me to introduce them.

Each of these rules represents a simple concept, but their combined benefits are significant. You will achieve higher progress visibility, avoiding the situation of being "99% done" for the last 80% of your development time. Higher quality will make your products easier to maintain and less expensive to support. You will spend less time debugging old code and more time writing new code. And most importantly, it actually takes less time to write high-quality code than it does to write low-quality code, so you will save a great deal of time on overall development. If you have never developed products with a zero-defect philosophy from day one, its adoption can reduce your development time for new products by 30% or more while simultaneously improving product quality.

SQA TECHNIQUES

You will learn how to:
• Implement and effectively lead Software Quality Assurance (SQA) activities
• Improve customer satisfaction through SQA practices
• Deliver consistent quality through verification and validation best practices
• Control critical product components using Configuration Management (CM)
• Analyze data through quality audits to make better decisions
• Champion a continuous process improvement program in your organization

Course Benefits

Software systems that fail to provide full functionality or performance, or that otherwise do not meet user needs, can reduce profit and productivity and result in costly rework. Optimizing Software Quality Assurance practices results in cost-effective, high-quality software. This course provides the necessary skills to define, design, and implement a software quality system using proven techniques tailored to your life cycle model.

Course Workshop

You apply proven quality assurance techniques in a series of workshops, including:
• Discovering quality problems
• Streamlining the process
• Applying life cycle models
• Determining the appropriate project standards
• Conducting peer reviews
• Identifying configuration items
• Simulating audit situations
• Designing metrics for your project


TOTAL QUALITY MANAGEMENT

Total Quality Management (TQM) is an integrated organizational effort designed to improve quality at every level. TQM is about meeting quality expectations as defined by the customer; this is called customer-defined quality. TQM is a management strategy aimed at embedding awareness of quality in all organizational processes. It is composed of three paradigms:

• Total: Involving the entire organization, supply chain, and/or product life cycle.
• Quality: With its usual definitions and all their complexities (the external definition).
• Management: The system of managing, with steps such as plan, organize, control, lead, and staff.

As defined by the International Organization for Standardization (ISO): "TQM is a management approach for an organization, centered on quality, based on the participation of all its members and aiming at long-term success through customer satisfaction, and benefits to all members of the organization and to society." Total Quality Management is, literally, the management of total quality. Management consists of planning, organizing, directing, control, and assurance; it then remains to define "total quality".

Total quality is called "total" because it consists of three qualities: quality of return, to satisfy the needs of the shareholders; quality of products and services, to satisfy the specific needs of the consumer (end user); and quality of life, at work and outside work, to satisfy the needs of the people in the organization.

TQM may also be elaborated as T = Total (made up of the whole), Q = Quality (degree of excellence), M = Management (an art); hence TQM is the art of managing the whole to achieve excellence. Goal: do the right thing right, the first time and every time. It includes three CPIs:

1. Continuous Process Improvement
2. Continuous Product Improvement
3. Continuous Productivity Improvement

Continuous Process Improvement: This means making things better. Its goal is not to blame people for problems or failures; it is simply a way of looking at how we can do our work better.

CPI Procedure: The procedure is defined using a model known as ADDIE:


Analyze: Identify the area of opportunity and target specific problems.
Design: Generate solutions through brainstorming sessions and identify the required resources.
Development: Formulate a detailed procedure for the approved idea.
Implementation: Execute the solution.
Evaluation: Build measurement tools, monitor implementation, and evaluate measurements against the baseline.

Continuous Product Improvement: The purpose of product improvement is to document and formalize the multilayer software product.

Strategy:
1. Key requirement needs:
   a. System-level requirements
   b. Functional requirements
   c. Non-technical requirements
2. Task order management approach
3. Approach to re-architecture

Continuous Productivity Improvement: Gaining productivity in the workplace requires:
1. Making the most of how employees may be used
2. Ensuring their environment is conducive to better work
3. Using the correct rewards to motivate staff


QUALITY STANDARDS AND PROCESSES

Getting started with quality standards

Our objective here is not to make you an expert in quality standards, but to help you start learning about them and to make sure you are aware of the different sources of information that are conveniently available. There is no "one and only" way to define the quality needed for a laboratory test, even though there are fierce arguments for or against certain types of quality requirements [see Quality requirements: The debate heats up]. Different types of quality standards are needed to manage quality at different places in the process, such as clinical outcome criteria that reflect medically important changes in test results, analytical outcome criteria that describe the allowable total analytical error in test results, and analytical operating specifications that describe the allowable imprecision, allowable bias, and the QC needed to detect medically important errors in the testing process. Both physical work processes and management decision processes need to be standardized or systematized to assure consistent and reliable results.

Another important term is standard of quality which is a criterion or statement that describes the acceptable level of something. For an analytical test, we need to know how quickly a test result needs to be reported, as well as how close the result must be to the true or correct value. A standard for turnaround time is more easily understood than a standard for truth or correctness. For example, it is obvious that turnaround time should be stated in units of time, usually minutes. These units are understood by both the party requesting the test and the party providing the service. The party ordering the test defines the requirement on the basis of the medical service being provided. Both parties can measure whether the observed performance satisfies the requirement.

Analytical quality is more difficult because it involves technical concepts such as imprecision and inaccuracy, which are not always understood by laboratorians and are certainly even less well understood by the physicians who order the tests or the patients who are the ultimate consumers of the test results. Customers and consumers of laboratory services cannot easily define the analytical quality that is required (at least not in the analytical terms desired by the laboratory), nor can they measure or assess analytical quality. The laboratory, therefore, must take full responsibility for managing the analytical quality of its services.

The accompanying figure shows the relationships between these different types of quality standards. Starting at the top left of the figure, standard treatment guidelines (clinical pathways, clinical practice guidelines, etc.) can be used to define medically important changes and establish clinical outcome criteria in the form of decision intervals (Dint). Such clinical criteria can be converted to laboratory operating specifications for imprecision (smeas), inaccuracy (biasmeas), and QC (control rules, N) by a clinical quality-planning model [4] that accounts for preanalytical factors, such as within-individual biologic variation. Biologic goals based on within-subject biologic variability should set a boundary condition on these operating specifications, defining the most demanding condition for stable performance that would be required to monitor changes in individual subjects. The right side of the figure shows how proficiency testing criteria define analytical outcome criteria in the form of allowable total errors (TEa), which can likewise be translated to operating specifications (smeas, biasmeas, control rules, N) via an analytical quality-planning model. Note that the allowable total error can also be set on the basis of total biologic goals; therefore, the extensive information that is available on biologic variation can also be useful in this situation.

Convenient sources of quality standards


A list of analytical quality requirements is provided by the proficiency testing criteria for acceptable performance that have been defined in the Clinical Laboratory Improvement Amendments (CLIA). This information is readily available and can be used when validating the performance of analytical methods. Information on medically important changes in test results is also available; however, that information needs to be very carefully interpreted if used for validating the performance of analytical methods. An extensive databank is available that summarizes biologic variation and can be used to calculate "biologic goals" for imprecision, inaccuracy, and total error. You can also review European recommendations for biologic goals for imprecision and inaccuracy, as well as calculated allowable total errors based on individual or within-subject biological variation.

Software Engineering Standards

According to the IEEE Computer Society Software Engineering Standards Committee, a standard can be:
• An object or measure of comparison that defines or represents the magnitude of a unit;
• A characterization that establishes allowable tolerances or constraints for categories of items;
• A degree or level of required excellence or attainment.


Unit-5
Software Validation, Verification and Testing

VERIFICATION AND VALIDATION

The evolution of software that satisfies its user expectations is a necessary goal of a successful software development organization. To achieve this goal, software engineering practices must be applied throughout the evolution of the software product. Most of these software engineering practices attempt to create and modify software in a manner that maximizes the probability of satisfying its user expectations. Other practices, addressed in this module, actually attempt to insure that the product will meet these user expectations. These practices are collectively referred to as software verification and validation (V&V). The reader is cautioned that terminology in this area is often confusing and conflicting. The glossary of this module contains complete definitions of many of the terms often used to discuss V&V practices. This section attempts to clarify terminology as it will be used in the remainder of the module.

Validation refers to the process of evaluating software at the end of its development to insure that it is free from failures and complies with its requirements. A failure is defined as incorrect product behavior. Often this validation occurs through the utilization of various testing approaches. Other intermediate software products may also be validated, such as the validation of a requirements description through the utilization of a prototype.

Verification refers to the process of determining whether or not the products of a given phase of a software development process fulfill the requirements established during the previous phase. Software technical reviews represent one common approach for verifying various products. For example, a specifications review will normally attempt to verify the specifications description against a requirements description (what Rombach has called “D requirements” and “C requirements,” respectively [Rombach87]). Proof of correctness is another technique for verifying programs to formal specifications. Verification approaches attempt to identify product faults or errors, which give rise to failures.

Evolving Nature of Area

As the complexity and diversity of software products continue to increase, the challenge to develop new and more effective V&V strategies continues. The V&V approaches that were reasonably effective on small batch-oriented products are not sufficient for concurrent, distributed, or embedded products. Thus, this area will continue to evolve as new research results emerge in response to new V&V challenges.

V&V Limitations

The overall objective of software V&V approaches is to insure that the product is free from failures and meets its user's expectations. There are several theoretical and practical limitations that make this objective impossible to attain for many products.

1. Theoretical Foundations

Some of the initial theoretical foundations for testing were presented by Goodenough and Gerhart in their classic paper [Goodenough75]. This paper provides definitions for reliability and validity, in an attempt to characterize the properties of a test selection strategy. A mathematical framework for investigating testing that enables comparisons of the power of testing methods is described in [Gourlay83]. Howden claims that the most important theoretical result in program testing and analysis is that no general-purpose testing or analysis procedure can be used to prove program correctness. A proof of this result is contained in his text [Howden87].

2. Impracticality of Testing All Data

For most programs, it is impractical to attempt to test the program with all possible inputs, due to a combinatorial explosion [Beizer83, Howden87]. For those inputs selected, a testing oracle is needed to determine the correctness of the output for a particular test input [Howden87].

3. Impracticality of Testing All Paths

For most programs, it is impractical to attempt to test all execution paths through the product, due to a combinatorial explosion [Beizer83]. It is also not possible to develop an algorithm for generating test data for paths in an arbitrary product, due to the inability to determine path feasibility [Adrion86].

4. No Absolute Proof of Correctness

Howden claims that there is no such thing as an absolute proof of correctness [Howden87]. Instead, he suggests that there are proofs of equivalence, i.e., proofs that one description of a product is equivalent to another description. Hence, unless a formal specification can be shown to be correct and, indeed, reflects exactly the user's expectations, no claim of product correctness can be made [Beizer83, Howden87].

THE ROLE OF V&V IN SOFTWARE EVOLUTION

The evolution of a software product can proceed in many ways, depending upon the development approach used. The development approach determines the specific intermediate products to be created. For any given project, V&V objectives must be identified for each of the products created.

1. Types of Products

To simplify the discussion of V&V objectives, five types of products are considered in this module. These types are not meant to be a partitioning of all software documents and will not be rigorously defined. Within each product type, many different representational forms are possible. Each representational form determines, to a large extent, the applicability of particular V&V approaches. The intent here is not to identify V&V approaches applicable to all products in any form, but instead to describe V&V approaches for representative forms of products. References are provided to other sources that treat particular approaches in depth.

a. Requirements
The requirements document (Rombach [Rombach87]: "customer/user-oriented requirements" or C-requirements) provides an informal statement of the user's needs.

b. Specifications
The specifications document (Rombach: "design-oriented requirements" or D-requirements) provides a refinement of the user's needs, which must be satisfied by the product. There are many approaches for representing specifications, both formal and informal [Berztiss87, Rombach87].


c. Designs
The product design describes how the specifications will be satisfied. Depending upon the development approach applied in the project, there may be multiple levels of designs. Numerous possible design representation approaches are described in Introduction to Software Design [Budgen88].

d. Implementations
"Implementation" normally refers to the source code for the product. It can, however, refer to other implementation-level products, such as decision tables [Beizer83].

e. Changes
Changes describe modifications made to the product. Modifications are normally the result of error corrections or additions of new capabilities to the product.

V&V Objectives

The specific V&V objectives for each product must be determined on a project-by-project basis. This determination will be influenced by the criticality of the product, its constraints, and its complexity. In general, the objective of the V&V function is to insure that the product satisfies the user needs. Thus, everything in the product's requirements and specifications must be the target of some V&V activity. In order to limit the scope of this module, however, the V&V approaches described will concentrate on the functional and performance portions of the requirements and specifications for the product. Approaches for determining whether a product satisfies its requirements and specifications with respect to safety, portability, usability, maintainability, serviceability, security, etc., although very important for many systems, will not be addressed here. This is consistent with the V&V approaches normally described in the literature. The broader picture of "assurance of software quality" is addressed elsewhere [Brown87]. Limiting the scope of the V&V activities to functionality and performance, five general V&V objectives can be identified [Howden81, Powell86a].

These objectives provide a framework within which it is possible to determine the applicability of various V&V approaches and techniques.
a. Correctness: The extent to which the product is fault free.
b. Consistency: The extent to which the product is consistent within itself and with other products.
c. Necessity: The extent to which everything in the product is necessary.
d. Sufficiency: The extent to which the product is complete.
e. Performance: The extent to which the product satisfies its performance requirements.


SOFTWARE V&V APPROACHES AND THEIR APPLICABILITY

Software V&V activities occur throughout the evolution of the product. There are numerous techniques and tools that may be used in isolation or in combination with each other. In an effort to organize these V&V activities, five broad classifications of approaches are presented. These categories are not meant to provide a partitioning, since there are some techniques that span categories. Instead, the categories represent a practical view that reflects the way most of the V&V approaches are described in the literature and used in practice. Possible combinations of these approaches are discussed in the next section.

1. Software Technical Reviews

The software technical review process includes techniques such as walk-throughs, inspections, and audits. Most of these approaches involve a group meeting to assess a work product. A comprehensive examination of the technical review process and its effectiveness for software products is presented in The Software Technical Review Process [Collofello88].

Software technical reviews can be used to examine all the products of the software evolution process. In particular, they are especially applicable and necessary for those products not yet in machine processable form, such as requirements or specifications written in natural language.

2. Software Testing

Software testing is the process of exercising a product to verify that it satisfies specified requirements or to identify differences between expected and actual results [IEEE83a].

a. Levels of Testing

In this section, various levels of testing activities, each with its own specific goals, are identified and described. This listing of levels is not meant to be complete, but will illustrate the notion of levels of testing with particular goals. Other possible levels of testing not addressed here include acceptance testing, alpha testing, beta testing, etc. [Beizer84].

(i) Module Testing

Module (or unit) testing is the lowest level of testing and involves the testing of a software module or unit. The goal of module-level testing is to insure that the component being tested conforms to its specifications and is ready to be integrated with other components of the product. Module testing is treated in depth in the curriculum module Unit Testing and Analysis [Morell88].

(ii) Integration Testing

Integration testing consists of the systematic combination and execution of product components. Multiple levels of integration testing are possible, with a combination of hardware and software components at several different levels. The goal of integration testing is to insure that the interfaces between the components are correct and that the product components combine to execute the product's functionality correctly.


(iii) System Testing

System testing is the process of testing the integrated hardware and software system to verify that the system meets its specified requirements [IEEE83a]. Practical priorities must be established to complete this task effectively. One general priority is that system testing must concentrate more on system capabilities than on component capabilities [Beizer84, McCabe85, Petschenik85]. This suggests that system tests concentrate on insuring the use and interaction of functions rather than testing the details of their implementations. Another priority is that testing typical situations is more important than testing special cases [Petschenik85, Sum86]. This suggests that test cases be constructed corresponding to high-probability user scenarios. This facilitates early detection of critical problems that would greatly disrupt a user.

There are also several key principles to adhere to during system testing:
• System tests should be developed and performed by a group independent of the people who developed the code.
• System test plans must be developed and inspected with the same rigor as other elements of the project.
• System test progress must be planned and tracked similarly to other elements of the project.
• System tests must be repeatable.

(iv) Regression Testing

Regression testing can be defined as the process of executing previously defined test cases on a modified program to assure that the software changes have not adversely affected the program's previously existing functions. The error-prone nature of software modification demands that regression testing be performed. Some examples of the types of errors targeted by regression testing include:

• Data corruption errors. These errors are side effects due to shared data.
• Inappropriate control sequencing errors. These errors are side effects due to changes in execution sequences. An example of this type of error is the attempt to remove an item from a queue before it is placed into the queue.
• Resource contention. Examples of these types of errors are potential bottlenecks and deadlocks.
• Performance deficiencies. These include timing and storage utilization errors.

An important regression testing strategy is to place a higher priority on testing the older capabilities of the product than on testing the new capabilities provided by the modification [Petschenik85]. This insures that capabilities the user has become dependent upon are still intact. This is especially important when we consider that a recent study found that half of all failures detected by users after a modification were failures of old capabilities, as a result of side effects of the implementation of new functionality [Collofello87].

Regression testing strategies are not well defined in the literature. They differ from development tests in that development tests tend to be smaller and diagnostic in nature, whereas regression tests tend to be long and complex scenarios testing many capabilities, yet possibly proving unhelpful in isolating a problem, should one be encountered. Most regression testing strategies require that some baseline of product tests be rerun. These tests must be supplemented with specific tests for the recent modifications. Strategies for testing modifications usually involve some sort of systematic execution of the modification and related areas. At a module level, this may involve retesting module execution paths traversing the modification. At a product level, this activity may involve retesting functions that execute the modified area [Fisher77]. The effectiveness of these strategies is highly dependent upon the utilization of test matrices (see below), which enable identification of the coverage provided by particular test cases.
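One possible way to realize the "baseline plus modification-specific tests" strategy is sketched below, assuming a pytest-style layout; the directory names and the ticket identifier are illustrative assumptions, not part of the module text.

# Sketch of a regression run: the stable baseline suite is always executed,
# then tests targeted at the most recent modification are added on top.
import subprocess
import sys

def run(label, args):
    print(f"--- {label} ---")
    return subprocess.run([sys.executable, "-m", "pytest", *args]).returncode

baseline_rc = run("baseline (old capabilities)", ["tests/baseline"])
targeted_rc = run("modification-specific tests", ["tests/changes/ticket_1234"])

if baseline_rc or targeted_rc:
    sys.exit("Regression detected - the modification has side effects to investigate.")
print("No regressions observed in the selected suites.")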

b. Testing Techniques and their Applicability

(i) Functional Testing and Analysis

Functional testing develops test data based upon documents specifying the behavior of the software. The goal of functional testing is to exercise each aspect of the software's specified behavior over some subset of its input. Howden has developed an integrated approach to testing based upon this notion of testing each aspect of specified behavior [Howden86, Howden87]. Functional testing and analysis techniques are applicable for all levels of testing; however, the level of specified behavior to be tested will normally be higher for integration and system-level testing. Thus, at a module level, it is appropriate to test boundary conditions and low-level functions, such as the correct production of a particular type of error message. At the integration and system level, the types of functions tested are normally those involving some combination of lower-level functions. Testing combinations of functions involves selection of specific sequences of inputs that may reveal sequencing errors due to:
• race conditions
• resource contention
• deadlock
• interrupts
• synchronization issues

Functional testing and analysis techniques are effective in detecting failures during all levels of testing. They must, however, be used in combination with other strategies to improve failure detection effectiveness. The automation of functional testing techniques has been hampered by the informality of commonly used specification techniques: the difficulty lies in identifying the functions to be tested. Some limited success in automating this process has been obtained for more rigorous specification techniques.

Alternatively, functional testing can be described as a type of black-box testing that bases its test cases on the specifications of the software component under test. Functions are tested by feeding them input and examining the output; internal program structure is rarely considered. Functional testing differs from system testing in that functional testing "verif[ies] a program by checking it against ... design document(s) or specification(s)", while system testing "validate[s] a program by checking it against the published user or system requirements". Functional testing typically involves five steps:

1. The identification of functions that the software is expected to perform
2. The creation of input data based on the function's specifications
3. The determination of output based on the function's specifications
4. The execution of the test case
5. The comparison of actual and expected outputs
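The five steps can be illustrated with a small black-box test, shown below, written against a hypothetical is_leap_year function and its specification; both the function and the chosen cases are assumptions for the example.

# Black-box test of a hypothetical function against its specification.
# Steps 1-3 happen while writing the cases; steps 4-5 happen when unittest runs them.
import unittest

def is_leap_year(year: int) -> bool:
    # Unit under test; only its specified input/output behavior matters here.
    return year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)

class LeapYearFunctionalTest(unittest.TestCase):
    def test_specified_behaviour(self):
        # (input, expected output) pairs derived from the specification,
        # including the century boundary cases the spec calls out.
        cases = [(2024, True), (2023, False), (1900, False), (2000, True)]
        for year, expected in cases:
            self.assertEqual(is_leap_year(year), expected, f"year={year}")

if __name__ == "__main__":
    unittest.main()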

(ii) Structural Testing and Analysis

Structural testing develops test data based upon the implementation of the product. Usually this testing occurs on source code. However, it is possible to do structural testing on other representations of the program’s logic. Structural testing and analysis techniques include data flow anomaly detection, data flow coverage assessment, and various levels of path coverage. A classification of structural testing approaches and a description of representative techniques is presented in [Morell88] and in Glenford Myers’ text [Myers79].

Structural testing and analysis are applicable to module testing, integration testing, and regression testing. At the system test level, structural testing is normally not applicable, due to the size of the system to be tested. For example, a paper discussing the analysis of a product consisting of 1.8 million lines of code, suggests that over 250,000 test cases would be needed to satisfy coverage criteria [Petschenik85]. At the module level, all of the structural techniques are applicable. As the level of testing increases to the integration level, the focus of the structural techniques is on the area of interface analysis [Howden87]. This interface analysis may involve module interfaces, as well as interfaces to other system components. Structural testing and analysis can also be performed on designs using manual walk-through or design simulations [Powell86a].

Structural testing and analysis techniques are very effective in detecting failures during the module and integration testing levels. Beizer reports that path testing catches 50% of all errors during module testing and a total of one-third of all of the errors [Beizer84]. Structural testing is very cumbersome to perform without tools, and even with tools requires considerable effort to achieve desirable levels of coverage. Since structural testing and analysis techniques cannot detect missing functions (nor some other types of errors), they must be used in combination with other strategies to improve failure detection effectiveness [Beizer84, Girgis86, Howden80, Selby86].

There are numerous automated techniques to support structural testing and analysis. Most of the automated approaches provide statement and branch coverage. Tools for automating several structural testing techniques are described in the papers cited in [Morell88].
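A small sketch of structural (branch-oriented) test selection follows; the triangle-classification function is an assumed example, and the coverage.py commands in the comment are one common way to measure statement and branch coverage, not the only one.

# Structural (white-box) testing: choose inputs so every branch outcome of the
# implementation below is exercised at least once.
def classify_triangle(a: int, b: int, c: int) -> str:
    if a <= 0 or b <= 0 or c <= 0:
        return "invalid"
    if a + b <= c or a + c <= b or b + c <= a:
        return "not a triangle"
    if a == b == c:
        return "equilateral"
    if a == b or b == c or a == c:
        return "isosceles"
    return "scalene"

# One input per branch outcome; with a coverage tool (e.g. "coverage run this_file.py"
# followed by "coverage report -m") every line of the function should show as hit.
assert classify_triangle(0, 1, 1) == "invalid"
assert classify_triangle(1, 2, 9) == "not a triangle"
assert classify_triangle(3, 3, 3) == "equilateral"
assert classify_triangle(3, 3, 2) == "isosceles"
assert classify_triangle(3, 4, 5) == "scalene"
print("all branch-coverage cases passed")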

(iii) Error-Oriented Testing and Analysis

Error-oriented testing and analysis techniques are those that focus on the presence or absence of errors in the programming process. A classification of these approaches and a description of representative techniques is presented in [Morell88].

Error-oriented testing and analysis techniques are, in general, applicable to all levels of testing. Some techniques, such as statistical methods [Currit86], error seeding [Mills83], and mutation testing [DeMillo78], are particularly suited to application during the integration and system levels of testing.
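A toy illustration of the mutation idea cited above ([DeMillo78]) is sketched below; real mutation tools automate mutant generation, and the max_of_three unit and its test suite are assumptions for the example.

# Toy mutation-testing sketch: apply a small operator mutation to the unit under
# test and check whether the existing test suite notices (kills the mutant).
import operator

def make_max_of_three(gt):
    # Factory so the comparison operator can be swapped for a mutant.
    def max_of_three(a, b, c):
        best = a
        if gt(b, best):
            best = b
        if gt(c, best):
            best = c
        return best
    return max_of_three

def test_suite(fn):
    # Existing tests for max_of_three; a weak suite may let mutants survive.
    cases = [((1, 2, 3), 3), ((9, 2, 3), 9), ((1, 7, 3), 7), ((5, 5, 5), 5)]
    return all(fn(*args) == expected for args, expected in cases)

original = make_max_of_three(operator.gt)
mutant = make_max_of_three(operator.lt)      # seeded fault: '>' replaced by '<'

print("original passes suite:", test_suite(original))    # expected: True
print("mutant killed by suite:", not test_suite(mutant))  # True means the suite detects the fault

A surviving mutant would indicate that the test suite is too weak to distinguish the original program from a faulty variant, which is exactly the signal error-oriented techniques are after.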


(iv) Hybrid Approaches

Combinations of the functional, structural, and error-oriented techniques have been investigated and are described in [Morell88]. These hybrid approaches involve integration of techniques, rather than their composition. Hybrid approaches, particularly those involving structural testing, are normally applicable at the module level.

(v) Integration Strategies

Integration consists of the systematic combination and analysis of product components. It is assumed that the components being integrated have already been individually examined for correctness. This insures that the emphasis of the integration activity is on examining the interaction of the components [Beizer84, Howden87]. Although integration strategies are normally discussed for implementations, they are also applicable for integrating the components of any product, such as designs.

There are several types of errors targeted by integration testing:

• Import/export range errors. This type of error occurs when the source of input parameters falls outside the range of their destination. For example, assume module A calls module B with table pointer X. If A assumes a maximum table size of 10 and B assumes a maximum table size of 8, an import/export range error occurs. The detection of this type of error requires careful boundary-value testing of parameters.
• Import/export type compatibility errors. This type of error is attributable to a mismatch of user-defined types. These errors are normally detected by compilers or code inspections.
• Import/export representation errors. This type of error occurs when parameters are of the same type, but the meaning of the parameters is different in the calling and called modules. For example, assume module A passes a parameter ElapsedTime, of type real, to module B. Module A might pass the value as seconds, while module B assumes the value is passed as milliseconds. These types of errors are difficult to detect, although range checks and inspections provide some assistance.
• Parameter utilization errors. Dangerous assumptions are often made concerning whether a called module will alter the information passed to it. Although support for detecting such errors is provided by some compilers, careful testing and/or inspections may be necessary to insure that values have not been unexpectedly corrupted.
• Integration-time domain/computation errors. A domain error occurs when a specific input follows the wrong path due to an error in the control flow. A computation error exists when a specific input follows the correct path, but an error in some assignment statement causes the wrong function to be computed. Although domain and computation errors are normally addressed during module testing, the concepts apply across module boundaries. In fact, some domain and computation errors in the integrated program might be masked during integration testing if the module being integrated is assumed to be correct and is treated as a black box. Examples of these types of errors and an approach for detecting them are presented in [Haley84].
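The import/export range error in the first bullet can be made concrete with the small sketch below; modules A and B, and the explicit boundary check at the interface, are illustrative assumptions rather than anything prescribed by the text.

# Interface (integration-level) check between two hypothetical modules.
# Module A assumes a table of up to 10 entries; module B only accepts 8.
MODULE_B_MAX_TABLE = 8

def module_b_process(table):
    # A defensive check at the import side makes the range mismatch visible
    # during integration testing instead of corrupting data silently.
    if len(table) > MODULE_B_MAX_TABLE:
        raise ValueError(f"table of {len(table)} entries exceeds B's limit of {MODULE_B_MAX_TABLE}")
    return sum(table)

def module_a_caller():
    table = list(range(10))          # A's assumption: up to 10 entries is fine
    return module_b_process(table)   # the import/export range error surfaces here

if __name__ == "__main__":
    try:
        module_a_caller()
    except ValueError as err:
        print("integration test exposed an import/export range error:", err)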

(vi) Transaction Flow Analysis

Transaction flow analysis develops test data to execute sequences of tasks that correspond to a transaction, where a "transaction" is defined as a unit of work seen from a system user's point of view [Beizer84, McCabe85, Petschenik85]. An example of a transaction for an operating system might be a request to print a file. The execution of this transaction requires several tasks, such as checking the existence of the file, validating permission to read the file, etc. The first step of transaction flow analysis is to identify the transactions. McCabe suggests drawing data flow diagrams after integration testing to model the logical flow of the system. Each transaction can then be identified as a path through the data flow diagram, with each data flow process corresponding to a task that must be tested in combination with other tasks on the transaction flow [McCabe85]. Information about transaction flows may also be obtained from HIPO diagrams, Petri nets, or other similar system-level documentation [Beizer84]. Once the transaction flows have been identified, black-box testing techniques can be utilized to generate test data for selected paths through the transaction flow diagram. Some possible guidelines for selecting paths follow:
• Test every link/decision in the transaction flow graph.
• Test each loop with a single, double, typical, maximum, and maximum-less-one number of iterations.
• Test combinations of paths within and between transaction flows.
• Test that the system does not do things that it is not supposed to do.
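A minimal sketch of the first step is shown below: the print-a-file transaction is modelled as a small task graph and its paths are enumerated so each can become an end-to-end test. The task names and graph shape are assumptions built from the example in the text.

# Transaction-flow sketch: model the "print a file" transaction as a task graph
# and enumerate its paths; each printed path is a candidate end-to-end test.
TRANSACTION = {
    "check file exists": ["validate read permission", "report missing file"],
    "validate read permission": ["queue print job", "report permission denied"],
    "queue print job": [],
    "report missing file": [],
    "report permission denied": [],
}

def paths(node, prefix=()):
    prefix = prefix + (node,)
    successors = TRANSACTION[node]
    if not successors:
        yield prefix                      # reached the end of one transaction flow
    for nxt in successors:
        yield from paths(nxt, prefix)

for p in paths("check file exists"):
    print(" -> ".join(p))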

(vii) Stress Analysis

Stress analysis involves analyzing the behavior of the system when its resources are saturated, in order to assess whether or not the system will continue to satisfy its specifications. Some examples of errors targeted by stress tests include:
• potential race conditions
• errors in processing sequences
• errors in limits, thresholds, or controls designed to deal with overload situations
• resource contention and depletion

For example, one typical stress test for an operating system would be a program that requests as much memory as the system has available. The first step in performing a stress analysis is identifying those resources that can and should be stressed. This identification is very system-dependent, but often includes resources such as file space, memory, I/O buffers, processing time, and interrupt handlers. Once these resources have been identified, test cases must be designed to stress them. These tests often require large amounts of data, for which automated support in the form of test-case generators is needed [Beizer84, Sum86]. Although stress analysis is often viewed as one of the last tasks to be performed during system testing, it is most effective if it is applied during each of the product's V&V activities. Many of the errors detected during a stress analysis correspond to serious design flaws. For example, a stress analysis of a design may involve an identification of potential bottlenecks that may prevent the product from satisfying its specifications under extreme loads [Beizer84]. Stress analysis is a necessary complement to the previously described testing and analysis techniques for resource-critical applications. Whereas the foregoing techniques primarily view the product under normal operating conditions, stress analysis views the product under conditions that may not have been anticipated. Stress analysis techniques can also be combined with other approaches during V&V activities to insure that the product's specifications for such attributes as performance, safety, security, etc., are met.
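The memory-exhaustion stress test mentioned above might be sketched as follows; the chunk size and the safety cap are assumptions, and the cap is there only so the example does not destabilize the machine running it.

# Stress-analysis sketch: grab memory in fixed chunks until allocation fails or a
# safety cap is reached, and observe how the system (or product) degrades.
CHUNK_MB = 64
SAFETY_CAP_MB = 1024      # assumption: stop at 1 GB so the example stays harmless

held = []
allocated_mb = 0
try:
    while allocated_mb < SAFETY_CAP_MB:
        held.append(bytearray(CHUNK_MB * 1024 * 1024))   # request another chunk
        allocated_mb += CHUNK_MB
except MemoryError:
    print(f"allocation failed after ~{allocated_mb} MB - check overload handling here")
else:
    print(f"reached the {SAFETY_CAP_MB} MB safety cap without failure")
finally:
    held.clear()   # release everything so the rest of the test run is unaffected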


(viii) Failure Analysis

Failure analysis is the examination of the product's reaction to failures of hardware or software. The product's specifications must be examined to determine precisely which types of failures must be analyzed and what the product's reaction must be. Failure analysis is sometimes referred to as "recovery testing" [Beizer84]. Failure analysis must be performed during each of the product's V&V activities. It is essential during requirement and specification V&V activities that a clear statement of the product's response to various types of failures be addressed in terms that allow analysis. The design must also be analyzed to show that the product's reaction to failures satisfies its specifications. The failure analysis of implementations often occurs during system testing. This testing may take the form of simulating hardware or software errors or the actual introduction of these types of errors. Failure analysis is essential to detecting product recovery errors. These errors can lead to lost files, lost data, duplicate transactions, etc. Failure analysis techniques can also be combined with other approaches during V&V activities to insure that the product's specifications for such attributes as performance, security, safety, usability, etc., are met.
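A minimal sketch of a recovery test follows, assuming a hypothetical TransactionLog component; a simulated storage failure is injected and the test checks that the failed write is neither silently committed nor left half-done.

# Hedged sketch of a recovery test: a simulated storage failure is injected
# and the test checks that the failed write is neither silently committed nor
# left half-done. TransactionLog is a hypothetical component for this sketch.

class TransactionLog:
    def __init__(self):
        self.committed = []
        self.pending = []

    def write(self, record, storage_write):
        self.pending.append(record)
        try:
            storage_write(record)            # may raise (simulated hardware fault)
            self.committed.append(record)
        finally:
            self.pending.remove(record)

def flaky_storage_write(record):
    raise IOError("simulated disk failure")  # injected fault

def test_no_silent_loss_on_storage_failure():
    log = TransactionLog()
    try:
        log.write("order-42", flaky_storage_write)
    except IOError:
        pass                                 # the failure surfaced, as required
    assert "order-42" not in log.committed   # a failed write must not look committed
    assert log.pending == []                 # and must not be left half-done

if __name__ == "__main__":
    test_no_silent_loss_on_storage_failure()
    print("recovery test passed")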

(ix) Concurrency Analysis

Concurrency analysis examines the interaction of tasks being executed simultaneously within the product to insure that the overall specifications are being met. Concurrent tasks may be executed in parallel or have their execution interleaved. Concurrency analysis is sometimes referred to as "background testing" [Beizer84]. For products with tasks that may execute in parallel, concurrency analysis must be performed during each of the product's V&V activities. During design, concurrency analysis should be performed to identify such issues as potential contention for resources, deadlock, and priorities. A concurrency analysis for implementations normally takes place during system testing. Tests must be designed, executed, and analyzed to exploit the parallelism in the system and insure that the specifications are met.
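The sketch below shows one common form of concurrency test: several threads update a shared counter and the test checks that no updates are lost. The Counter class is illustrative only; a real analysis would target the product's own shared resources.

# Hedged sketch of a concurrency test: several threads update a shared counter
# and the test checks that no updates are lost. The Counter class is
# illustrative; a real analysis would target the product's own shared resources.

import threading

class Counter:
    def __init__(self):
        self._value = 0
        self._lock = threading.Lock()

    def increment(self):
        with self._lock:                     # without this lock, updates may be lost (a race)
            self._value += 1

    @property
    def value(self):
        return self._value

def test_parallel_increments(threads=8, per_thread=10_000):
    counter = Counter()

    def worker():
        for _ in range(per_thread):
            counter.increment()

    pool = [threading.Thread(target=worker) for _ in range(threads)]
    for t in pool:
        t.start()
    for t in pool:
        t.join()
    assert counter.value == threads * per_thread, "lost updates: possible race condition"

if __name__ == "__main__":
    test_parallel_increments()
    print("concurrency test passed")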

(x) Performance Analysis

The goal of performance analysis is to insure that the product meets its specified performance objectives. These objectives must be stated in measurable terms, so far as possible. Typical performance objectives relate to response time and system throughput [Beizer84]. A performance analysis should be applied during each of the product's V&V activities. During requirement and specification V&V activities, performance objectives must be analyzed to insure completeness, feasibility, and testability. Prototyping, simulation, or other modeling approaches may be used to insure feasibility. For designs, the performance requirements must be allocated to individual components. These components can then be analyzed to determine if the performance requirements can be met. Prototyping, simulation, and other modeling approaches again are techniques applicable to this task. For implementations, a performance analysis can take place during each level of testing. Test data must be carefully constructed to correspond to the scenarios for which the performance requirements were specified.
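As a sketch of implementation-level performance testing, the code below measures response times of a stand-in operation against an assumed objective (95th percentile under 200 ms); both the operation and the budget are hypothetical.

# Hedged sketch of a performance test: measure response times of a stand-in
# operation against an assumed objective (95th percentile under 200 ms).
# handle_request() and the budget are hypothetical.

import statistics
import time

def handle_request():
    time.sleep(0.005)                        # stand-in for real work

def test_response_time(samples=200, p95_budget_s=0.200):
    timings = []
    for _ in range(samples):
        start = time.perf_counter()
        handle_request()
        timings.append(time.perf_counter() - start)
    p95 = statistics.quantiles(timings, n=20)[-1]   # approximate 95th percentile
    assert p95 <= p95_budget_s, f"p95 {p95:.3f}s exceeds budget {p95_budget_s}s"

if __name__ == "__main__":
    test_response_time()
    print("performance objective met")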

PROOF OF CORRECTNESS

Proof of correctness is a collection of techniques that apply the formality and rigor of mathematics to the task of proving the consistency between an algorithmic solution and a rigorous, complete specification of the intent of the solution [Adrion86, Powell86b]. This technique is also often referred to as "formal verification." The usual proof technique follows Floyd's Method of Inductive Assertions or some variant [Floyd67, Hantler76]. Proof of correctness techniques are normally presented in the context of verifying an implementation against a specification. The techniques are also applicable in verifying the correctness of other products, as long as they possess a formal representation [Ambler78, Korelsky87].
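To give a flavor of Floyd-style inductive assertions, the sketch below states a precondition, a loop invariant, and a postcondition for a simple summation routine as executable checks. In an actual proof of correctness these assertions would be discharged mathematically rather than executed.

# Sketch of Floyd-style inductive assertions, written as runtime checks: a
# precondition, a loop invariant, and a postcondition for a simple summation
# routine. In a real proof these would be discharged mathematically, not executed.

def sum_upto(n: int) -> int:
    assert n >= 0                            # precondition
    total, i = 0, 0
    while i < n:
        assert total == i * (i + 1) // 2     # loop invariant (inductive assertion)
        i += 1
        total += i
    assert total == n * (n + 1) // 2         # postcondition: follows from the invariant and exit condition
    return total

if __name__ == "__main__":
    for n in range(10):
        sum_upto(n)
    print("all assertions held on the sampled inputs")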


There are several limitations to proof of correctness techniques. One limitation is the dependence of the technique upon a correct formal specification that reflects the user's needs. Current specification approaches cannot always capture these needs in a formal way, especially when product aspects such as performance, reliability, quality, etc., are considered [Berztiss87, Rombach87]. Another limitation has to do with the complexity of rigorously specifying the execution behavior of the computing environment. For large programs, the amount of detail to handle, combined with the lack of powerful tools, may make the proof technique impractical [Beizer83, Korelsky87, Howden87, Powell86b]. More information on proof of correctness approaches is contained in the curriculum module Formal Verification of Programs [Berztiss88].

SIMULATION AND PROTOTYPING

Simulation and prototyping are techniques for analyzing the expected behavior of a product. There are many approaches to constructing simulations and prototypes that are well-documented in the literature. For V&V purposes, simulations and prototypes are normally used to analyze requirements and specifications to insure that they reflect the user's needs [Brackett88]. Since they are executable, they offer additional insight into the completeness and correctness of these documents. Simulations and prototypes can also be used to analyze predicted product performance, especially for candidate product designs, to insure that they conform to the requirements. It is important to note that the utilization of simulation and prototyping as V&V techniques requires that the simulations and prototypes themselves be correct. Thus, the utilization of these techniques requires an additional level of V&V activity.

REQUIREMENTS TRACING

Requirements tracing is a technique for insuring that the product, as well as the testing of the product, addresses each of its requirements. The usual approach to performing requirements tracing uses matrices. One type of matrix maps requirements to software modules. Construction and analysis of this matrix can help insure that all requirements are properly addressed by the product and that the product does not have any superfluous capabilities [Powell86b]. System Verification Diagrams are another way of analyzing requirements/modules traceability [Deutsch82]. Another type of matrix maps requirements to test cases. Construction and analysis of this matrix can help insure that all requirements are properly tested. A third type of matrix maps requirements to their evaluation approach. The evaluation approaches may consist of various levels of testing, reviews, simulations, etc. The requirements/evaluation matrix insures that all requirements will undergo some form of V&V [Deutsch82, Powell86b]. Requirements tracing can be applied for all of the products of the software evolution process.
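A minimal sketch of the first two kinds of matrices follows; the requirement, module, and test-case identifiers are invented for illustration.

# Hedged sketch of requirements tracing with two matrices, here reduced to
# dictionaries. Requirement, module, and test-case identifiers are invented.

REQ_TO_MODULES = {
    "REQ-1": ["auth.login"],
    "REQ-2": ["report.generate", "report.export"],
    "REQ-3": [],                             # untraced requirement: a gap to investigate
}

REQ_TO_TESTS = {
    "REQ-1": ["TC-101", "TC-102"],
    "REQ-2": ["TC-201"],
}

def trace_report():
    for req, modules in REQ_TO_MODULES.items():
        tests = REQ_TO_TESTS.get(req, [])
        gaps = []
        if not modules:
            gaps.append("no implementing module")
        if not tests:
            gaps.append("no test case")
        status = "OK" if not gaps else "; ".join(gaps)
        print(f"{req}: modules={modules or '-'} tests={tests or '-'} -> {status}")

if __name__ == "__main__":
    trace_report()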

SOFTWARE V&V PLANNING

The development of a comprehensive V&V plan is essential to the success of a project. This plan must be developed early in the project. Depending on the development approach followed, multiple levels of test plans may be developed, corresponding to various levels of V&V activities. Guidelines for the contents of system, software, build, and module test plans have been documented in the literature [Deutsch82, DoD87, Evans84, NBS76, IEEE83b]. These references also contain suggestions about how to document other information, such as test procedures and test cases. The formulation of an effective V&V plan requires many considerations that are defined in the remainder of this section.

1. Identification of V&V Goals

V&V goals must be identified from the requirements and specifications. These goals must address those attributes of the product that correspond to its user expectations. These goals must be achievable, taking into account both theoretical and practical limitations [Evans84, Powell86a, Sum86].


2. Selection of V&V Techniques

Once a set of V&V objectives has been identified, techniques must be selected for each of the project's evolving products. A methodology for the selection of techniques and tools is presented in [Powell86b]. More specific guidelines for the selection of techniques applicable at the unit level of testing are presented in [Morell88]. A mapping of some of the approaches presented in Section IV of this module to the products in Section III follows.

a. Requirements
The applicable techniques for accomplishing the V&V objectives for requirements are technical reviews, prototyping, and simulation. The review process is often called a System Requirements Review (SRR). Depending upon the representation of the requirements, consistency analyzers may be used to support the SRR.

b. Specifications
The applicable techniques for accomplishing the V&V objectives for specifications are technical reviews, requirements tracing, prototyping, and simulation. The specifications review is sometimes combined with a review of the product's high-level design. The requirements must be traced to the specifications.

c. Designs
The applicable techniques for accomplishing the V&V objectives for designs are technical reviews, requirements tracing, prototyping, simulation, and proof of correctness. High-level designs that correspond to an architectural view of the product are often reviewed in a Preliminary Design Review. Detailed designs are addressed by a Critical Design Review. Depending upon the representation of the design, static analyzers may be used to assist these review processes. Requirements must be traced to modules in the architectural design; matrices can be used to facilitate this process [Powell86b]. Prototyping and simulation can be used to assess feasibility and adherence to performance requirements. Proofs of correctness, where applicable, are normally performed at the detailed design level [Dyer87].

d. Implementations
The applicable techniques for accomplishing the V&V objectives for implementations are technical reviews, requirements tracing, testing, and proof of correctness. Various code review techniques such as walkthroughs and inspections exist. At the source-code level, several static analysis techniques are available for detecting implementation errors. The requirements tracing activity is here concerned with tracing requirements to source-code modules. The bulk of the V&V activity for source code consists of testing. Multiple levels of testing are usually performed. Where applicable, proof-of-correctness techniques may be applied, usually at the module level.

e. Changes
Since changes describe modifications to products, the same techniques used for V&V during development may be applied during modification. Changes to implementations require regression testing.

3. Organizational Responsibilities

The organizational structure of a project is a key planning consideration for project managers. An important aspect of this structure is the delegation of V&V activities to various organizations [Deutsch82, Evans84, Petschenik85, Sum86]. This decision is often based upon the size, complexity, and criticality of the product. In this module, four types of organizations are addressed. These organizations reflect typical strategies for partitioning tasks to achieve V&V goals for the product. It is, of course, possible to delegate these V&V activities in many other ways.


a. Development Organization

The development organization has responsibility for participating in technical reviews for all of the evolution products. These reviews must insure that the requirements can be traced throughout the class of products. The development organization may also construct prototypes and simulations. For code, the development organization has responsibility for preparing and executing test plans for the unit and integration levels of testing. In some environments, this is referred to as Preliminary Qualification Testing. The development organization also constructs any applicable proofs of correctness at the module level.

b. Independent Test Organization

An independent test organization (ITO) may be established, due to the magnitude of the testing effort or the need for objectivity. An ITO enables the preparation for test activities to occur in parallel with those of development. The ITO normally participates in all of the product's technical reviews and monitors the preliminary qualification testing effort. The primary responsibility of the ITO is the preparation and execution of the product's system test plan. This is sometimes referred to as the Formal Qualification Test. The plan for this must contain the equivalent of a requirements/evaluation matrix that defines the V&V approach to be applied for each requirement [Deutsch82]. If the product must be integrated with other products, this integration activity is normally the responsibility of the ITO as well.

c. Software Quality Assurance

Although software quality assurance may exist as a separate organization, the intent here is to identify some activities for assuring software quality that may be distributed using any of a number of organizational structures [Brown87]. Evaluations are the primary avenue for assuring software quality. Some typical types of evaluations to be performed where appropriate throughout the product life cycle are identified below. Other types can be found in Assurance of Software Quality [Brown87].

Evaluation types:
• internal consistency of product
• understandability of product
• traceability to indicated documents
• consistency with indicated documents
• appropriate allocation of sizing and timing resources
• adequate test coverage of requirements
• consistency between data definitions and use
• adequacy of test cases and test procedures
• completeness of testing
• completeness of regression testing

d. Independent V&V Contractor


An independent V&V contractor may sometimes be used to insure independent objectivity and evaluation for the customer. The scope of activities for this contractor varies, including any or all of the activities addressed for the Independent Test and Software Quality Assurance organizations [Deutsch82].

INTEGRATING V&V APPROACHES

Once a set of V&V objectives has been identified, an overall integrated V&V approach must be determined. This approach involves integration of techniques applicable to the various life cycle phases as well as delegation of these tasks among the project's organizations. The planning of this integrated V&V approach is very dependent upon the nature of the product and the process used to develop it. Traditional integrated V&V approaches have followed the "waterfall model," with various V&V functions allocated to the project's development phases [Deutsch82, DoD87, Evans84, Powell86a]. Alternatives to this approach exist, such as the Cleanroom software development process developed by IBM. This approach is based on a software development process that produces incremental product releases, each of which undergoes a combination of formal verification and statistical testing techniques [Currit86, Dyer87]. Regardless of the approach selected, V&V progress must be tracked. Requirements/evaluation matrices play a key role in this tracking by providing a means of insuring that each requirement of the product is addressed [Powell86b, Sum86].

PROBLEM TRACKING

Other critical aspects of a software V&V plan are developing a mechanism for documenting problems encountered during the V&V effort, routing identified problems to the appropriate individuals for correction, and insuring that the corrections have been performed satisfactorily. Typical information to be collected includes:
• when the problem occurred
• where the problem occurred
• state of the system before occurrence
• evidence of the problem
• actions or inputs that appear to have led to occurrence
• description of how the system should work; reference to relevant requirements
• priority for solving the problem
• technical contact for additional information

Problem tracking is an aspect of configuration management that is addressed in detail in the curriculum module Software Configuration Management [Tomayko87]. A practical application of problem tracking for operating system testing is presented in [Sum86].
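As a sketch only, the problem information listed above could be captured in a simple record such as the following; the fields mirror the list, and the example values are invented.

# Hedged sketch of a problem (defect) record whose fields mirror the list
# above; a dataclass stands in for whatever tracking system a project uses.
# All example values are invented.

from dataclasses import dataclass
from datetime import datetime
from typing import List

@dataclass
class ProblemReport:
    occurred_at: datetime                    # when the problem occurred
    location: str                            # where the problem occurred
    system_state: str                        # state of the system before occurrence
    evidence: str                            # logs, dumps, screenshots, ...
    triggering_actions: List[str]            # actions or inputs that led to the occurrence
    expected_behavior: str                   # how the system should work
    related_requirements: List[str]          # references to relevant requirements
    priority: int                            # priority for solving the problem (1 = highest)
    technical_contact: str                   # who to ask for additional information
    status: str = "open"

if __name__ == "__main__":
    report = ProblemReport(
        occurred_at=datetime.now(),
        location="print spooler",
        system_state="queue at capacity",
        evidence="spooler.log lines 1020-1045",
        triggering_actions=["submit 500 jobs in one minute"],
        expected_behavior="jobs queue without loss",
        related_requirements=["REQ-17"],
        priority=2,
        technical_contact="test-lead@example.org",
    )
    print(report)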

Tracking Test Activities

The software V&V plans must provide a mechanism for tracking the testing effort. Data must be collected that enable project management to assess both the quality and the cost of testing activities. Typical data to collect include:
• number of tests executed
• number of tests remaining
• time used
• resources used
• number of problems found and the time spent finding them


These data can then be used to track actual test progress against scheduled progress. The tracking information is also important for future test scheduling.
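A small sketch of how such data might be summarized for management follows; the numbers are invented and the indicators are only examples.

# Hedged sketch of test-progress tracking: compare executed tests against the
# plan and report a few simple quality/cost indicators. Numbers are invented.

def test_progress(executed, planned, problems_found, hours_spent):
    remaining = planned - executed
    percent_done = 100.0 * executed / planned if planned else 0.0
    find_rate = problems_found / hours_spent if hours_spent else 0.0
    return {
        "tests executed": executed,
        "tests remaining": remaining,
        "percent complete": round(percent_done, 1),
        "problems per hour": round(find_rate, 2),
    }

if __name__ == "__main__":
    print(test_progress(executed=120, planned=200, problems_found=18, hours_spent=64))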

Assessment

It is important that the software V&V plan provide for the ability to collect data that can be used to assess both the product and the techniques used to develop it. Often this involves careful collection of error and failure data, as well as analysis and classification of these data. More information on assessment approaches and the data needed to perform them is contained in [Brown87].

STATIC AND DYNAMIC TESTING AND TOOLS

Static Testing                               Dynamic Testing
About prevention                             About cure
Performed before compilation                 Performed after compilation and linking
Verification part                            Validation part

Note: Static testing is generally considered more cost-effective than dynamic testing because it can find defects earlier, before the code is compiled and executed.

Static testing takes the form of reviews, inspections, and walkthroughs, while dynamic testing takes the form of unit testing, integration testing, system testing, and acceptance testing.

Static test tools are of four types:
a) Flow Analyzer: It ensures consistency in data flow from input to output.
b) Path Tests: They are used to find unused code or code with contradictions.
c) Coverage Analyzer: It ensures that all logical paths are tested.
d) Interface Analyzer: It examines the effects of passing variables and data between modules.
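To make the idea concrete, the following is a very small, hypothetical static analysis in the spirit of item (b): it parses Python source with the standard ast module and flags functions that are defined but never called. Real static analyzers are far more thorough; this only illustrates the principle.

# Hypothetical sketch of a tiny static analysis in the spirit of item (b): it
# parses Python source with the standard ast module and flags functions that
# are defined but never called (possible unused code). Real tools are far more
# thorough; this only illustrates the principle.

import ast

def unused_functions(source: str):
    tree = ast.parse(source)
    defined = {node.name for node in ast.walk(tree) if isinstance(node, ast.FunctionDef)}
    called = {node.func.id for node in ast.walk(tree)
              if isinstance(node, ast.Call) and isinstance(node.func, ast.Name)}
    return sorted(defined - called)

if __name__ == "__main__":
    sample = (
        "def used():\n"
        "    return 1\n"
        "\n"
        "def never_called():\n"
        "    return 2\n"
        "\n"
        "print(used())\n"
    )
    print("possibly unused:", unused_functions(sample))   # -> ['never_called']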

Examples of Static Testing Tools:

Multi-Language Tools

• CPD (Copy/Paste Detector): It detects duplicated code in Java, JSP, C++, and PHP.

• Sonar: It manages unit testing, complexity, duplication, comments, coding standards, and design for Java, PHP, PL/SQL, and Flex.

• Yasca: for C, C++, Java, JavaScript, ASP, and HTML/CSS.

Individual Tools

• .NET
o StyleCop: It analyzes C# code to enforce a set of style and consistency rules.

o FxCop: It analyzes .NET programs that compile to CIL (Common Intermediate Language).


• C/C++
o Astrée: It analyzes C programs for run-time errors.

o BLAST (Berkeley Lazy Abstraction Software verification Tool): a software model checker for C.

o Sparse: It is a tool designed to find faults in the Linux kernel.

o Cppcheck: It checks C/C++ code for errors such as misuse of standard library functions.

• Java Tools
o Checkstyle: It can be used to show violations of a configured coding standard.

o Soot: a language manipulation and optimization framework for Java.

Dynamic Testing Tools: These are of four types:

1. Test Driver: It inputs data into a module under test (MUT).
2. Test Beds: They simultaneously display the source code along with the program under execution.
3. Emulators: They are used to emulate parts of the system not yet developed.
4. Mutation Analyzers: Errors are deliberately seeded into the source code in order to test the fault tolerance of the system.
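A minimal sketch of a test driver follows; the mut function and the test table are placeholders for a real module under test and its expected results.

# Hedged sketch of a test driver: it feeds a table of inputs into a module
# under test (MUT) and compares actual with expected results. The mut function
# and the test table are placeholders.

def mut(a, b):
    """Module under test (placeholder): floor division of two integers."""
    return a // b

TEST_TABLE = [
    # (inputs, expected result)
    ((7, 2), 3),
    ((9, 3), 3),
    ((-7, 2), -4),                           # floor division rounds toward negative infinity
]

def run_driver():
    failures = 0
    for args, expected in TEST_TABLE:
        actual = mut(*args)
        if actual != expected:
            failures += 1
            print(f"FAIL: mut{args} = {actual}, expected {expected}")
    print(f"{len(TEST_TABLE) - failures}/{len(TEST_TABLE)} cases passed")

if __name__ == "__main__":
    run_driver()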

Note: These notes are designed purely for class notes and semester preparation, taking references from various books and the Internet.

Prepared by: Pradeep Sharma.

Information Technology (GCET)
