quality with statistics-2.ppt

Senior Management Programme: Quality With Statistics

Ahmedabad Electricity Company Ltd., 26 – 27 June 2002

Conducted by: Indian Statistical Institute, 98 Sampatrao Colony, Baroda – 390 007

Quality with Statistics 2

Defining the Ideal Quality Value

Guidelines for scoring

1. Use the same time scale: frequency of improvement actions <= frequency of review <= frequency of reporting <= frequency of measurement.

2. A factor is measured "always" if it is measured as frequently as is practically possible.

3. The ideal value for each factor need not be 5. For example, scrap% may be measured every hour (always) but the ideal frequency may be every shift (often).

4. A factor which is not measurable (e.g. integrity) gets a score of 1 for all four actions (M-R-R-I).

M: Performance of the listed factor should be measured
R: The performance measure should be reported
R: The management should review the performance reports
I: Improvement actions should stem from the reviews

Factor | M (Measure) | R (Report) | R (Review) | I (Improve) | Total
Total

Rating scale: 5 = Always, 4 = Often, 3 = Occasionally, 2 = Rarely, 1 = Never

1) List the six factors which you believe are the major determinants of quality

2) For each factor, place a rating on the following statements


Defining the Real Quality Value

Guidelines for scoring

1. Use the same 6 factors and the same time scale as were used while defining the ideal quality value.

2. The real score can be equal to, more than or less than the ideal score.

M: Performance of the listed factor is measured
R: The performance measure is reported
R: The management reviews the performance reports
I: Improvement actions stem from the reviews

Factor | M (Measure) | R (Report) | R (Review) | I (Improve) | Total
Total

Rating scale: 5 = Always, 4 = Often, 3 = Occasionally, 2 = Rarely, 1 = Never

1) List the 6 factors you believe are the major determinants of quality

2) For each factor place a rating on the following statements


The Quality Value Grid

[Figure: grid of Behavior Score against Belief Score, each axis running from 0 to 120]
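The scoring on the two forms is simple arithmetic. The Python sketch below assumes each of the six factors is rated 1 to 5 on the four statements, so a factor scores 4 to 20 and the grand total 24 to 120, which matches the 0 to 120 range of the grid axes. Reading the ideal total as the belief score and the real total as the behavior score is our assumption, and all factor names and ratings are hypothetical. The `follows_time_scale` check encodes guideline 1, assuming a higher rating means a more frequent action.

```python
from dataclasses import dataclass

# Rating scale used on both forms (ideal and real)
SCALE = {5: "Always", 4: "Often", 3: "Occasionally", 2: "Rarely", 1: "Never"}


@dataclass
class FactorRating:
    factor: str
    measure: int
    report: int
    review: int
    improve: int

    def scores(self) -> tuple:
        return (self.measure, self.report, self.review, self.improve)

    def total(self) -> int:
        for s in self.scores():
            if s not in SCALE:
                raise ValueError(f"{self.factor}: rating {s} is outside 1..5")
        return sum(self.scores())

    def follows_time_scale(self) -> bool:
        # Guideline 1: improve <= review <= report <= measure (5 = most frequent)
        return self.improve <= self.review <= self.report <= self.measure


def quality_value(ratings: list) -> int:
    """Sum of M + R + R + I over the six factors (range 24 to 120)."""
    if len(ratings) != 6:
        raise ValueError("the exercise uses exactly six factors")
    return sum(r.total() for r in ratings)


# Hypothetical ratings, for illustration only
ideal = [FactorRating(name, *s) for name, s in [
    ("Scrap %", (5, 4, 4, 4)),
    ("Outage hours", (5, 5, 5, 4)),
    ("Customer complaints", (5, 5, 4, 4)),
    ("Meter reading errors", (4, 4, 3, 3)),
    ("Training hours", (3, 3, 3, 2)),
    ("Integrity", (1, 1, 1, 1)),  # not measurable: 1 for all four actions
]]
real = [FactorRating(name, *s) for name, s in [
    ("Scrap %", (5, 3, 2, 1)),
    ("Outage hours", (5, 4, 3, 2)),
    ("Customer complaints", (3, 2, 2, 1)),
    ("Meter reading errors", (2, 1, 1, 1)),
    ("Training hours", (2, 2, 1, 1)),
    ("Integrity", (1, 1, 1, 1)),
]]

belief, behavior = quality_value(ideal), quality_value(real)
print(f"Belief (ideal) score: {belief}, behavior (real) score: {behavior}")
```

The pair (belief, behavior) then locates the respondent as one point on the grid.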


Cutting to the Core

Behavior is a function of values

B = f(V)

Behavior: The way in which a person or group of people responds

Values: The complex of beliefs, ideals or standards which characterizes a person or group of people


The Cost of Remaining Average

Waste as a proportion of total sales volume

Typical company: 30%?    Your area: ?


The Classical View of Performance: Practical Meaning of “99% Good”

20,000 lost articles of mail per hour

Unsafe drinking water almost 15 minutes each day

5000 incorrect surgical operations per week

2 short or long landings at major airports each day

2,00,000 wrong drug prescriptions every year

No electricity for about 7 hours each month
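The time-based figures above follow from applying a 1% failure rate to a period of time. A minimal Python sketch of that arithmetic, assuming a 30-day month for the electricity example (the other examples depend on volumes not given here):

```python
def failure_time(period: float, good_fraction: float = 0.99) -> float:
    """Time (same unit as `period`) during which a service that is good only
    `good_fraction` of the time is not good."""
    return period * (1.0 - good_fraction)


MINUTES_PER_DAY = 24 * 60        # 1440
HOURS_PER_MONTH = 30 * 24        # assumes a 30-day month: 720

water_minutes = failure_time(MINUTES_PER_DAY)   # about 14.4 minutes per day
power_hours = failure_time(HOURS_PER_MONTH)     # about 7.2 hours per month

print(f"Unsafe water: {water_minutes:.1f} min/day; no electricity: {power_hours:.1f} h/month")
# Tightening to 99.9% good cuts each figure tenfold
print(f"At 99.9%: {failure_time(MINUTES_PER_DAY, 0.999):.2f} min/day")
```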


The Need for Knowledge

Knowledge

We don’t know what we don’t know.

If we can’t express what we know in terms of numbers, we really don’t know much about it.

“In God we believe, all else must have data” – Hewlett Packard

The Need

If we don’t know, we cannot act

If we cannot act, the risk of loss is high

If we know and act, the risk is managed

If we do know and do not act, we deserve the loss


The Role of Questions

Questions lead and answers follow. The same question most often leads to the same answers, which invariably produce the same result. To change the result means to change the question.

New measures lead to new questions. [Management needs to focus on new measures like …….. rather than outputs and budgets].

As questions arise, vision emerges, direction becomes apparent and ambiguity diminishes. In turn, people become organized and mobilized to common action.

When people take common action, the organization's ability to survive and prosper will increase, owing to the discovery of answers to problems heretofore not known.

“Insanity is doing the same thing over and over again but expecting different results” – Rita Mae Brown (Author)


The Value of Measurement

Measurement → Question → Search → Knowledge → Improved Measurement

We don’t know what we don’t know

We can’t act on what we don’t know

We won’t know until we search

We won’t search for what we don’t question

We don’t question what we don’t measure

Hence, we just don’t know. – Mikel J. Harry


The Role of Training

Undoubtedly the most important aspect of Quality is people and their knowledge. Without this golden asset all is for nothing. At the risk of redundancy: you don’t know what you don’t know, and if you don’t know something, nothing will happen. Obviously the key is knowledge. Successful change cannot occur without it.

Today, the best-in-class companies provide a tremendous amount of training and education to their employees. Many such companies have made significant investments in training and are discovering the rewards. For example, Motorola Inc. has found a 10:1 return on its training budget. In fact, it requires every employee to receive 40 hours or more of training annually, of which 40% must be in the area of Quality.


What is Quality

• Quality means different things to different people. There is no universally accepted definition.

• However, there is broad agreement on the following:
 – Very difficult to define
 – Determined by customer
 – Multi dimensional
 – Dynamic
 – Needs to be TOTAL

• Usually, TOTAL QUALITY refers to the fact that all departments have roles in quality.


ISO 9000:2000 Definition of Quality

• “Degree to which a set of inherent characteristics fulfills requirements”
 – Requirements are needs or expectations that are stated or implied
 – Requirements can be generated by different interested parties
 – Inherent characteristics are the distinguishing features that exist in the product/process/system, especially as a permanent characteristic

• Inherent characteristics are called quality characteristics
• Assigned characteristics (e.g. product price) are not quality characteristics

Note: This definition is an improvement over its 1994 version. However, it can still be argued that not all inherent characteristics are quality characteristics.


How to Measure Quality

Customer Satisfaction = f (
• Appropriateness of requirements
• Degree of conformance to requirements
• Cost of identifying and meeting the requirements )

Product Quality: Marketing Quality + Design Quality + Mfg. Quality + … + Service Quality


How to Measure Quality (Contd.)

• Customer satisfaction can be measured, but it is not very useful as a stand-alone measure.

• Establishing the function ‘f’ is a highly challenging task.

• Presently, all quality measures (e.g. Defect Rate, Process Capability, Quality Cost, Cycle Time) address only a part of the whole.

• Points to remember
 – Quality is customer satisfaction, but customer satisfaction is not quality
 – Reducing internal rejection and rework reduces the producer’s cost but not that of the customer


Components of Quality

Main Component | Sub-component | Examples of features
Quality of Design | Product Design | Power rating of an engine, robustness, operating cost, ease of use
Quality of Design | Process Design | Rated efficiency, process capability, cycle time, downtime for regulatory inspection
Quality of Conformance | Process Conformance | Process instability, process failures, late deliveries, loss of efficiency/yield
Quality of Conformance | Product Conformance | Field failures, factory scrap and rework, deviation from target, incorrect invoices

Quality of Design

• Decides the level of customer attraction

• Related to market segmentation based on product ‘grade’

• Improving design quality may lead to higher cost, but this need not always be the case.

Quality of Conformance

• Refers to the deficiencies resulting from lack of control

• Decides the level of customer dissatisfaction

• Improving quality of conformance always leads to reduction of costs. It is in this sense that Crosby says “quality is free”.


Quality with Statistics

Quality
• Quality of Design. Tasks: System Design, Parameter Design, Tolerance Design
• Quality of Conformance. Tasks: Process Monitoring and Adjustment, Problem Solving, Product Disposal

Each task is supported by statistical tools.


Quality with Statistics – This Programme

Quality by | Tasks | Scope in AECL | Statistical Tools | This Programme
Product Design | System Design | Limited* | QFD, FMEA, Reliability Engineering | Nil
Product Design | Parameter Design | -do- | Statistical Designs, S/N Ratio, ANOVA | The concept of robustness only
Product Design | Tolerance Design | -do- | Statistical Designs, Loss Function, Simulation, Regression | Nil
Process Design | System Design | Limited** | Same as those mentioned against product design PLUS optimization tools for inventory management, transportation, scheduling etc. | Nil
Process Design | Parameter Design | Very High | | The concept of robustness only
Process Design | Tolerance Design | High | | Illustration with an example

* Applicable only for intermediate products and services
** Applicable mostly for management and service delivery processes


Quality with Statistics – This Programme (Contd.)

Quality by | Tasks | Scope in AECL | Statistical Tools | This Programme
Process Conformance | Process Monitoring and Adjustment | Very High | Probability Distributions, Control Charts, GR&R Studies, PCA, Process adjustment methods | Principles and tools of process monitoring only
Process Conformance | Problem Solving | Very High | Simple tools like Histogram and C&E diagram, (Z, t, χ², F)-tests, Advanced tools PLUS all the tools mentioned above | Concepts, disciplines and simple tools of problem solving
Product Conformance | Product Disposal | High | Bulk Sampling, Acceptance Sampling, Loss Function | Issues in bulk sampling only
Product Conformance | Field Service | High | Nil | Nil

Chapter 2: Data and Data Collection


Data

• Data are facts or figures related to any characteristic (also called a variable) of an individual (e.g. a machine, a year, a casting, a dimension, a person).

• Power station outages (from commissioning up to 31/03/01). Rows are individuals (stations); columns are variables.

Station | Date of commissioning | Availability (%) | No. of outages | Average duration of non-stop operation (days) | Average loss per outage, Forced (hours) | Average loss per outage, Planned (hours) | Main cause of outage | Capacity utilization
C:15 | 12/11/98 | 92.59 | 30 | 27 | 64 | 52 | Leakage | High
C:16 | 10/05/97 | 93.04 | 47 | 28 | 52 | 52 | Leakage | Mod.
D | 12/10/78 | 88.32 | 124 | 58 | 261 | 164 | Gen* | V. Low
E | 31/12/84 | 82.77 | 116 | 42 | 440 | 158 | Gen* | Low
F | 29/09/88 | 89.23 | 82 | 50 | 379 | 79 | Gen* | High

* Generator stator / rotor problem


Types of Data/Variable

Data/Variable
• Numerical/Quantitative: Continuous, Discrete
• Categorical/Qualitative: Ordinal, Nominal

• Continuous: An infinite number of values (positive or negative) are possible, e.g. measurements of weight, length, chemical composition.

• Discrete: The variable can take only the values 0, 1, 2, 3, … e.g. counts (number of defects, breakdowns etc.)

• Ordinal: Data classified in ordered categories, e.g. quality of service provided is classified as poor, moderate, good or yearly rainfall classified as very low, low, moderate, good and very good.

• Nominal: Data classified in categories having no inherent or explicit order, e.g. location classified as east, west, north, south or names of departments.


Types of Data - Outage Data Example

Variable Name | Variable Type
1. Date of commissioning
2. Availability (%)
3. Number of outages since commissioning
4. Average duration of non-stop operation (days)
5. Average loss per outage (hours)
6. Main cause of outage
7. Capacity utilization


Types of Data - Further Considerations

Continuous data may appear as discrete either due to rounding (see the outage data example) or due to measurement limitations. We should treat such data as continuous unless the number of levels in the data set is very small (say 2-4). For example, hourly records of steam pressure at the turbine inlet (station F) show only the values 126, 127 and 128. Great care must be exercised while analyzing such data.

Discrete data having seven or more levels may be treated as continuous data.

Dichotomous data (O.K/Not O.K, Pass/Fail etc.) may be treated as discrete data after coding the two categories as 1 (O.K) and 0 (Not O.K).

In the field of Quality Control, the various types of data are classified as
- VARIABLE DATA: Continuous data
- ATTRIBUTE DATA: Others (Discrete, Dichotomous, Ordinal and Nominal)
Henceforth we shall use this latter classification.
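The rules of thumb above can be written down as a small Python sketch. The thresholds (4 or fewer levels for suspect continuous data, 7 or more levels for discrete data) come from the text; the function names and sample readings are hypothetical.

```python
def code_dichotomous(values, ok_label="O.K"):
    """Code O.K as 1 and anything else (Not O.K) as 0."""
    return [1 if v == ok_label else 0 for v in values]


def suggested_treatment(values, recorded_as):
    """Apply the rules of thumb on the number of distinct levels."""
    levels = len(set(values))
    if recorded_as == "continuous":
        # Rounding or gauge limitations can leave only a handful of levels
        if levels <= 4:
            return "continuous, but only a few levels: analyze with great care"
        return "continuous"
    if recorded_as == "discrete":
        return "may be treated as continuous" if levels >= 7 else "discrete"
    raise ValueError("recorded_as must be 'continuous' or 'discrete'")


def qc_class(kind):
    """Quality Control convention: continuous -> VARIABLE, everything else -> ATTRIBUTE."""
    return "VARIABLE" if kind == "continuous" else "ATTRIBUTE"


steam_pressure = [126, 127, 128, 127, 127, 126, 128, 127]  # hypothetical hourly readings
defects_per_lot = [0, 3, 1, 6, 2, 4, 5, 1, 0]              # hypothetical counts, 7 levels

print(suggested_treatment(steam_pressure, "continuous"))
print(suggested_treatment(defects_per_lot, "discrete"))
print(code_dichotomous(["O.K", "Not O.K", "O.K"]))
print(qc_class("ordinal"))
```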


Data Gateway

Problem/Hypothesis → DATA COLLECTION → Data → DATA ANALYSIS → Solution/Fact

• Quality problems cannot be solved merely on the basis of experience.
• Any claim not backed by data is only a hypothesis.
• Data gates: The quality of the data gates and their placement at appropriate locations in a process are extremely important for process control.
• Data quality: The data collection step is vital – garbage in, garbage out.


Data Quality Scale
Most data are of poor quality. Whenever you see data, doubt it.

Quality category | Impact | Example | Rank*
Wrong data | Misleading information | Cooked data | 1
Noisy data | Potentially misleading information | High gauge R&R | 2
Irrelevant data | Useless information | Old data | 3
Inadequate data | Partial information | Small sample | 4
Hard data | Difficult to process | Censored data | 5
Redundant data | Useful but adds to cost | Multiple copies | 6
Right data | Useful and economical | - | 7

* Higher the better


Information Content in Data for Process Control

Source of Data | Attribute Data | Variable Data
General literature | Very low | Low
Past data: In-house routine Q.C. records | Low | Moderate
Past data: Statistically designed experiments | Moderate | High
Live data: Passive observation of the process | Moderate | High
Live data: Statistically designed experiments | High | Very High

Do not transform variable data to attribute data. That would be like burning diamonds for heat.
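To see why converting variable data to attribute data throws information away, the Python sketch below codes two hypothetical processes against hypothetical specification limits of 9.0 and 11.0. Both look identical as attribute data, while the measurements show that process B is running close to the upper limit.

```python
from statistics import mean

LSL, USL = 9.0, 11.0  # hypothetical specification limits


def to_attribute(values, lsl=LSL, usl=USL):
    """Convert measurements to attribute data: 1 = within specification, 0 = outside."""
    return [1 if lsl <= v <= usl else 0 for v in values]


process_a = [10.1, 10.0, 9.9, 10.2, 9.8]        # well centred
process_b = [10.9, 10.8, 10.95, 10.85, 10.9]    # crowding the upper limit

for name, data in (("A", process_a), ("B", process_b)):
    conforming = sum(to_attribute(data))
    print(f"Process {name}: {conforming}/{len(data)} conforming, mean = {mean(data):.2f}")
```

As attribute data both processes read "5 out of 5 O.K"; only the measurements reveal that B needs adjustment before it starts producing defectives.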


Data Collection Process

Population → Sample → Measurement → Recording → Editing, Storage, Retrieval

Data collected on p variables from a sample of n individuals:

Individual | Var. 1 | Var. 2 | Var. 3 | . . . | Var. p
Ind. 1 | Data | Data | Data | . . . | Data
Ind. 2 | Data | Data | Data | . . . | Data
Ind. 3 | Data | Data | Data | . . . | Data
. . .
Ind. n | Data | Data | Data | . . . | Data


Linking Data Quality to Data Collection Process

Process elements are mapped against the data quality categories Wrong, Noisy, Irrelevant, Inadequate, Hard and Redundant:
• Population: Individual, Variables
• Sample: Procedure, Size
• Measurement: Gauge, Appraiser, Others
• Recording: Format, Recorder
• Editing, Storage, Retrieval: Issues related to database management


Poor Data Quality - Cause and Effect Diagram

Effect: Poor data quality

Main causes (sub-causes):
• Population (Individual, Variable)
• Sample (Size, Sampling method)
• Measurement (Gauge, Appraiser, Measurand, Method)
• Recording (Recorder, Format)
• Editing, storage, retrieval (Operator, Hardware, Software, Database, Mgmt. policy)

Note: Due to limitations of space, only the main sub-causes are shown in the CE diagram.


Measurement Related Causes for Poor Data Quality

Effect: Poor data quality (measurement)

• Gauges
 – Calibration: status (not done, done long back); results (not used, not traceable)
 – Number: many (variable least count, different makes)
 – Capability: operating range (beyond limit); type of data (unwanted); precision (low repeatability, low least count)
 – Operation: malfunctioning, breakdown
• Appraisers: bias, inadvertent error, number (reproducibility)
• Measurand: unstable, inhomogeneous
• Method: standard procedure (not available, not followed), communication


Data Collection Planning - Principle of Inverse Loading

The planning questions (plan from question 1 to 5; execute in the reverse order, from 5 back to 1):
1) What do you want to know?
2) How do you want to see what it is that you need to know?
3) What type of tool will generate what it is that you need to see?
4) What type of data is required of the selected tool?
5) Where can you get the required type of data?

Illustration (two alternatives)
1) Has X any effect on Y?
2) Compare the distribution of Y at levels X1, X2, X3 of X | Plot Y against X
3) Histogram | Scatter diagram
4) Y11 … Y1n, Y21 … Y2p, Y31 … Y3q (Y values at each level of X) | Pairs (X1, Y1), …, (Xn, Yn)
5) Final inspection and production log book | Nowhere - to be collected


Data Collection Tools

The foregoing discussion indicates that collecting the right data is by no means a trivial task. One can go wrong in various ways at different stages of the data collection process. The two basic requirements for data collection are
• Clarity of purpose
• Use of a structured approach

Commonly used data collection tools that satisfy these two requirements are
• Check Sheet
• Data Sheet

Check Sheet: Checks (/, x etc.) are made against a category of a variable or a combination of categories of several variables. Used primarily for collecting attribute data.
Data Sheet: Measurement results are recorded against an individual and its characteristics. Used for collecting both attribute and variable data.
Many consider all check sheets as data sheets and vice versa. However, we shall distinguish between the two as above.


Process Distribution Check Sheet

Power Generation Process (Moving Target)
Month: September    Process average (Ȳ1): 420 MW
Characteristic: Y1 = Total generation (MW), Y2 = System demand
Sampling interval: Every 3.5 hours
Target: Min(420, Y2)
Data: Target - Y1

Class Interval | Frequency
< -54.99 | 5
-54.99 to -44.99 | 7
-44.99 to -34.99 | 5
-34.99 to -24.99 | 5
-24.99 to -14.99 | 5
-14.99 to -4.99 | 8
-4.99 to 5.01 | 126
5.01 to 15.01 | 16
15.01 to 25.01 | 2
25.01 to 35.01 | 10
35.01 to 45.01 | 11
45.01 to 55.01 | 2
> 55.01 | 4

Total No. of observations: 206

Import limit = +20: observations above it are wasteful import due to lack of control
Export limit = -10: observations below it are wasteful export due to lack of control

Defect rate = 27 %
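Because the check sheet records only class frequencies, observations inside a class that straddles a limit (-14.99 to -4.99 for export, 15.01 to 25.01 for import) cannot be classified exactly. The Python sketch below, assuming the class limits and frequencies above, brackets the defect rate; the reported 27% lies within the bounds it prints.

```python
import math

# (lower limit, upper limit, frequency) for Target - Y1; None marks an open end
CLASSES = [
    (None, -54.99, 5), (-54.99, -44.99, 7), (-44.99, -34.99, 5),
    (-34.99, -24.99, 5), (-24.99, -14.99, 5), (-14.99, -4.99, 8),
    (-4.99, 5.01, 126), (5.01, 15.01, 16), (15.01, 25.01, 2),
    (25.01, 35.01, 10), (35.01, 45.01, 11), (45.01, 55.01, 2),
    (55.01, None, 4),
]
EXPORT_LIMIT, IMPORT_LIMIT = -10, 20


def defect_count_bounds(classes, low_limit, high_limit):
    """Lower and upper bounds on the number of observations beyond the limits."""
    certain = straddling = 0
    for lower, upper, freq in classes:
        lo = -math.inf if lower is None else lower
        hi = math.inf if upper is None else upper
        if hi <= low_limit or lo >= high_limit:
            certain += freq        # the whole class lies beyond a limit
        elif lo < low_limit or hi > high_limit:
            straddling += freq     # the class straddles a limit
    return certain, certain + straddling


n = sum(freq for _, _, freq in CLASSES)
low, high = defect_count_bounds(CLASSES, EXPORT_LIMIT, IMPORT_LIMIT)
print(f"N = {n}; defect rate between {low / n:.1%} and {high / n:.1%}")
```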


Causes for Wasteful Import of Power

[Figure: Run chart of half-hourly readings of generation at station C15 in September 2001, with episodes marked A to D]

A: Process failure  B: Process deficiency  C: Early slow down  D: Late pick up


Defect Cause Check Sheet

Month: September, 2001    Data: # of hours of generation affected

Defect \ Station | C15 | C16 | D | E | F | Total
Process failure | 30 | 0 | 15 | 7 | 0 | 52
Process deficiency | 11 | 9 | 36 | 14 | 11 | 81
Early slowdown | 2 | 2 | 9 | 0 | 2 | 15
Late pick up | 11 | 11 | 5 | 0 | 7 | 34
Total | 54 | 22 | 65 | 21 | 20 | 182

Note: The criticality of the defects is not the same across all stations.


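Identifying Critical Causes for Wasteful Import

Total generation loss (MWh) = Hours of low generation × Average generation loss at each instant (MW)

Hours of low generation
Cause | C15 | C16 | D | E | F
PF | 30 | 0 | 15 | 7 | 0
PD | 11 | 9 | 36 | 14 | 11
ES | 2 | 2 | 9 | 0 | 2
LP | 11 | 11 | 5 | 0 | 7

Average generation loss at each instant (MW)
Cause | C15 | C16 | D | E | F
PF | 29.0 | 29.5 | 107.0 | 103.5 | 110.0
PD | 5 | 2 | 10 | 5 | 5
ES | 10 | 4 | 30 | - | 15
LP | 10 | 4 | 30 | - | 15

Total generation loss (MWh)
Cause | C15 | C16 | D | E | F | Total
PF | 870 | 0 | 1605 | 725 | 0 | 3200
PD | 55 | 18 | 360 | 70 | 55 | 558
ES | 20 | 8 | 270 | 0 | 30 | 328
LP | 110 | 44 | 150 | 0 | 105 | 409
Total | 1055 | 70 | 2385 | 795 | 190 | 4495

PF = Process failure, PD = Process deficiency, ES = Early slow down, LP = Late pick up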
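The total generation loss table is the cell-by-cell product of the hours table and the average-loss table. The Python sketch below reproduces that computation, treating "-" in the average-loss table as zero loss. Unlike the table, it keeps station E's process-failure loss unrounded (7 × 103.5 = 724.5 MWh rather than 725), so its PF and grand totals come out 0.5 MWh lower.

```python
STATIONS = ["C15", "C16", "D", "E", "F"]

# Hours of low generation, by cause and station
HOURS = {
    "PF": [30, 0, 15, 7, 0],
    "PD": [11, 9, 36, 14, 11],
    "ES": [2, 2, 9, 0, 2],
    "LP": [11, 11, 5, 0, 7],
}
# Average generation loss at each instant (MW); None stands for "-" in the table
AVG_LOSS = {
    "PF": [29.0, 29.5, 107.0, 103.5, 110.0],
    "PD": [5, 2, 10, 5, 5],
    "ES": [10, 4, 30, None, 15],
    "LP": [10, 4, 30, None, 15],
}


def generation_loss(hours, avg_loss):
    """Total loss (MWh) = hours of low generation x average loss, cell by cell."""
    return {
        cause: [h * (mw or 0) for h, mw in zip(hours[cause], avg_loss[cause])]
        for cause in hours
    }


loss = generation_loss(HOURS, AVG_LOSS)
by_cause = {cause: sum(row) for cause, row in loss.items()}
by_station = {s: sum(loss[c][i] for c in loss) for i, s in enumerate(STATIONS)}

# Rank causes by total loss (largest first)
for cause, mwh in sorted(by_cause.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{cause}: {mwh:.1f} MWh")
print(max(by_station, key=by_station.get), "has the largest loss")
print(f"Grand total: {sum(by_cause.values()):.1f} MWh")
```

Ranking on MWh rather than on hours (as in the defect cause check sheet) moves process failure to the top, since each failure hour costs far more generation than a deficiency hour.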


Other Types of Check Sheets

Defective item check sheet
Checks are made against the various causes of rejection/rework of an item.

Defect location check sheet
Instead of a table, a diagram of the defect space is drawn. Checks are made at the location where each defect occurs. Locational segregation of defects, if any, provides a valuable clue. Examples: leakage in a cooling system, cracks in castings, wear out of moving parts.

Check-up confirmation check sheet
Used to make a comprehensive check-up of product/process quality (usually at the final stage). Preprinted check items avoid duplication and omission of tests to be performed. It is a variation of the check list, which is used for checking whether all the tasks have been performed or not.

C-E diagram check sheet
Checks are made against the causes of a problem shown in the C-E diagram.


Data Sheet – General Format

Title
Common relevant information

Individual | Var. 1 | Var. 2 | . . . | Var. p | Remark
Ind. 1 | | | | |
Ind. 2 | | | | |
. . .
Ind. n | | | | |

Important summary of data
Notes:


Data Sheet - Example

Up-load detention report for the month of July, 2001

Rake No. | Date | Arrival time | Quality | # of wagons | Form. date | Form. time | Depart. date | Depart. time | Deten. hours | Demur. hours | Reason | Actual unloading time (Hr.)
01 | 01 | 19.45 | Enviro | 58 | 02 | 05.35 | 02 | 15.30 | 09.55 | - | - | 09.00
. . .
20 | 14 | 07.50 | Du. hill | 58 | 15 | 16.45 | 16 | 00.20 | 07.35 | 23 | S(19) + I(4) | 14.30
. . .
42 | 31 | 20.20 | . . . | 14.45

Purpose? Estimation of demurrage hours; control of demurrage hours.
Important reasons cited are receipt in quick succession, successive detentions and wet coal. These are beyond the control of the coal handling section. Inadequate data!
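Times on the data sheet are in hh.mm form, so they cannot be added or subtracted as decimals (09.55 + 07.35 is 17 h 30 min, not 16.90). The Python sketch below derives detention hours under two assumptions: detention = departure time - formation time, and all stamps fall in July 2001. The first assumption reproduces both complete rows shown above.

```python
from datetime import datetime


def stamp(day: int, hhmm: str, year: int = 2001, month: int = 7) -> datetime:
    """Parse an 'hh.mm' clock time on a given day (month and year are assumed)."""
    hours, minutes = (int(part) for part in hhmm.split("."))
    return datetime(year, month, day, hours, minutes)


def detention(form_day, form_time, depart_day, depart_time):
    """Detention = departure - formation, returned in the sheet's 'hh.mm' style."""
    delta = stamp(depart_day, depart_time) - stamp(form_day, form_time)
    minutes = int(delta.total_seconds()) // 60
    return f"{minutes // 60:02d}.{minutes % 60:02d}"


print(detention(2, "05.35", 2, "15.30"))   # rake 01
print(detention(15, "16.45", 16, "00.20"))  # rake 20
```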

Chapter 3: Summarization of Data


Data Analysis – Getting Started

102.8 105.2 103.2 104.0 105.2 104.8 105.6 105.0 105.0 104.0 104.0 105.2 106.0 106.4 103.2 104.2 102.0 103.6 103.8 105.0 105.2 105.2 106.0 105.0 103.0 103.2 103.0 103.0 104.2 105.8 105.4 104.8 104.8 105.2 105.2 106.0 104.0 104.2 103.8 104.4 104.0 102.2 103.4 104.4 104.4 104.2 104.8 106.2 106.4 104.8 102.8 103.6 104.8 104.4 104.8 104.0 104.0 104.0 104.0 104.0 104.4 104.0 102.6 103.0 104.8 102.8 104.0 103.4 103.6 104.0 104.0 103.4 106.0 104.4 104.4 102.4 102.8 105.0 105.2 105.2

Generation in MW, listed in recorded order. Hours covered (each block contains eight half-hourly readings): 10.00 – 13.30, 14.00 – 17.30, 18.00 – 21.30, 22.00 – 01.30, 02.00 – 05.30, 06.00 – 09.30, 10.00 – 13.30, 14.00 – 17.30, 18.00 – 21.30, 22.00 – 01.30

Half-hourly record of generation by station ‘E’ from 19/9/01 (10 hrs.) to 21/9/01 (1.30 hrs.) under normal operating conditions

What are your conclusions?


Frequency Distribution- Analyzing a large data set on the same variable

Class Interval | Tally | Frequency
101.7 – 102.3 | | 02
102.3 – 102.9 | | 06
102.9 – 103.5 | | 10
103.5 – 104.1 | | 19
104.1 – 104.7 | | 11
104.7 – 105.3 | | 22
105.3 – 105.9 | | 03
105.9 – 106.5 | | 07
Total | | 80

Generation data set (previous slide): the eighty observations are grouped in eight classes of equal length.

Does the frequency distribution provide better insight into the process?

DATA + ANALYSIS = INFORMATION

Data are not information


Constructing Frequency Distributions- Variable Data

Data set
Number of observations (N): about 100 on the same variable.

Formation of the classes (first column)
Number of classes (k)
Too many classes obscure the pattern of the distribution due to sampling fluctuations. Details are lost with too few classes. The optimum number of classes is given by k = 1 + 3.3 log10(N).

The simpler formula k = √N also works well in practice. For better visual impact, it is preferable to have 5 ≤ k ≤ 12. For the generation data set we have N = 80. Therefore, k = 1 + 3.3 × log10(80) = 7.3. This means the number of classes should be either 7 or 8. We have chosen 7 classes.
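Both rules for the number of classes are easy to tabulate. A Python sketch follows; the factor 3.3 approximates 1/log10(2), so the first rule is Sturges' rule 1 + log2(N). The function names are illustrative.

```python
import math


def classes_sturges(n: int) -> float:
    """k = 1 + 3.3 log10(N); 3.3 is approximately 1 / log10(2)."""
    return 1 + 3.3 * math.log10(n)


def classes_sqrt(n: int) -> float:
    """The simpler rule k = sqrt(N)."""
    return math.sqrt(n)


for n in (50, 80, 100, 200):
    print(f"N = {n:3d}: Sturges k = {classes_sturges(n):4.1f}, sqrt rule k = {classes_sqrt(n):4.1f}")
```

For N = 80 the first rule gives about 7.3 and the square-root rule about 8.9. For larger data sets the square-root rule grows faster and soon leaves the preferred range 5 ≤ k ≤ 12.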


Constructing Frequency Distributions (..contd.)

Class width (h)
h = (R + w) / k, where R = Range of the observations = Maximum – Minimum, and w = Least count of measurement. Next, h is rounded to the nearest integer multiple of w. This means that if the least count (w) is 0.1, then h = 2.312 should be rounded to 2.3. However, if w = 0.2, then the same h should be rounded to 2.4.

In our generation data example, R = 106.4 – 102.0 = 4.4, and w = 0.2. Thus, h = (4.4 + 0.2) / 7 = 0.657, which is rounded to 0.6. We shall explain later why taking h = 0.7 would be erroneous.

Note that if h is rounded down, then we shall need (k+1) classes to cover the whole range of the observations. How many classes shall we need if h is rounded up?
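The width calculation and the rounding step can be sketched in Python as follows. Function names are illustrative. Python's `round()` sends exact halves to the even multiple, which only matters when h/w ends in exactly .5.

```python
import math


def round_to_least_count(h: float, w: float) -> float:
    """Round h to the nearest integer multiple of the least count w."""
    return round(round(h / w) * w, 10)   # outer round() removes float noise


def class_width(minimum: float, maximum: float, k: int, w: float):
    raw = (maximum - minimum + w) / k    # h = (R + w) / k
    return raw, round_to_least_count(raw, w)


def classes_needed(minimum: float, maximum: float, h: float, w: float) -> int:
    """Classes of width h, starting at Minimum - w/2, needed to cover the data."""
    return math.ceil((maximum - (minimum - w / 2)) / h)


raw, h = class_width(102.0, 106.4, 7, 0.2)                 # generation data
print(f"raw h = {raw:.3f}, rounded h = {h}")
print(classes_needed(102.0, 106.4, h, 0.2), "classes needed")
print(round_to_least_count(2.312, 0.1), round_to_least_count(2.312, 0.2))
```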


Constructing Frequency Distributions (..Contd.)

Class limits
The minimum value of the generation data is 102.0 and the class width has been determined as 0.6. So we can form the classes as 102.0 – 102.6, 102.7 – 103.3, 103.4 – 103.9, . . .

The problem with the above classification is that there is a gap between two successive class intervals. This is not desirable since we are dealing with continuous data.

Discontinuity can be removed by forming the classes as 102.0 – 102.6, 102.6 – 103.2, 103.2 – 103.8, . . . However, this classification has another problem. Suppose we have an observation 102.6. In which class shall we place it, first or second?

In order to avoid such confusion we take
Lower limit of the first class = Minimum – w/2
and then successively add the class width to this lower limit to obtain the other class limits.



Class limits (..Contd.)
Thus, for the generation data we have the classes as
101.9 – 102.5, 102.5 – 103.1, 103.1 – 103.7, 103.7 – 104.3, 104.3 – 104.9, 104.9 – 105.5, 105.5 – 106.1, 106.1 – 106.7

Note that now we have
- 8 classes (since h has been rounded down from 0.657 to 0.6),
- no confusion in classification (since no observation falls on a class limit), and
- an extended last class (ideally the upper limit of the last class should have been 106.5).

In the example, we have extended the first class instead of the last one, since this brought out the process abnormalities better. Thus the eight classes used are
101.7 – 102.3, 102.3 – 102.9, … , 105.9 – 106.5



Tally marking (second column)
Start with the first observation. Find the class to which the observation belongs. Put a tally against the class. Classify all the remaining observations in the same way. Tally marks are grouped in fives, with the fifth tally crossed through the previous four. This provides a better visual display and helps in counting the frequency of each class.

Note that all the observations get classified in a single pass through the data. However, if we concentrate on one class at a time and try to find the number of observations in it, then we have to go through the observations k times. This not only consumes more time but also increases the chance of committing errors.

Counting frequency (third column)
The frequency (f) of each class is obtained simply by counting the tallies.

Other columns
Columns giving cumulative frequency (f1, f1+f2, ..) and relative frequency (f1/N, f2/N, ..) may also be added, if required.
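The single-pass classification described above, with cumulative and relative frequencies, can be sketched in Python. The sketch assumes the 80 generation readings listed earlier and a class width of 0.6. Starting at Minimum - w/2 = 101.9 gives the eight classes derived above; starting at 101.7 reproduces the frequency distribution with the extended first class.

```python
import math
from itertools import accumulate

# Half-hourly generation (MW) of station 'E', in recorded order (80 readings)
GENERATION = [
    102.8, 105.2, 103.2, 104.0, 105.2, 104.8, 105.6, 105.0,
    105.0, 104.0, 104.0, 105.2, 106.0, 106.4, 103.2, 104.2,
    102.0, 103.6, 103.8, 105.0, 105.2, 105.2, 106.0, 105.0,
    103.0, 103.2, 103.0, 103.0, 104.2, 105.8, 105.4, 104.8,
    104.8, 105.2, 105.2, 106.0, 104.0, 104.2, 103.8, 104.4,
    104.0, 102.2, 103.4, 104.4, 104.4, 104.2, 104.8, 106.2,
    106.4, 104.8, 102.8, 103.6, 104.8, 104.4, 104.8, 104.0,
    104.0, 104.0, 104.0, 104.0, 104.4, 104.0, 102.6, 103.0,
    104.8, 102.8, 104.0, 103.4, 103.6, 104.0, 104.0, 103.4,
    106.0, 104.4, 104.4, 102.4, 102.8, 105.0, 105.2, 105.2,
]


def frequency_table(values, lower, h, count):
    """Classify every observation in a single pass, like tally marking."""
    freq = [0] * count
    for x in values:
        i = math.floor((x - lower) / h)
        if not 0 <= i < count:
            raise ValueError(f"{x} lies outside the classes")
        freq[i] += 1
    limits = [(round(lower + i * h, 6), round(lower + (i + 1) * h, 6)) for i in range(count)]
    return limits, freq


n = len(GENERATION)
for start in (101.9, 101.7):   # Minimum - w/2, and the extended first class
    limits, freq = frequency_table(GENERATION, start, 0.6, 8)
    print(f"Classes starting at {start}:")
    for (lo, hi), f, cum in zip(limits, freq, accumulate(freq)):
        print(f"  {lo:.1f} - {hi:.1f}  f = {f:2d}  cum = {cum:2d}  rel = {f / n:.3f}")
```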


Constructing Frequency Distributions- Getting the class intervals right

Why class width (h) is rounded to the nearest integer multiple of w
Consider the same generation data example. Here w = 0.2. Assume that h = 0.657 is rounded to 0.7 (which is not an integer multiple of 0.2) instead of 0.6. Thus the classes will be 101.9 – 102.6, 102.6 – 103.3, ..

Now, in order to overcome the problem of classifying observations like 102.6, we are forced to take w = 0.1 and form the classes as

101.95 – 102.65, 102.65 – 103.35, 103.35 – 104.05, 104.05 – 104.75, 104.75 – 105.45, 105.45 – 106.15, 106.15 – 106.85

Note that the number of observation units covered by each class is not the same. For example, the second class covers three units (102.8, 103.0 and 103.2) but the third class covers four units (103.4, 103.6, 103.8 and 104.0). As a result, the frequency distribution is likely to show many peaks.

Balancing end points
Assuming w = 0.1, the seven classes shown above should be appropriate. However, note that the last class extends four units beyond the maximum observed value of 106.4. It is desirable to distribute this imbalance between the two end classes by starting the first class at 101.75 and ending the last class at 106.65.
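Counting the recordable values that fall in each class shows the effect directly. The Python sketch below works in hundredths to keep the arithmetic exact; this integer scaling is our choice, not part of the method.

```python
def units_per_class(lower, width, count, unit):
    """Number of possible recorded values (multiples of `unit`) inside each class.

    All arguments are integers in hundredths (e.g. 101.95 -> 10195) so the
    arithmetic is exact. Class limits are assumed to fall between recordable values.
    """
    counts = []
    for i in range(count):
        lo = lower + i * width
        hi = lo + width
        first = -(-lo // unit) * unit      # smallest multiple of unit that is >= lo
        counts.append(len(range(first, hi, unit)))
    return counts


# Least count w = 0.2, i.e. 20 hundredths
print(units_per_class(10195, 70, 7, 20))   # h = 0.7, classes from 101.95
print(units_per_class(10190, 60, 8, 20))   # h = 0.6, classes from 101.9
```

With h = 0.7 the classes alternately hold four and three possible values, which by itself produces alternating peaks; with h = 0.6 every class holds exactly three.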


Frequency Distribution of the Generation Data – Further Analysis
The frequency distribution shows an abnormal pattern (nearly alternate peaks). Does this mean the process mean is jumping randomly by about 1.2 units? The following two frequency distributions constructed from the same data provide some additional clues.

Fractional part | Frequency
.0 | 27
.2 | 18
.4 | 15
.6 | 5
.8 | 15
Total | 80

0’s occur more frequently at the cost of 6’s. Does this indicate measurement bias? Noisy data!!

Class interval | Frequency
101.7 – 102.7 | 04
102.7 – 103.7 | 17
103.7 – 104.7 | 26
104.7 – 105.7 | 25
105.7 – 106.7 | 08
Total | 80

Smooth pattern (left skewed). Smoothness has been achieved not only by reducing the number of classes but also by including adjacent 0’s and 6’s in the same interval.
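A table of last recorded digits like the fractional-part table above can be produced with a short Python sketch; the readings below are hypothetical. With w = 0.2 only the endings 0, 2, 4, 6 and 8 can occur, so for the 80 generation readings each ending would be expected about 80/5 = 16 times if none were preferred. Counts far from 16 point to possible measurement or recording bias.

```python
from collections import Counter


def last_digit_table(values, decimals=1):
    """Frequency of the last recorded digit (e.g. 104.6 -> 6 when decimals=1)."""
    digits = [round(v * 10 ** decimals) % 10 for v in values]
    return dict(sorted(Counter(digits).items()))


def expected_if_no_preference(values, possible_digits):
    """Count expected for each digit if all possible endings were equally likely."""
    return len(values) / len(possible_digits)


readings = [102.0, 104.0, 103.2, 105.6, 104.4, 106.0, 104.8, 103.0]  # hypothetical
print(last_digit_table(readings))
print(expected_if_no_preference(readings, [0, 2, 4, 6, 8]))
```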


Histogram
A histogram is a graphical representation of a frequency distribution of variable data. The histogram of the generation data having five classes is shown below.

[Figure: Histogram of generation in ‘E’ station (MW), five classes from 101.7 to 106.7; vertical axis: frequency (0 to 30)]

• Bars of equal width (= class width)
• Heights of the bars are proportional to the frequencies of the classes
• Bar width of about 1 cm (7-10 classes)
• Horizontal axis is about 1.6 times longer than the vertical axis
• Central tendency: about 104.2
• Pattern of variation: slightly left skewed

Specification limits: should be shown wherever applicable.
Class mid-point: marking the class mid-points may be helpful in certain cases.
Open ended classes: avoid adding too many classes at the ends having zero or very low frequencies. These are shown as open ended bars with arbitrarily reduced heights.


Construction of Histogram- An exercise

Half-hourly record of power (MW) generated by station ‘E’ during 29.9.2001 (10.00 hours) to 30.9.2001 (24.00 hours) gives us the following data.

6.4 6.4 6.8 6.0 5.2 4.8 6.4 4.4 5.2 6.0
7.6 8.0 7.4 6.6 8.0 5.6 7.2 7.2 7.0 4.0
6.4 8.0 8.0 6.0 6.0 6.4 7.8 7.6 7.6 7.4
7.6 7.6 7.4 4.6 4.2 4.8 6.0 5.6 5.4 5.0
6.2 7.8 7.4 7.2 7.4 7.8 6.6 6.4 6.8 6.8
6.8 6.8 6.6 6.8 6.6 6.8 6.8 6.8 7.0 7.0
6.0 5.6 4.4 4.6 4.6 4.8 6.2 7.0 6.6 6.4
5.2 5.2 7.2 7.4 6.0 5.0 7.0 7.6 7.6 7.4
5.2 7.2 7.2 7.0 7.2 6.8 6.0 6.0 6.0 5.2

Construct a histogram of the above data set. Compare it with the histogram for the period 19.9.01 to 21.9.01 (previous slide) and offer your comments.


Commonly Observed Histogram Patterns

• Single peak, symmetric, bell shaped: commonly observed pattern of a stable process

• Single peak, positively skewed (long tail on the right)

• Single peak, negatively skewed (long tail on the left)

Many characteristics follow such patterns. We have already seen that generation data is negatively skewed while breakdown data is positively skewed. However, such shapes may also indicate process instability.

• Single peak, thick tail (relative to the specification limits LSL and USL): How?

• Two peaks (bi-modal): How?


Frequency Distribution of Discrete Data

Number of plant outages in each year since commissioning

Station | Period | Type of outage | # of outages in a year
D | 1978-79 to 2000-01 | Forced | 2, 3, 1, 0, 3, 2, 1, 0, 2, 2, 0, 2, 3, 0, 2, 1, 2, 1, 1, 0, 1, 0, 2
D | 1978-79 to 2000-01 | Planned | 3, 5, 1, 4, 2, 5, 2, 1, 6, 3, 7, 7, 4, 7, 6, 5, 6, 4, 2, 2, 2, 6, 2
E | 1985-86 to 2000-01 | Forced | 2, 2, 5, 3, 0, 0, 1, 0, 1, 0, 2, 1, 1, 0, 1, 4
E | 1985-86 to 2000-01 | Planned | 15, 7, 8, 3, 7, 5, 2, 6, 3, 8, 7, 4, 5, 4, 3, 4
F | 1988-89 to 2000-01 | Forced | 4, 1, 1, 0, 0, 1, 1, 2, 0, 1, 0, 1, 6
F | 1988-89 to 2000-01 | Planned | 3, 11, 6, 12, 4, 0, 1, 2, 8, 2, 4, 4, 6

Ideally we should construct six frequency distributions (one for each type of outage in each station). However, due to shortage of data we shall construct only two: one for forced outages and the other for planned outages. What can you say about the occurrence of the two types of outages from the above data set?


Line Graph and Bar Graph

[Figure: Distribution of number of yearly outages of stations ‘D’, ‘E’ and ‘F’ since commissioning. Forced outages (line graph): 0 to 6 outages; planned outages (bar graph): 0 to 9, 11, 12 and 15 outages; vertical axis: frequency]

The line graph shows the frequencies of individual outcomes. The bar graph is similar to the histogram, but there are gaps between the bars since we are dealing with discrete (attribute) data. Planned outages occur more frequently than forced outages. The number of planned outages is uniformly distributed between 2 and 7, with very few years outside this band. Such a pattern is somewhat odd. Planned outages need to be defined properly. Do we undertake unnecessary planned outages?
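The pooled frequency distributions behind the two graphs can be tallied with a Python sketch using `collections.Counter` on the yearly counts from the outage table:

```python
from collections import Counter

FORCED = {
    "D": [2, 3, 1, 0, 3, 2, 1, 0, 2, 2, 0, 2, 3, 0, 2, 1, 2, 1, 1, 0, 1, 0, 2],
    "E": [2, 2, 5, 3, 0, 0, 1, 0, 1, 0, 2, 1, 1, 0, 1, 4],
    "F": [4, 1, 1, 0, 0, 1, 1, 2, 0, 1, 0, 1, 6],
}
PLANNED = {
    "D": [3, 5, 1, 4, 2, 5, 2, 1, 6, 3, 7, 7, 4, 7, 6, 5, 6, 4, 2, 2, 2, 6, 2],
    "E": [15, 7, 8, 3, 7, 5, 2, 6, 3, 8, 7, 4, 5, 4, 3, 4],
    "F": [3, 11, 6, 12, 4, 0, 1, 2, 8, 2, 4, 4, 6],
}


def pooled_distribution(by_station):
    """Frequency of each yearly outage count, pooled over the stations."""
    counts = Counter(x for years in by_station.values() for x in years)
    return dict(sorted(counts.items()))


forced, planned = pooled_distribution(FORCED), pooled_distribution(PLANNED)
print("Forced :", forced)
print("Planned:", planned)
for name, dist in (("forced", forced), ("planned", planned)):
    n = sum(dist.values())
    mean = sum(k * f for k, f in dist.items()) / n
    print(f"{name}: {n} station-years, mean {mean:.2f} outages per year")
```

Each distribution covers 52 station-years, with averages of about 1.4 forced and 4.7 planned outages per year.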


Measures of Central Tendency – The Typical Value

Mean
• Most effective measure for numerical data. Let {X1, X2, … , XN-1, XN} be the data set. Then Mean = X̄ = (X1 + X2 + … + XN) / N = ΣXi / N
• May be used for ordinal data but not for nominal data
• Sensitive to extreme values

Median
• Ordinal data: category containing the (N+1)/2 th case
• Numerical data: (N+1)/2 th ordered observation when N is odd, and the average of the N/2 th and (N/2)+1 th ordered observations when N is even
• Can be computed even for open ended classes at the extremes, provided each of the end classes contains less than 50% of the observations
• Insensitive to outliers

Mode
• Category or value occurring with the greatest frequency
• Only measure of centre for nominal data
• May not be unique, and highly sensitive to how the classes or categories are formed
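The three measures, and the data types they suit, can be illustrated with Python's `statistics` module (`mean`, `median`, `multimode`) plus a small helper for ordinal data. The helper name, data and category names are hypothetical.

```python
from statistics import mean, median, multimode

SERVICE_ORDER = ["poor", "moderate", "good"]   # ordinal categories, lowest first


def ordinal_median(values, order):
    """Category containing the (N+1)/2-th case after ranking by `order`."""
    ranked = sorted(values, key=order.index)
    n = len(ranked)
    lower, upper = ranked[(n + 1) // 2 - 1], ranked[n // 2]
    # For odd N both indices point to the same case; for even N they may differ
    return lower if lower == upper else (lower, upper)


breakdowns = [10, 11, 12, 13, 100]   # hypothetical numerical data with one extreme value
ratings = ["good", "poor", "moderate", "good", "good"]
locations = ["east", "west", "east", "north", "south"]

print(mean(breakdowns), median(breakdowns))      # mean pulled up by 100; median is not
print(ordinal_median(ratings, SERVICE_ORDER))
print(multimode(locations))                      # mode: the only centre for nominal data
```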


Interpretation of Mean

In a rising voltage test, the alternating breakdown voltages (kV) of 24 samples of an insulation arrangement were found to be as follows:
210; 208; 208; 175; 182; 206; 190; 194; 198; 205; 212; 200; 205; 202; 207; 210; 202; 201; 188; 205; 209; 201; 216; 196

[Figure: Dot plot of the 24 breakdown voltages on a scale from 170 to 220 kV, with the mean marked]

MEAN = [210 + 208 + … + 216 + 196] / 24 = 201.25 kV

• Mean is the balance point (or fulcrum) for the distribution of the values
• Mean is analogous to the centre of gravity
• The sum of negative deviations from the mean exactly equals the sum of positive deviations in magnitude. Thus the total sum of the deviations from the mean is always zero
• In the above example, the mean should be interpreted as a measure of centre and not as a measure of central tendency or typical value
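The balance-point property can be checked on the breakdown voltage data with a Python sketch. `fractions.Fraction` keeps the arithmetic exact, so the deviations sum to exactly zero rather than to a small rounding residue.

```python
from fractions import Fraction

VOLTAGES_KV = [210, 208, 208, 175, 182, 206, 190, 194, 198, 205, 212, 200,
               205, 202, 207, 210, 202, 201, 188, 205, 209, 201, 216, 196]

mean_kv = Fraction(sum(VOLTAGES_KV), len(VOLTAGES_KV))   # exact arithmetic
deviations = [v - mean_kv for v in VOLTAGES_KV]

below = -sum(d for d in deviations if d < 0)   # total shortfall of values below the mean
above = sum(d for d in deviations if d > 0)    # total excess of values above the mean

print(f"Mean = {float(mean_kv)} kV")
print(f"Sum below = {float(below)}, sum above = {float(above)}, total = {sum(deviations)}")
```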


Data Analysis – Getting Started

Reportable accidents (#) in AEC Ltd., Sabarmati during 1995-2000

Year | Jan | Feb | Mar | Apr | May | Jun | Jul | Aug | Sep | Oct | Nov | Dec | Total
1995 | 24 | 17 | 27 | 19 | 10 | 25 | 19 | 22 | 23 | 16 | 18 | 15 | 235
1996 | 22 | 10 | 22 | 18 | 16 | 21 | 21 | 20 | 21 | 18 | 18 | 18 | 225
1997 | 19 | 14 | 12 | 15 | 15 | 9 | 15 | 24 | 19 | 16 | 14 | 19 | 192
1998 | 14 | 14 | 12 | 20 | 19 | 23 | 10 | 16 | 13 | 15 | 17 | 19 | 192
1999 | 19 | 13 | 15 | 13 | 16 | 18 | 17 | 16 | 20 | 17 | 13 | 16 | 193
2000 | 12 | 14 | 15 | 22 | 12 | 4 | 6 | 13 | 7 | 9 | 7 | 13 | 134
Total | 110 | 82 | 103 | 108 | 88 | 100 | 88 | 111 | 103 | 91 | 87 | 100 | 1171

What are your conclusions?
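Before drawing conclusions from a table like this, it is worth checking that the printed totals agree with the cells. A Python sketch of such a check follows. On the figures as transcribed here it flags the 1997 row (cells sum to 191 against a printed 192) and the April column (107 against 108), so the April 1997 entry, or those two totals, should be verified against the original record.

```python
MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun", "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]

# Monthly counts and the totals as printed in the table
ACCIDENTS = {
    1995: [24, 17, 27, 19, 10, 25, 19, 22, 23, 16, 18, 15],
    1996: [22, 10, 22, 18, 16, 21, 21, 20, 21, 18, 18, 18],
    1997: [19, 14, 12, 15, 15, 9, 15, 24, 19, 16, 14, 19],
    1998: [14, 14, 12, 20, 19, 23, 10, 16, 13, 15, 17, 19],
    1999: [19, 13, 15, 13, 16, 18, 17, 16, 20, 17, 13, 16],
    2000: [12, 14, 15, 22, 12, 4, 6, 13, 7, 9, 7, 13],
}
ROW_TOTALS = {1995: 235, 1996: 225, 1997: 192, 1998: 192, 1999: 193, 2000: 134}
COL_TOTALS = [110, 82, 103, 108, 88, 100, 88, 111, 103, 91, 87, 100]
GRAND_TOTAL = 1171


def total_mismatches(table, row_totals, col_totals, grand_total):
    """List every printed total that differs from the sum of its cells."""
    issues = []
    for year, counts in table.items():
        if sum(counts) != row_totals[year]:
            issues.append(("year", year, sum(counts), row_totals[year]))
    for j, month in enumerate(MONTHS):
        col = sum(counts[j] for counts in table.values())
        if col != col_totals[j]:
            issues.append(("month", month, col, col_totals[j]))
    if sum(row_totals.values()) != grand_total or sum(col_totals) != grand_total:
        issues.append(("grand", "Total", sum(row_totals.values()), grand_total))
    return issues


for issue in total_mismatches(ACCIDENTS, ROW_TOTALS, COL_TOTALS, GRAND_TOTAL):
    print("Check:", issue)

# Average accidents per month in each year
for year, counts in ACCIDENTS.items():
    print(year, round(sum(counts) / 12, 1))
```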
