Senior Management Programme: Quality With Statistics
Ahmedabad Electricity Company Ltd., 26 – 27 June 2002
Conducted by: Indian Statistical Institute, 98 Sampatrao Colony, BARODA – 390 007
Quality with Statistics 2
Defining the Ideal Quality Value
1) List the six factors which you believe are the major determinants of quality
2) For each factor, place a rating on the following statements
M: Performance of the listed factor should be measured
R: The performance measure should be reported
R: The management should review the performance reports
I: Improvement actions should stem from the reviews
Rating scale: 5 = Always, 4 = Often, 3 = Occasionally, 2 = Rarely, 1 = Never
Score sheet columns: Factor, M (Measure), R (Report), R (Review), I (Improve), Total; the last row gives the column totals
Guidelines for scoring
1. Use the same time scale: frequency of improvement actions <= frequency of review <= frequency of reporting <= frequency of measurement.
2. A factor is measured "always" if it is measured as frequently as is practically possible.
3. The ideal value for each factor need not be 5. For example, scrap % may be measured every hour (always) but the ideal frequency may be every shift (often).
4. A factor which is not measurable (e.g. integrity) gets a score of 1 for all the four actions (M-R-R-I).
Quality with Statistics 3
Defining the Real Quality Value
1) List the 6 factors you believe are the major determinants of quality
2) For each factor, place a rating on the following statements
M: Performance of the listed factor is measured
R: The performance measure is reported
R: The management reviews the performance reports
I: Improvement actions stem from the reviews
Rating scale: 5 = Always, 4 = Often, 3 = Occasionally, 2 = Rarely, 1 = Never
Score sheet columns: Factor, M (Measure), R (Report), R (Review), I (Improve), Total; the last row gives the column totals
Guidelines for scoring
1. Use the same 6 factors and the same time scale as were used while defining the ideal quality value.
2. The real score can be equal to, more than or less than the ideal score.
Quality with Statistics 4
The Quality Value Grid
(Figure: grid with two axes, Behavior Score and Belief Score, each running from 0 to 120)
Quality with Statistics 5
Cutting to the Core
Behavior is a function of values
B = f(V)
Behavior: The way in which a person or group of people responds
Values: The complex of beliefs, ideals or standards which characterizes a person or group of people
Quality with Statistics 6
The Cost of Remaining Average
Waste as a proportion of total sales volume
30%?
Typical Company Your Area?
Quality with Statistics 7
The Classical View of Performance
Practical Meaning of “99% Good”
20,000 lost articles of mail per hour
Unsafe drinking water almost 15 minutes each day
5000 incorrect surgical operations per week
2 short or long landings at major airports each day
2,00,000 wrong drug prescriptions every year
No electricity for about 7 hours each month
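Two of the figures above follow directly from applying a 1% failure rate to elapsed time. The sketch below is only an arithmetic check; the 30-day month is an assumption, and the other figures on the slide depend on volumes (mail, operations, prescriptions) that the slide does not state, so they are not reproduced.

```python
# What "99% good" means when the remaining 1% is measured in elapsed time.
FAILURE_RATE = 0.01  # 99% good leaves 1% bad

minutes_per_day = 24 * 60
hours_per_month = 30 * 24  # assumption: a 30-day month

unsafe_water_minutes_per_day = FAILURE_RATE * minutes_per_day
no_power_hours_per_month = FAILURE_RATE * hours_per_month

print(f"Unsafe drinking water: {unsafe_water_minutes_per_day:.1f} minutes per day")
print(f"No electricity: {no_power_hours_per_month:.1f} hours per month")
```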
Quality with Statistics 8
The Need for Knowledge
Knowledge
We don’t know what we don’t know
If we can’t express what we know in terms of numbers, we really don’t know much about it
“In God we believe, all else must have data” – Hewlett Packard
The Need
If we don’t know, we can not act
If we can not act, the risk of loss is high
If we know and act, the risk is managed
If we do know and do not act, we deserve the loss
Quality with Statistics 9
The Role of Questions
Questions lead and answers follow. The same question most often leads to the same answers, which invariably produce the same result. To change the result means to change the question.
New measures lead to new questions. [Management needs to focus on new measures like …….. rather than outputs and budgets].
As questions arise, vision emerges, direction becomes apparent and ambiguity diminishes. In turn, people become organized and mobilized to common action.
When people take common action, the organization's ability to survive and prosper will increase, owing to the discovery of answers to problems heretofore not known.
“Insanity is doing the same thing over and over again but expecting different results” – Rita Mae Brown (Author)
Quality with Statistics 10
The Value of Measurement
Measurement Question Search Knowledge
Improved Measurement
We don’t know what we don’t know
We can’t act on what we don’t know
We won’t know until we search
We won’t search for what we don’t question
We don’t question what we don’t measure
Hence, we just don’t know
– Mikel J. Harry
Quality with Statistics 11
The Role of Training
Undoubtedly the most important aspect of Quality is people and their knowledge. Without this golden asset all is for nothing. At the risk of redundancy, you don’t know what you don’t know, and if you don’t know something, nothing will happen. Obviously the key is knowledge. Successful change cannot occur without it.
Today, the best-in-class companies provide a tremendous amount of training and education to their employees. Many such companies have made significant investments in training and are discovering the rewards. For example, Motorola Inc. has discovered a 10:1 return on their budget. In fact, they require every employee to receive 40 hours or more of training annually, of which 40% must be in the area of Quality.
Quality with Statistics 12
What is Quality
• Quality means different things to different people. There is no universally accepted definition.
• However, there is broad agreement on the following:
  – Very difficult to define
  – Determined by customer
  – Multi-dimensional
  – Dynamic
  – Needs to be TOTAL
• Usually, TOTAL QUALITY refers to the fact that all departments have roles in quality.
Quality with Statistics 13
ISO 9000:2000 Definition of Quality
• “Degree to which a set of inherent characteristics fulfills requirements”
  – Requirements are needs or expectations that are stated or implied
  – Requirements can be generated by different interested parties
  – Inherent characteristics are the distinguishing features that exist in the product/process/system, especially as a permanent characteristic
• Inherent characteristics are called quality characteristics
• Assigned characteristics (e.g. product price) are not quality characteristics
Note: This definition is an improvement over its 1994 version. However, it can still be argued that not all inherent characteristics are quality characteristics.
Quality with Statistics 14
How to Measure Quality
Customer Satisfaction
• Appropriateness of requirements
• Degree of conformance to requirements
• Cost of identifying and meeting the requirements
Product Quality
Marketing Quality + Design Quality + Mfg. Quality + … + Service Quality
(Diagram: the elements above are linked by “=” and a function f.)
Quality with Statistics 15
How to Measure Quality (Contd.)
• Customer satisfaction can be measured but it is not very useful as a stand-alone measure.
• Establishing the function ‘f’ is a highly challenging task
• Presently, all quality measures (e.g. Defect Rate, Process Capability, Quality Cost, Cycle Time) address only a part of the whole.
• Points to remember– Quality is customer satisfaction but customer satisfaction is not quality
– Reducing internal rejection and rework reduces producer’s cost but not that of the customer
Quality with Statistics 16
Components of Quality
Main Component: Quality of Design
  Sub-component: Product Design
    Examples of features: Power rating of an engine, Robustness, Operating cost, Ease of use
  Sub-component: Process Design
    Examples of features: Rated efficiency, Process capability, Cycle time, Downtime for regulatory inspection
Main Component: Quality of Conformance
  Sub-component: Process Conformance
    Examples of features: Process instability, Process failures, Late deliveries, Loss of efficiency/yield
  Sub-component: Product Conformance
    Examples of features: Field failures, Factory scrap and rework, Deviation from target, Incorrect invoices
Quality of Design
• Decides the level of customer attraction
• Related to market segmentation based on product ‘grade’
• Improving design quality may lead to higher cost, but this need not always be the case.
Quality of Conformance
• Refers to the deficiencies resulting from lack of control
• Decides the level of customer dissatisfaction
• Improving quality of conformance always leads to reduction of costs. It is in this sense that Crosby says “quality is free”
Quality with Statistics 17
Quality with Statistics
(Diagram: Quality broken down into tasks, each linked to Statistical Tools)
Quality
  Quality of Design. Tasks: System Design, Parameter Design, Tolerance Design
  Quality of Conformance. Tasks: Process Monitoring and Adjustment, Problem Solving, Product Disposal
Quality with Statistics 18
Quality with Statistics – This Programme
Quality by Product Design
  Task: System Design. Scope in AECL: Limited*. Statistical tools: QFD, FMEA, Reliability Engineering. This programme: Nil
  Task: Parameter Design. Scope in AECL: -do-. Statistical tools: Statistical Designs, S/N Ratio, ANOVA. This programme: The concept of robustness only
  Task: Tolerance Design. Scope in AECL: -do-. Statistical tools: Statistical Designs, Loss Function, Simulation, Regression. This programme: Nil
Quality by Process Design
  Task: System Design. Scope in AECL: Limited**. Statistical tools: Same as those mentioned against product design PLUS optimization tools for inventory management, transportation, scheduling etc. This programme: Nil
  Task: Parameter Design. Scope in AECL: Very High. This programme: The concept of robustness only
  Task: Tolerance Design. Scope in AECL: High. This programme: Illustration with an example
* Applicable only for intermediate products and services
** Applicable mostly for management and service delivery processes
Quality with Statistics 19
Quality with Statistics – This Programme (Contd.)
Quality by Process Conformance
  Task: Process Monitoring and Adjustment. Scope in AECL: Very High. Statistical tools: Probability Distributions, Control Charts, GR&R Studies, PCA, Process adjustment methods. This programme: Principles and tools of process monitoring only
  Task: Problem Solving. Scope in AECL: Very High. Statistical tools: Simple tools like Histogram and C&E diagram, (Z, t, χ², F)-tests, Advanced tools PLUS all the tools mentioned above. This programme: Concepts, disciplines and simple tools of problem solving
Quality by Product Conformance
  Task: Product Disposal. Scope in AECL: High. Statistical tools: Bulk Sampling, Acceptance Sampling, Loss Function. This programme: Issues in bulk sampling only
  Task: Field Service. Scope in AECL: High. Statistical tools: Nil. This programme: Nil
Chapter 2:
Data and Data Collection
Quality with Statistics 21
Data
• Data are facts or figures related to any characteristic (also called a variable) of an individual (e.g. a m/c, a year, a casting, a dimension, a person)
• Power station outages (up to 31/03/01 since commissioning)

Station  Date of        Availability  No. of   Avg. duration of    Avg. loss per outage (hours)  Main cause  Capacity
         commissioning  (%)           outages  non-stop operation  Forced       Planned          of outage   utilization
                                               (days)
C:15     12/11/98       92.59          30      27                   64           52              Leakage     High
C:16     10/05/97       93.04          47      28                   52           52              Leakage     Mod.
D        12/10/78       88.32         124      58                  261          164              Gen*        V. Low
E        31/12/84       82.77         116      42                  440          158              Gen*        Low
F        29/09/88       89.23          82      50                  379           79              Gen*        High

* Generator stator / rotor problem
The columns are the VARIABLES; the rows are the INDIVIDUALS.
Quality with Statistics 22
Types of Data/Variable
Data/Variable
  Numerical/Quantitative: Continuous, Discrete
  Categorical/Qualitative: Ordinal, Nominal
• Continuous: An infinite number of values (positive or negative) are possible, e.g. measurements of weight, length, chemical composition.
• Discrete: The variable can take values 0,1,2,3, ….. e.g. count of frequency (# of defects, breakdowns etc.)
• Ordinal: Data classified in ordered categories, e.g. quality of service provided is classified as poor, moderate, good or yearly rainfall classified as very low, low, moderate, good and very good.
• Nominal: Data classified in categories having no inherent or explicit order, e.g. location classified as east, west, north, south or names of departments.
Quality with Statistics 23
Types of Data - Outage Data Example
Variable Name (Variable Type to be identified for each)
1. Date of commissioning
2. Availability (%)
3. Number of outages since commissioning
4. Average duration of non-stop operation (days)
5. Average loss per outage (hours)
6. Main cause of outage
7. Capacity utilization
Quality with Statistics 24
Types of Data - Further Considerations
Continuous data may appear as discrete either due to rounding (see the outage data example) or due to measurement limitations. We should treat such data as continuous unless the number of levels in the data set is very few (say 2-4). However, hourly records of steam pressure at turbine inlet (station F) show that the values are either 126 or 127 or 128. Great care must be exercised while analyzing such data.
Discrete data having seven or more levels may be treated as continuous data.
Dichotomous data (O.K/Not O.K, Pass/Fail etc.) may be treated as discrete data after coding the two categories as 1 (O.K) and 0 (Not O.K).
In the field of Quality Control, the various types of data are classified as
- VARIABLE DATA: Continuous data
- ATTRIBUTE DATA: Others (Discrete, Dichotomous, Ordinal and Nominal)
Henceforth we shall use this latter classification.
Quality with Statistics 25
Data Gateway
Problem/Hypothesis
Data
Solution/Fact
DATA COLLECTION
DATA ANALYSIS
• Quality problems cannot be solved merely based on experience.
• Any claim not backed by data is only a hypothesis.
• Data Gates: Quality of the data gates and their placement at appropriate locations of a process are extremely important for process control.
• Data Quality: The data collection step is vital (garbage in, garbage out).
Quality with Statistics 26
Data Quality Scale
Most Data are of Poor Quality
Whenever you see data, doubt it

Quality category   Impact                              Example           Rank*
Wrong data         Misleading information              Cooked data       1
Noisy data         Potentially misleading information  High gauge R&R    2
Irrelevant data    Useless information                 Old data          3
Inadequate data    Partial information                 Small sample      4
Hard data          Difficult to process                Censored data     5
Redundant data     Useful but adds to cost             Multiple copies   6
Right data         Useful and economical               -                 7
* Higher the better
Quality with Statistics 27
Information Content in Data for Process Control

Source of Data                                   Attribute Data   Variable Data
General literature                               Very low         Low
Past data: In-house routine Q.C records          Low              Moderate
Past data: Statistically designed experiments    Moderate         High
Live data: Passive observation of the process    Moderate         High
Live data: Statistically designed experiments    High             Very High

Do not transform variable data to attribute data. That will be like burning diamond for heat.
Quality with Statistics 28
Data Collection Process
(Flow diagram) Population → Sample → Measurement → Recording → Editing, Storage, Retrieval
The result is a data matrix: the rows are the INDIVIDUALS (Ind. 1, Ind. 2, Ind. 3, …, Ind. n), the columns are the VARIABLES (Var. 1, Var. 2, Var. 3, …, Var. p), and each cell holds one data value.
Quality with Statistics 29
Linking Data Quality to Data Collection Process
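(Matrix of process elements against data quality categories; the cells are empty)
Columns (data quality categories): Wrong, Noisy, Irrelevant, Inadequate, Hard, Redundant
Rows (process elements):
  Population: Individual, Variables
  Sample: Procedure, Size
  Measurement: Gauge, Appraiser, Others
  Recording: Format, Recorder
  Editing, Storage, Retrieval: Issues related to data base mgmt.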
Quality with Statistics 30
Poor Data Quality - Cause and Effect Diagram
Effect: Poor Data Quality
Main causes and sub-causes:
  Population: Individual, Variable
  Sample: Size, Sampling method
  Measurement: Gauge, Appraiser, Measurand, Method
  Recording: Recorder, Format
  Editing, storage, retrieval: Operator, Hardware, Software, Data base, Mgmt. policy
Note: Due to limitations of space, only the main sub-causes are shown in the CE diagram.
Quality with Statistics 31
Measurement Related Causes for Poor Data Quality
(Cause and effect diagram; effect: Poor Data Quality; main cause: Measurement)
Gauges
  Calibration
    Status: Not done, Done long back
    Results: Not used, Not traceable
  Number: Many, Variable least count, Different makes
  Capability
    Operating range: Beyond limit
    Type of data: Unwanted
    Precision: Low repeatability, Low least count
  Operation: Malfunctioning, Breakdown
Appraisers: Bias, Inadvertent error, Number, Reproducibility
Measurand: Unstable, Inhomogeneous
Method
  Standard procedure: Not available, Not followed
  Communication
Quality with Statistics 32
Data Collection Planning - Principle of Inverse Loading
The Planning Questions (Plan)
1) What do you want to know?
2) How do you want to see what it is that you need to know?
3) What type of tool will generate what it is that you need to see?
4) What type of data is required of the selected tool?
5) Where can you get the required type of data?
Execute: (arrow shown against the planning questions)
Illustration
1) Question: Has X any effect on Y?
2) Views: Y plotted against X; Y compared across X1, X2, X3
3) Tools: Histogram; Scatter diagram
4) Data layouts: Y values listed under X1, X2, X3 (Y11 … Y1n, Y21 … Y2p, Y31 … Y3q); paired values (X1, Y1) … (Xn, Yn)
5) Sources: Final inspection and production log book; Nowhere, to be collected
Quality with Statistics 33
Data Collection Tools
The foregoing discussion indicates that collection of the right data is by no means a trivial task. One can go wrong in various ways at different stages of the data collection process.
The two basic requirements for data collection are
  Clarity of purpose
  Use of a structured approach
Commonly used data collection tools that satisfy the two requirements are
  Check Sheet
  Data Sheet
Check Sheet: Checks (/, ✓, x etc.) are made against a category of a variable or a combination of categories of several variables. Used primarily for collecting attribute data.
Data Sheet: Measurement results are recorded against an individual and its characteristics. Used for collecting both attribute and variable data.
Many consider all check sheets as data sheets and vice versa. However, we shall distinguish between the two as above.
Quality with Statistics 34
Process Distribution Check Sheet
Power Generation Process (Moving Target)
Month: September
Process average (Y1 bar): 420 MW
Characteristic: Y1 = Total generation (MW), Y2 = System demand
Sampling interval: Every 3.5 hours
Target: Min(420, Y1)
Data: Target - Y1 bar

Class Interval       Frq (check marks omitted)
< -54.99               5
-54.99 to -44.99       7
-44.99 to -34.99       5
-34.99 to -24.99       5
-24.99 to -14.99       5
-14.99 to -04.99       8
-04.99 to 05.01      126
05.01 to 15.01        16
15.01 to 25.01         2
25.01 to 35.01        10
35.01 to 45.01        11
45.01 to 55.01         2
> 55.01                4
Total No. of observations: 206

Import limit = +20 (beyond it: wasteful import due to lack of control)
Export limit = -10 (beyond it: wasteful export due to lack of control)
Defect rate = 27 %
Quality with Statistics 35
Causes for Wasteful Import of Power
(Figure: Run Chart of half-hourly readings of generation at station C15 in September 2001; horizontal axis: reading number, 1 to 1340; vertical axis: 0.0 to 35.0; episodes marked A, B, C and D)
A: Process failure
B: Process deficiency
C: Early slow down
D: Late pick up
Quality with Statistics 36
Defect Cause Check Sheet
Month: September, 2001
Data: # of hours of generation affected

Defect               Station: C15   C16   D    E    F    Total
Process failure                                           52
Process deficiency                                        81
Early slow down                                           15
Late pick up                                              34
Total                          54    22   65   21   20   182
(The check marks in the individual cells are not reproduced.)

Note: Criticality of the defects is not the same over all stations
Quality with Statistics 37
Identifying Critical Causes for Wasteful Import
PF = Process failure, PD = Process deficiency, ES = Early slow down, LP = Late pick up

Hours of low generation
      C15    C16    D      E      F
PF    30     0      15     7      0
PD    11     9      36     14     11
ES    2      2      9      0      2
LP    11     11     5      0      7

×

Average generation loss at each instant
      C15    C16    D      E      F
PF    29.0   29.5   107.0  103.5  110.0
PD    5      2      10     5      5
ES    10     4      30     -      15
LP    10     4      30     -      15

=

Total generation loss (MWH)
       C15    C16    D      E      F      Total
PF     870    0      1605   725    0      3200
PD     55     18     360    70     55     558
ES     20     8      270    0      30     328
LP     110    44     150    0      105    409
Total  1055   70     2385   795    190    4495
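The total generation loss is the element-by-element product of the hours of low generation and the average generation loss. The sketch below reproduces that multiplication in plain Python; treating the "-" entries as 0 and rounding half up to whole MWH (so that 7 × 103.5 shows as 725) are assumptions made to match the slide.

```python
STATIONS = ["C15", "C16", "D", "E", "F"]

# Hours of low generation, by cause and station (values from the slide).
hours = {
    "PF": [30, 0, 15, 7, 0],
    "PD": [11, 9, 36, 14, 11],
    "ES": [2, 2, 9, 0, 2],
    "LP": [11, 11, 5, 0, 7],
}

# Average generation loss at each instant; "-" on the slide is taken as 0 (assumption).
avg_loss = {
    "PF": [29.0, 29.5, 107.0, 103.5, 110.0],
    "PD": [5, 2, 10, 5, 5],
    "ES": [10, 4, 30, 0, 15],
    "LP": [10, 4, 30, 0, 15],
}


def round_half_up(x):
    """Round a non-negative number to the nearest integer, halves upward."""
    return int(x + 0.5)


# Element-by-element product: hours x average loss = total loss (MWH).
total_loss = {
    cause: [round_half_up(h * a) for h, a in zip(hours[cause], avg_loss[cause])]
    for cause in hours
}
row_totals = {cause: sum(values) for cause, values in total_loss.items()}
column_totals = [
    sum(total_loss[cause][i] for cause in total_loss) for i in range(len(STATIONS))
]
grand_total = sum(row_totals.values())

for cause, values in total_loss.items():
    print(cause, values, row_totals[cause])
print("Total", column_totals, grand_total)
```

Ranking the row totals shows why hours alone can mislead: process deficiency accounts for the most hours, but process failure accounts for most of the lost MWH.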
Quality with Statistics 38
Other Types of Check Sheets
Defective item check sheet
  Checks are made against various causes of rejection/rework of an item.
Defect location check sheet
  Instead of a table, a diagram is made of the defect space. Checks are made at the location where the defect occurs. Locational segregation of defects, if any, provides a valuable clue.
  Examples: Leakage in a cooling system, Cracks in castings, Wear out of moving parts
Check-up confirmation check sheet
  Used to make a comprehensive check-up of product/process quality (usually at the final stage).
  Preprinted check items avoid duplication and omission of the tests to be performed.
  It is a variation of the check list, which is used for checking whether all the tasks have been performed.
C-E diagram check sheet
  Checks are made against the cause of a problem in the C-E diagram.
Quality with Statistics 39
Data Sheet – General Format
Title
Common relevant information
Individual   Var. 1   Var. 2   …   Var. p   Remark
Ind. 1
Ind. 2
…
Ind. n
Important summary of data
Notes:
Quality with Statistics 40
Data Sheet - Example
Up-load detention report for the month of July, 2001

Rake  Date  Arrival  Quality   # of    Form  Form   Depart.  Depart.  Deten.  Demur.  Reason      Actual unloading
No.         time               wagons  date  time   date     time     hours   hours               time - Hr.
01    01    19.45    Enviro    58      02    05.35  02       15.30    09.55   -       -           09.00
.     .     .        .         .       .     .      .        .        .       .       .           .
20    14    07.50    Du. hill  58      15    16.45  16       00.20    07.35   23      S(19)+I(4)  14.30
.     .     .        .         .       .     .      .        .        .       .       .           .
42    31    20.20    .         .       .     .      .        .        .       .       .           14.45

Purpose?
  Estimation of demurrage hours
  Control of demurrage hours
Important reasons cited are receipt in quick succession, successive detentions and wet coal. These are beyond the control of the coal handling section.
Inadequate Data!
Chapter 3:
Summarization of Data
Quality with Statistics 42
Data Analysis – Getting Started
Half-hourly record of generation by station ‘E’ during 19/9/01 (10 hrs.) to 21/9/01 (1.30 hrs.) under normal operating condition

Hours           Generation (MW)
10.00 – 13.30   102.8 105.2 103.2 104.0 105.2 104.8 105.6 105.0
14.00 – 17.30   105.0 104.0 104.0 105.2 106.0 106.4 103.2 104.2
18.00 – 21.30   102.0 103.6 103.8 105.0 105.2 105.2 106.0 105.0
22.00 – 01.30   103.0 103.2 103.0 103.0 104.2 105.8 105.4 104.8
02.00 – 05.30   104.8 105.2 105.2 106.0 104.0 104.2 103.8 104.4
06.00 – 09.30   104.0 102.2 103.4 104.4 104.4 104.2 104.8 106.2
10.00 – 13.30   106.4 104.8 102.8 103.6 104.8 104.4 104.8 104.0
14.00 – 17.30   104.0 104.0 104.0 104.0 104.4 104.0 102.6 103.0
18.00 – 21.30   104.8 102.8 104.0 103.4 103.6 104.0 104.0 103.4
22.00 – 01.30   106.0 104.4 104.4 102.4 102.8 105.0 105.2 105.2
What are your conclusions?
Quality with Statistics 43
Frequency Distribution - Analyzing a large data set on the same variable
Generation data set (previous slide)
The eighty observations are grouped in eight classes of equal length

Class Interval    Frequency (tally marks omitted)
101.7 – 102.3     02
102.3 – 102.9     06
102.9 – 103.5     10
103.5 – 104.1     19
104.1 – 104.7     11
104.7 – 105.3     22
105.3 – 105.9     03
105.9 – 106.5     07
Total             80

Does the frequency distribution provide better insight into the process?
DATA + ANALYSIS = INFORMATION
Data are not information
Quality with Statistics 44
Constructing Frequency Distributions - Variable Data
Data set
  Number of observations (N): About 100 on the same variable.
Formation of the classes (first column)
  Number of classes (k)
  Too many classes obscure the pattern of the distribution due to sampling fluctuations. Details are lost with too few classes.
  The optimum number of classes is given by k = 1 + 3.3 log10(N)
  The simpler formula k = √N also works well in practice.
  For better visual impact, it is preferable to have 5 ≤ k ≤ 12.
  For the generation data set we have N = 80. Therefore, k = 1 + 3.3 × log10(80) = 7.3. This means the number of classes should be either 7 or 8. We have chosen 7 classes.
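The logarithmic rule for the number of classes can be checked with a few lines of Python. This is a minimal sketch; the function name is illustrative and not part of the course material.

```python
import math


def classes_by_log_rule(n):
    """Suggested number of classes, k = 1 + 3.3 * log10(N)."""
    return 1 + 3.3 * math.log10(n)


N = 80  # number of generation readings
k = classes_by_log_rule(N)
candidates = (math.floor(k), math.ceil(k))

print(f"k = {k:.1f}")  # 7.3
print(f"Use either {candidates[0]} or {candidates[1]} classes")
```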
Quality with Statistics 45
Constructing Frequency Distributions (..contd.)
Class width (h)
  h = (R + w) / k, where R = Range of the observations = Maximum – Minimum and w = Least count of measurement.
  Next, h is rounded to the nearest integer multiple of w. This means that if the least unit of measurement (w) is 0.1, then h = 2.312 should be rounded to 2.3. However, if w = 0.2, then the same h should be rounded to 2.4.
  In our generation data example, R = 106.4 – 102.0 = 4.4, and w = 0.2. Thus, h = (4.4 + 0.2) / 7 = 0.657, which is rounded to 0.6. We shall explain later why taking h = 0.7 would be erroneous.
  Note that if h is rounded down then we shall need (k+1) classes to cover the whole range of the observations. How many classes shall we need if h is rounded up?
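The class-width rule can be sketched as follows. The helper names are illustrative; the final `round(..., 10)` only removes floating-point noise and is not part of the rule itself.

```python
def round_to_multiple(value, w):
    """Round value to the nearest integer multiple of the least count w."""
    return round(round(value / w) * w, 10)


def class_width(maximum, minimum, w, k):
    """Return the raw width h = (R + w) / k and its rounded value."""
    data_range = maximum - minimum  # R
    raw_h = (data_range + w) / k
    return raw_h, round_to_multiple(raw_h, w)


# Generation data: maximum 106.4, minimum 102.0, least count 0.2, k = 7.
raw_h, h = class_width(maximum=106.4, minimum=102.0, w=0.2, k=7)
print(f"raw h = {raw_h:.3f}, rounded h = {h}")

# The two rounding examples from the text.
print(round_to_multiple(2.312, 0.1))  # 2.3
print(round_to_multiple(2.312, 0.2))  # 2.4
```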
Quality with Statistics 46
Constructing Frequency Distributions (..Contd.)
Class limits
  The minimum value of the generation data is 102.0 and the class width has been determined as 0.6. So we can form the classes as
    102.0 – 102.6, 102.7 – 103.3, 103.4 – 104.0, . . .
  The problem with the above classification is that there is a gap between two successive class intervals. This is not desirable since we are dealing with continuous data.
  The discontinuity can be removed by forming the classes as
    102.0 – 102.6, 102.6 – 103.2, 103.2 – 103.8, . . .
  However, this classification has another problem. Suppose we have an observation 102.6. In which class shall we place it, the first or the second?
  In order to avoid such confusion we take
    Lower limit of the first class = Minimum – w/2
  and then successively add the class width to this lower limit to obtain the other class limits.
Quality with Statistics 47
Constructing Frequency Distributions (..Contd.)
Class limits (..Contd.)
  Thus, for the generation data we have the classes as
    101.9 – 102.5, 102.5 – 103.1, 103.1 – 103.7, 103.7 – 104.3, 104.3 – 104.9, 104.9 – 105.5, 105.5 – 106.1, 106.1 – 106.7
  Note that now we have
  - 8 classes (since h has been rounded down from 0.657 to 0.6)
  - no confusion in classification (since there are no observations which fall on the class limits) and
  - an extended last class (ideally the upper limit of the last class should have been 106.5).
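The class limits listed above can be generated mechanically from the minimum, the least count and the class width. The sketch below assumes the rule "first lower limit = Minimum – w/2, then add h repeatedly until the maximum is covered"; the function name is illustrative.

```python
def class_limits(minimum, maximum, w, h):
    """Class limits starting at minimum - w/2 and stepping by the class width h."""
    limits = [round(minimum - w / 2, 2)]
    # Keep adding classes until the last upper limit exceeds the maximum.
    while limits[-1] <= maximum:
        limits.append(round(limits[-1] + h, 2))
    return limits


limits = class_limits(minimum=102.0, maximum=106.4, w=0.2, h=0.6)
classes = list(zip(limits[:-1], limits[1:]))

for lower, upper in classes:
    print(f"{lower} – {upper}")
print("Number of classes:", len(classes))
```

Because every limit is an odd tenth and every reading is an even tenth, no observation can fall on a class limit.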
In the example, we have extended the first class instead of the last one since this has brought out the process abnormalities better. Thus the eight classes used are
  101.7 – 102.3, 102.3 – 102.9, … , 105.9 – 106.5
Quality with Statistics 48
Constructing Frequency Distributions (..Contd.)
Tally marking (second column)
  Start with the first observation. Find the class to which the observation belongs. Put a tally against the class.
  Classify all the remaining observations as above.
  Tally marks are grouped in fives, with the fifth tally crossed through the previous four tallies. This provides a better visual display and helps in counting the frequency of each class.
  Note that all the observations get classified as we go through them only once. However, if we concentrate on a class and then try to find the number of observations in that class, then we have to go through the observations k times. This not only consumes more time but also increases the chance of committing an error.
Counting frequency (third column)
  The frequency (f) of each class is obtained simply by counting the tallies.
Other columns
  Columns giving cumulative frequency (f1, f1+f2, ..) and relative frequency (f1/N, f2/N, ..) may also be added, if required.
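The single-pass tally, with the optional cumulative and relative frequency columns, can be sketched as below. It uses the eighty generation readings from the "Data Analysis – Getting Started" slide and the eight classes starting at 101.7 with width 0.6; the function name is illustrative.

```python
from itertools import accumulate

# Eighty half-hourly generation readings (MW) from station 'E'.
generation = [
    102.8, 105.2, 103.2, 104.0, 105.2, 104.8, 105.6, 105.0, 105.0, 104.0,
    104.0, 105.2, 106.0, 106.4, 103.2, 104.2, 102.0, 103.6, 103.8, 105.0,
    105.2, 105.2, 106.0, 105.0, 103.0, 103.2, 103.0, 103.0, 104.2, 105.8,
    105.4, 104.8, 104.8, 105.2, 105.2, 106.0, 104.0, 104.2, 103.8, 104.4,
    104.0, 102.2, 103.4, 104.4, 104.4, 104.2, 104.8, 106.2, 106.4, 104.8,
    102.8, 103.6, 104.8, 104.4, 104.8, 104.0, 104.0, 104.0, 104.0, 104.0,
    104.4, 104.0, 102.6, 103.0, 104.8, 102.8, 104.0, 103.4, 103.6, 104.0,
    104.0, 103.4, 106.0, 104.4, 104.4, 102.4, 102.8, 105.0, 105.2, 105.2,
]


def tally(data, first_lower, h, k):
    """Classify every observation in one pass over the data."""
    counts = [0] * k
    for x in data:
        # Safe here because no reading falls exactly on a class limit.
        index = int((x - first_lower) / h)
        counts[index] += 1
    return counts


FIRST_LOWER, H, K = 101.7, 0.6, 8
frequency = tally(generation, FIRST_LOWER, H, K)
cumulative = list(accumulate(frequency))
relative = [f / len(generation) for f in frequency]

for i, (f, cf, rf) in enumerate(zip(frequency, cumulative, relative)):
    lower = FIRST_LOWER + i * H
    print(f"{lower:.1f} – {lower + H:.1f}  f={f:2d}  cum={cf:2d}  rel={rf:.3f}")
```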
Quality with Statistics 49
Constructing Frequency Distributions - Getting the class intervals right
Why class width (h) is rounded to the nearest integer multiple of w
  Consider the same generation data example. Here w = 0.2. Assume that h = 0.657 is rounded to 0.7 (which is not an integer multiple of 0.2) instead of 0.6. Thus the classes will be
    101.9 – 102.6, 102.6 – 103.3, ..
  Now, in order to overcome the problem of classifying observations like 102.6, we are forced to consider w = 0.1 and have the classes as
    101.95 – 102.65, 102.65 – 103.35, 103.35 – 104.05, 104.05 – 104.75, 104.75 – 105.45, 105.45 – 106.15, 106.15 – 106.85
  Note that the number of observation units covered by each class is not the same. For example, the second class covers three units (102.8, 103.0 and 103.2) but the third class covers four units (103.4, 103.6, 103.8 and 104.0). As a result the frequency distribution is likely to show many peaks.
Balancing end points
  Assuming w = 0.1, the seven classes shown above should be appropriate. However, note that the last class is extended by four units beyond the maximum observed value of 106.4. It is desirable to distribute this imbalance to the two end classes by starting the first class from 101.75 and ending at 106.65.
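The uneven coverage can be verified by counting how many recordable values (multiples of the true least count, 0.2) fall in each class. The sketch below works in integer hundredths of a MW to avoid floating-point issues; the function name and the integer scaling are illustrative choices.

```python
def units_per_class(first_lower, h, w, k):
    """Count recordable values (integer multiples of w) inside each of k classes.

    All arguments are integers in hundredths of a MW.
    """
    counts = []
    for i in range(k):
        lower = first_lower + i * h
        upper = lower + h
        first_value = -(-lower // w) * w  # smallest multiple of w >= lower
        counts.append(len(range(first_value, upper, w)))
    return counts


# h = 0.7 (not a multiple of w = 0.2), classes starting at 101.95.
uneven = units_per_class(first_lower=10195, h=70, w=20, k=7)
# h = 0.6 (a multiple of w = 0.2), classes starting at 101.9.
even = units_per_class(first_lower=10190, h=60, w=20, k=8)

print("h = 0.7:", uneven)
print("h = 0.6:", even)
```

With h = 0.7 the classes alternate between four and three recordable values, which by itself can create alternating peaks; with h = 0.6 every class covers exactly three.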
Quality with Statistics 50
Frequency Distribution of The Generation Data – Further analysis
The frequency distribution shows an abnormal pattern (nearly alternate peaks). Does this mean the process mean is jumping randomly by about 1.2 units? The following two frequency distributions, constructed out of the same data, provide some additional clues.

Fractional part   Frequency
.0                27
.2                18
.4                15
.6                5
.8                15
Total             80
0’s occur more frequently at the cost of 6’s. Does this indicate measurement bias? Noisy Data !!

Class interval    Frequency
101.7 – 102.7     04
102.7 – 103.7     17
103.7 – 104.7     26
104.7 – 105.7     25
105.7 – 106.7     08
Total             80
Smooth pattern (left skewed). Smoothness has been achieved not only by reducing the number of classes but also by including the adjacent 0’s and 6’s in the same interval.
Quality with Statistics 51
Histogram
A histogram is a graphical representation of a frequency distribution of variable data. The histogram of the generation data having five classes is shown below.
(Figure: histogram; horizontal axis: Generation in ‘E’ station (MW), marked at 101.7, 103.7 and 105.7; vertical axis: Frequency, 0 to 30)
• Bars of equal width (= class width)
• Heights of the bars are proportional to the frequencies of the classes
• Bar width of about 1 cm (7-10 classes)
• Horizontal axis is about 1.6 times longer than the vertical axis
• Central tendency: About 104.2
• Pattern of variation: Slightly left skewed
• Specification limits: Should be shown wherever applicable.
• Class mid-point: Marking the class mid-points may be helpful in certain cases.
• Open ended classes: Avoid adding too many classes at the ends having zero or very low frequencies. Shown as open ended bars with arbitrarily reduced heights.
Quality with Statistics 52
Construction of Histogram- An exercise
Half-hourly record of power (MW) generated by station ‘E’ during 29.9.2001 (10.00 hours) to 30.9.2001 (24.00 hours) gives us the following data (first value: 29/9, 10 hrs.; last value: 30/9, 24 hrs.).
6.4 6.4 6.8 6.0 5.2 4.8 6.4 4.4 5.2 6.0
7.6 8.0 7.4 6.6 8.0 5.6 7.2 7.2 7.0 4.0
6.4 8.0 8.0 6.0 6.0 6.4 7.8 7.6 7.6 7.4
7.6 7.6 7.4 4.6 4.2 4.8 6.0 5.6 5.4 5.0
6.2 7.8 7.4 7.2 7.4 7.8 6.6 6.4 6.8 6.8
6.8 6.8 6.6 6.8 6.6 6.8 6.8 6.8 7.0 7.0
6.0 5.6 4.4 4.6 4.6 4.8 6.2 7.0 6.6 6.4
5.2 5.2 7.2 7.4 6.0 5.0 7.0 7.6 7.6 7.4
5.2 7.2 7.2 7.0 7.2 6.8 6.0 6.0 6.0 5.2
Construct a histogram of the above data set. Compare with the histogram for the period 19.9.01 to 21.9.01 (previous slide) and offer your comments.
Quality with Statistics 53
Commonly Observed Histogram Patterns
Single peak, symmetric, bell shaped, commonly observed pattern of a stable process
Single peak, positively skewed (long tail on the right)
Single peak, negatively skewed (Long tail on the left)
Many characteristics follow such patterns. We have already seen that generation data is negatively skewed while breakdown data is positively skewed. However such shapes may also indicate process instability.
LSL USL
Single peak, thick tail
Two peaks (bi-modal)
How?
How?
Quality with Statistics 54
Frequency Distribution of Discrete Data
Number of plant outages in each year since commissioning (# of outages in a year)
Station D, period 1978-79 to 2000-01
  Forced: 2, 3, 1, 0, 3, 2, 1, 0, 2, 2, 0, 2, 3, 0, 2, 1, 2, 1, 1, 0, 1, 0, 2
  Planned: 3, 5, 1, 4, 2, 5, 2, 1, 6, 3, 7, 7, 4, 7, 6, 5, 6, 4, 2, 2, 2, 6, 2
Station E, period 1985-86 to 2000-01
  Forced: 2, 2, 5, 3, 0, 0, 1, 0, 1, 0, 2, 1, 1, 0, 1, 4
  Planned: 15, 7, 8, 3, 7, 5, 2, 6, 3, 8, 7, 4, 5, 4, 3, 4
Station F, period 1988-89 to 2000-01
  Forced: 4, 1, 1, 0, 0, 1, 1, 2, 0, 1, 0, 1, 6
  Planned: 3, 11, 6, 12, 4, 0, 1, 2, 8, 2, 4, 4, 6
Ideally we should construct six frequency distributions (one for each type of outage in each station). However, due to shortage of data we shall construct only two, one for forced outage and the other for planned outage. What can you say about the occurrence of the two types of outages from the above data set?
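For discrete data no class intervals are needed: each distinct count is its own class. The sketch below pools the yearly counts of the three stations and builds the two frequency distributions with `collections.Counter`; the variable names are illustrative.

```python
from collections import Counter

# Yearly outage counts since commissioning, pooled over stations D, E and F.
forced = (
    [2, 3, 1, 0, 3, 2, 1, 0, 2, 2, 0, 2, 3, 0, 2, 1, 2, 1, 1, 0, 1, 0, 2]  # D
    + [2, 2, 5, 3, 0, 0, 1, 0, 1, 0, 2, 1, 1, 0, 1, 4]  # E
    + [4, 1, 1, 0, 0, 1, 1, 2, 0, 1, 0, 1, 6]  # F
)
planned = (
    [3, 5, 1, 4, 2, 5, 2, 1, 6, 3, 7, 7, 4, 7, 6, 5, 6, 4, 2, 2, 2, 6, 2]  # D
    + [15, 7, 8, 3, 7, 5, 2, 6, 3, 8, 7, 4, 5, 4, 3, 4]  # E
    + [3, 11, 6, 12, 4, 0, 1, 2, 8, 2, 4, 4, 6]  # F
)

forced_freq = Counter(forced)
planned_freq = Counter(planned)

print("Forced outages per year -> number of station-years")
for value in sorted(forced_freq):
    print(f"{value:2d}: {forced_freq[value]}")

print("Planned outages per year -> number of station-years")
for value in sorted(planned_freq):
    print(f"{value:2d}: {planned_freq[value]}")

# Station-years with between 2 and 7 planned outages.
in_band = sum(count for value, count in planned_freq.items() if 2 <= value <= 7)
print(f"Planned outages between 2 and 7: {in_band} of {len(planned)} station-years")
```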
Quality with Statistics 55
Line Graph and Bar Graph
(Figure: Distribution of number of yearly outages of stations ‘D’, ‘E’ and ‘F’ since commissioning. Panel 1: Forced outages (line graph), Frequency against Number of outages, 0 to 6. Panel 2: Planned outages (bar graph), Frequency against Number of outages, 0 to 9 with 11, 12 and 15 grouped. Frequency axis marked at 4, 8, 12 and 16.)
• The line graph shows the frequencies of individual outcomes.
• The bar graph is similar to the histogram, but there are gaps between the bars since we are dealing with discrete (attribute) data.
• Planned outages occur more frequently than forced outages.
• The number of planned outages is uniformly distributed between 2 and 7, with very few outages outside this band. Such a pattern is somewhat odd. Planned outages need to be defined properly. Do we undertake unnecessary planned outages?
Quality with Statistics 56
Measures of Central Tendency – The Typical value
Mean
• Most effective measure for numerical data. Let {X1, X2, … , XN-1, XN} be the data set. Then Mean = X̄ = (X1 + X2 + … + XN) / N = ΣXi / N
• May be used for ordinal data but not for nominal data
• Sensitive to extreme values
Median
• Ordinal data: Category containing the (N+1)/2 th case
• Numerical data: The (N+1)/2 th ordered observation when N is odd, and the average of the N/2 th and (N/2)+1 th ordered observations when N is even.
• Can be computed even for open ended classes at the extremes, provided each of the end classes contains less than 50% of the observations.
• Insensitive to outliers.
Mode
• Category or value occurring with the greatest frequency
• Only measure of center for nominal data
• May not be unique, and is highly sensitive to how the classes or categories are formed.
Quality with Statistics 57
Interpretation of Mean
In a rising voltage test the alternating breakdown voltages (kV) of 24 samples of an insulation arrangement were found to be as follows:
210; 208; 208; 175; 182; 206; 190; 194; 198; 205; 212; 200; 205; 202; 207; 210; 202; 201; 188; 205; 209; 201; 216; 196
MEAN = [210 + 208 + … + 216 + 196] / 24 = 201.25 kV
(Figure: dot plot of the 24 values on an axis from 170 to 220 kV, with the mean marked)
• Mean is the balance point (or fulcrum) for the distribution of the values
• Mean is analogous to the centre of gravity
• The sum of negative deviations from the mean exactly equals the sum of positive deviations. Thus the total sum of the deviations from the mean is always zero
• In the above example, the mean should be interpreted as a measure of centre and not that of central tendency or typical value
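The balance-point property can be checked numerically on the 24 breakdown voltages. The sketch below uses Python's standard `statistics` module; the median and mode are computed only for comparison with the mean and are not quoted on the slide.

```python
import statistics

# Breakdown voltages (kV) of the 24 samples.
voltages = [
    210, 208, 208, 175, 182, 206, 190, 194, 198, 205, 212, 200,
    205, 202, 207, 210, 202, 201, 188, 205, 209, 201, 216, 196,
]

mean = statistics.mean(voltages)
median = statistics.median(voltages)  # average of the 12th and 13th ordered values
mode = statistics.mode(voltages)

deviations = [v - mean for v in voltages]
negative_sum = sum(d for d in deviations if d < 0)
positive_sum = sum(d for d in deviations if d > 0)

print(f"Mean = {mean} kV, Median = {median} kV, Mode = {mode} kV")
print(f"Sum of negative deviations = {negative_sum}")
print(f"Sum of positive deviations = {positive_sum}")
print(f"Total of all deviations = {sum(deviations)}")
```

The median and mode lie above the mean because the few low readings (175, 182, 188) pull the mean down, which is why the mean here marks the centre rather than a typical value.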
Quality with Statistics 58
Data Analysis – Getting Started
Reportable accidents (#) in AEC Ltd., Sabarmati during 1995-2000

       Jan  Feb  Mar  Apr  May  Jun  Jul  Aug  Sep  Oct  Nov  Dec  Total
1995    24   17   27   19   10   25   19   22   23   16   18   15    235
1996    22   10   22   18   16   21   21   20   21   18   18   18    225
1997    19   14   12   15   15    9   15   24   19   16   14   19    192
1998    14   14   12   20   19   23   10   16   13   15   17   19    192
1999    19   13   15   13   16   18   17   16   20   17   13   16    193
2000    12   14   15   22   12    4    6   13    7    9    7   13    134
Total  110   82  103  108   88  100   88  111  103   91   87  100   1171
What are your conclusions?