
Extreme Metrics Analysis for Fun and Profit

Paul Below

February 20, 2003

Agenda

• Statistical Thinking

• Metrics Use: Reporting and Analysis

• Measuring Process Improvement

• Surveys and Sampling

• Organizational Measures


“Experiments should be reproducible. They should all fail in the same way.”

Agenda

Statistical Thinking

Statistical Thinking

• You already use it, at home and at work

• We generalize in everyday thinking

• Often, our generalizations or predictions are wrong

Uses for Statistics

• Summarize our experiences so others can understand

• Use information to make predictions or estimates

• Goal is to do this more precisely than we would in everyday conversation

Listen for Questions

• We are not used to using numbers in our professional lives

– “What does this mean?”

– “What should we do with this?”

• We need to take advantage of our past experience

Statistical Thinking

Statistical thinking is more important than methods or technology. Analysis is iterative, not one-shot.

[Figure: learning as an iterative loop between Data and Model, alternating induction (data → model) and deduction (model → data). Modification of the Shewhart/Deming cycle by George Box, 2000 Deming Lecture, "Statistics for Discovery".]


"It ain't so much the things we don't know that get us in trouble. It's the things we know that ain't so." Artemus Ward, 19th Century American Humorist

Agenda

Metrics Use: Reporting and Analysis

Purpose of Metrics

• The purpose of metrics is to take action. All types of analysis and reporting have the same high-level goal: to provide information to people who will act upon that information and thereby benefit.

• Metrics offer a means to describe an activity in a quantitative form that would allow a knowledgeable person to make rational decisions. However,

–  Good statistical inference on bad data is no help.

–  Bad statistical analysis, even on the right variable, is still bad statistics.

Therefore…

• Metrics use requires implemented processes for:

– metrics collection,

– reporting requirements determination,

– metrics analysis, and

– metrics reporting.

Types of Metrics Use

“You go to your tailor for a suit of clothes and the first thing that he does is make some measurements; you go to your physician because you are ill and the first thing he does is make some measurements. The objects of making measurements in these two cases are different. They typify the two general objects of making measurements. They are: (a) To obtain quantitative information (b) To obtain a causal explanation of observed phenomena.”

Walter Shewhart

The Four Types of Analysis

1. Ad hoc: Answer specific questions, usually in a short time frame. Example: Sales support

2. Reporting: Generate predefined output (graphs, tables) and publish or disseminate to defined audience, either on demand or on regular schedule.

3. Analysis: Use statistics and statistical thinking to investigate questions and reach conclusions. The questions are usually analytical (e.g., “Why?” or “How many will there be?”) in nature.

4. Data Mining: Data mining starts with data definition and cleansing, followed by automated knowledge extraction from historical data. Finally, analysis and expert review of the results is required.

Body of Knowledge (suggestions)

• Reporting

– Database query languages, distributed databases, query tools, graphical techniques, OLAP, Six Sigma Green Belt (or Black Belt), Goal-Question-Metric

• Analysis

– Statistics and statistical thinking, graphical techniques, database query languages, Six Sigma black belt, CSQE, CSQA

• Data Mining

– Data mining, OLAP, data warehousing, statistics

Analysis Decision Tree

[Decision tree. Type of question?
• Enumerative → One time? Yes: Ad hoc. No: Reporting.
• Analytical → Factors analyzed? Few: Analysis. Many: Data Mining and Analysis.]
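The routing can also be written out directly. A minimal sketch in Python (the function and argument names are my own, not from the deck):

```python
# Illustrative encoding of the decision tree above (names are hypothetical).
def analysis_type(question: str, one_time: bool, many_factors: bool) -> str:
    """Route a request to one of the four types of analysis.

    question: "enumerative" (how many?) or "analytical" (why? what will happen?)
    """
    if question == "enumerative":
        return "Ad hoc" if one_time else "Reporting"
    # Analytical questions branch on the number of factors analyzed.
    return "Data Mining and Analysis" if many_factors else "Analysis"

print(analysis_type("enumerative", one_time=True, many_factors=False))  # Ad hoc
print(analysis_type("analytical", one_time=False, many_factors=True))   # Data Mining and Analysis
```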

Extreme Programming

Extreme Analysis

• Short deadlines, small releases

• Overall high-level purposes defined up front, prior to analysis start

• Specific questions prioritized prior to analysis start

• Iterative approach with frequent stakeholder reviews to obtain interim feedback and new direction

• Peer synergy: metrics analysts work in pairs

• Advanced query and analysis tools; saved work can be reused in future engagements

• Data warehousing techniques, combining data from multiple sources where possible

• Data cleansing done prior to analysis start (as much as possible)

• Collective ownership of the results

Extreme Analysis Tips

Produce clean graphs and tables displaying important information. These can be used by various people for multiple purposes. Explanations should be clear, and the organization should make it easy to find information of interest. However,

It takes too long to analyze everything: we cannot expect to produce interpretations for every graph we produce. And even when we do, the results are superficial, because we don't have time to dig into everything.

"Special analysis," where we focus on one topic at a time and study it in depth, is a good idea, both because we can complete it in a reasonable time and because the result should be something of use to the audience.

Therefore, ongoing feedback from the audience is crucial to obtaining useful results.


Measuring Process Improvement

“Is there any way that the data can show improvement when things aren’t improving?” -- Robert Grady

Agenda

Measuring Process Improvement

• Analysis can determine if a perceived difference could be attributed to random variation

• Inferential techniques are commonly used in other fields; we have used them in software engineering for years

• This is an overview, not a training class

Expand our Set of Techniques

Metrics are used for:

• Benchmarking

• Process improvement

• Prediction and trend analysis

• Business decisions

• …all of which require confidence analysis!

Is This a Meaningful Difference?

[Figure: relative performance (scale 0 to 2.0) plotted against CMM Maturity Level (1 to 3).]

Pressure to Produce Results

“If you torture the data long enough, it will confess.” -- Ronald Coase

• Why doesn’t the data show improvement?

• “Take another sample!”

• Good inference on bad data is no help

Types of Studies

• Anecdote: “I heard it worked once”, cargo cult mentality

• Case Study: some internal validity

• Quasi-Experiment: can demonstrate external validity

• Experiment: can be repeated, need to be carefully designed and controlled

Anecdote → Case Study → Quasi-experiment → Experiment

Attributes of Experiments

• Random Assignment

• Blocked and Unblocked

• Single Factor and Multi Factor

• Census or Sample

• Double Blind

• When you really have to prove causation (can be expensive)

Subject → Treatment → Reaction

Limitations of Retrospective Studies

• No pretest, we use previous data from similar past projects

• No random assignment possible

• No control group

• Cannot custom design metrics (have to use what you have)

Quasi-Experimental Designs

• There are many variations

• Common theme is to increase internal validity through reasonable comparisons between groups

• Useful when formal experiment is not possible

• Can address some limitations of retrospective studies

Causation in Absence of Experiment

• Strength and consistency of the association

• Temporal relationship

• Non-spuriousness

• Theoretical adequacy

What Should We Look For?

Some information to accompany claims:

• measure of variation

• sample size

• confidence intervals

• data collection methods used

• sources

• analysis methods

Are the Conclusions Warranted?

Decision Without Analysis

• Conclusions may be wrong or misleading

• Observed effects tend to be unexplainable

• Statistics allows us to make honest, verifiable conclusions from data

Types of Confidence Analysis

[Figure: variables divide into quantitative, analyzed with correlation, and categorical, analyzed with two-way tables.]

Two Techniques We Use Frequently

• Inference for difference between two means

– Works for quantitative variables

– Compute confidence interval for the difference between the means

• Inference for two-way tables

– Works for categorical variables

– Compare actual and expected counts
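Both techniques are easy to demonstrate. A minimal Python sketch using SciPy (my illustration: the productivity samples are invented, the two-way table is the effort-variance table from the slides that follow, and the result's confidence_interval method needs a recent SciPy):

```python
# Minimal sketch of both techniques using SciPy.
import numpy as np
from scipy import stats

# 1. Inference for the difference between two means (quantitative variable):
#    hypothetical productivity samples (FP per PM) from two project types.
type_a = np.array([22.0, 18.5, 25.1, 30.2, 27.4, 21.9])
type_b = np.array([38.6, 41.2, 35.0, 44.8, 39.9, 36.7])

result = stats.ttest_ind(type_a, type_b, equal_var=False)  # Welch's t-test
ci = result.confidence_interval(confidence_level=0.95)
print(f"p value: {result.pvalue:.4f}")
print(f"95% CI for difference of means: {ci.low:.1f} to {ci.high:.1f}")

# 2. Inference for a two-way table (categorical variables):
#    compare actual counts against the counts expected under independence.
table = np.array([[3, 6, 7],     # met
                  [9, 10, 9]])   # not met
chi2, p, dof, expected = stats.chi2_contingency(table)
print(f"chi-square p value: {p:.2f}")
print("expected counts:\n", expected)
```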

Quantitative Variables

Project Productivity (ISBSG release 6)

[Figure: AFP per hour (0.0 to 1.0) by quartile of project size (1 to 4); group sizes N = 119, 120, 120, 119.]

Comparison of the means of quartiles 2 and 4 yields a p value of 88.2%, not a significant difference at the 95% level.

Categorical Variables

Effort Variance   Low PM   Medium PM   High PM
Met                  3         6          7
Not Met              9        10          9

P value is approximately 50%

Categorical Variables

Date Variance   Low PM   Medium PM   High PM
Met                2        10         13
Not Met           10         6          3

Confidence is greater than 99.9%: a significant association
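For reference, both tables above can be checked with a standard chi-square test of independence. This is a sketch of mine; the slides do not say exactly which test was used:

```python
# Re-checking both two-way tables with a chi-square test of independence.
from scipy.stats import chi2_contingency

effort = [[3, 6, 7],    # effort target met:   low / medium / high PM
          [9, 10, 9]]   # effort target not met
date   = [[2, 10, 13],  # date target met
          [10, 6, 3]]   # date target not met

for name, table in (("effort variance", effort), ("date variance", date)):
    chi2, p, dof, _ = chi2_contingency(table)
    print(f"{name}: chi2 = {chi2:.1f}, p = {p:.4f}")
# The effort table yields p in the neighborhood of 0.5 (no evidence of an
# association); the date table yields p well below 0.01, a highly
# significant association, broadly consistent with the slides.
```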

Expressing the Results “in English”

• “We are 95% certain that the difference in average productivity for these two project types is between 11 and 21 FP/PM.”

• “Some project types have a greater likelihood of cancellation than other types; we would be unlikely to see these results by chance.”

What if...

• Current data is insufficient

• An experiment cannot be done

• Direct observation or 100% collection cannot be done

• or, lower level information is needed?


Surveys and Samples

In a scientific survey every person in the population has some known positive probability of being selected.

Agenda

What is a Survey?

• A way to gather information about a population from a sample of that population

• Varying purposes

• Different ways:

– telephone

– mail

– internet

– in person

What is a Sample?

• Representative fraction of the population

• Random selection

• Can reliably project to the larger population
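A sketch of the idea in Python; the population and sizes are hypothetical, and every member has the same known chance of selection:

```python
# Illustrative simple random sample: every member of the population has
# a known, equal probability of selection (names and sizes are made up).
import random

population = [f"project-{i}" for i in range(1, 201)]  # 200 projects
random.seed(7)                                        # reproducible draw
sample = random.sample(population, k=20)              # 10% sample, no replacement
print(sample[:5])
```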

What is a Margin of Error?

• An estimate from a survey is unlikely to exactly equal the quantity of interest

• Sampling error means results differ from the target population's true values due to the “luck of the draw”

• Margin of error depends on sample size and sample design
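To make the dependence on sample size concrete, here is a small sketch of the textbook margin-of-error formula for a proportion from a simple random sample (my illustration, not the deck's; stratified or clustered designs modify it):

```python
# Margin of error for an estimated proportion from a simple random sample.
import math

def margin_of_error(n: int, p: float = 0.5, z: float = 1.96) -> float:
    """Half-width of the ~95% confidence interval; p = 0.5 is the worst case."""
    return z * math.sqrt(p * (1.0 - p) / n)

for n in (100, 400, 1500):
    print(f"n = {n:4d}: +/- {margin_of_error(n):.1%}")
# n = 1500 gives roughly +/- 2.5%, which is why a well-designed sample of
# about 1,500 people suffices for the entire U.S. (see the later slide).
```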

What Makes a Sample Unrepresentative?

• Subjective or arbitrary selection

• Respondents are volunteers

• Questionable intent

How Large Should the Sample Be?

• What do you want to learn?

• How reliable must the result be?

– Size of population is not important

– A sample of about 1,500 people is reliable enough for the entire U.S.

• How large CAN it be?

“Dewey Defeats Truman”

• Prominent example of a poorly conceived survey

• 1948 pre-election poll

• Main flaw: non-representative sample

• 2000 election: methods were not adapted to the new situation

Is a Flawed Sample the Only Type of Problem That Happens?

• Non-response

• Measurement difficulties

• Design problems, leading questions

• Analysis problems

Some Remedies

• Stratify the sample (see the sketch after this list)

• Adjust for incomplete coverage

• Maximize response rate

• Test questions for

– clarity

– objectivity

• Train interviewers
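As a sketch of the first remedy above, proportional stratified sampling takes the same fraction from every stratum; the DataFrame and column names here are hypothetical:

```python
# Proportional stratified sampling: draw the same fraction from every
# stratum so that small subgroups stay represented (illustrative names).
import pandas as pd

def stratified_sample(df: pd.DataFrame, stratum_col: str,
                      frac: float, seed: int = 42) -> pd.DataFrame:
    """Sample frac of the rows within each stratum."""
    return (df.groupby(stratum_col, group_keys=False)
              .sample(frac=frac, random_state=seed))

projects = pd.DataFrame({
    "platform": ["mainframe"] * 50 + ["midrange"] * 30 + ["pc"] * 20,
    "id": range(100),
})
sample = stratified_sample(projects, "platform", frac=0.10)
print(sample["platform"].value_counts())  # ~5 mainframe, 3 midrange, 2 pc
```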


Organizational Measures

“Whether measurement is intended to motivate or to provide information, or both, turns out to be very important.” -- Robert Austin

Agenda

Dysfunctional Measures

• Disconnect between measure and goal

– Can one get worse while the other gets better?

• Is one measure used for two incompatible goals?

• The two general types of measurement are...

Measurement in Organizations

• Motivational Measurements

– intended to affect the people being measured, to provoke greater expenditure of effort in pursuit of the organization's goals

• Informational Measurements

– logistical, status, or research information that provides insight for short-term management and long-term improvement

Informational Measurements

• Process Refinement Measurements

– reveal the detailed structure of processes

• Coordination Measurements

– logistical purpose

Mixed Measurements

• “Dashboard” concept is incomplete

• We have Gremlins

The desire to be viewed favorably provides an incentive for people being measured to tailor, supplement, repackage, or censor information that flows upward.

The Right Kind of Culture

• Ask yourself what is driving the people around you to do a good job:

– Do they identify with the organization and fellow team members? (Work hard to avoid letting coworkers down)

– Are they only focused on the next performance review and getting a big raise?

Internal or external motivation?

Why is this important?

• Each of us makes dozens of small decisions each day

– Motivational measures influence us

– These small decisions add up to large impacts

• Are these decisions aligned with the organization’s goals?

Conclusion: It Has Been Done

• There are organizations in which people have given themselves completely to the pursuit of organizational goals

• These people want measurements as a tool that helps get the job done

• If this is your organization, fight hard to keep it

A Few Selected Resources:

• Measuring and Managing Performance in Organizations, Robert D. Austin, 1996.

• Schaum’s Outlines: Business Statistics, Leonard J. Kazmier, 1996.

• International Software Benchmarking Standards Group, http://www.isbsg.org.au

• American Statistical Association, http://www.amstat.org/education/Curriculum_Guidelines.html

• Graphical techniques books by E. Tufte

• Contact a statistician for help
