critical thinking for software testers

TA Full Day Tutorial

10/14/2014 8:30:00 AM

"Critical Thinking for Software Testers"

Presented by:

Michael Bolton

DevelopSense

Brought to you by:

340 Corporate Way, Suite 300, Orange Park, FL 32073

888-268-8770 ∙ 904-278-0524 ∙ [email protected] ∙ www.sqe.com

Michael Bolton

DevelopSense Tester, consultant, and trainer Michael Bolton is the coauthor (with James Bach) of Rapid Software Testing, a course that presents a methodology and mindset for testing software expertly in uncertain conditions and under extreme time pressure. A leader in the context-driven software testing movement, Michael has twenty years of experience testing, developing, managing, and writing about software. Currently, he leads DevelopSense, a Toronto-based consultancy. Prior to DevelopSense, he was with Quarterdeck Corporation, where he managed the company’s flagship products and directed project and testing teams—both in-house and worldwide. Contact Michael at [email protected].

Critical Thinking for Testers Michael Bolton and James Bach

1

Critical Thinking for Testers

James Bachhttp://[email protected]

Twitter: @jamesmarcusbach

Michael Boltonhttp://[email protected]: @michaelbolton

Simple Problems


2

A Simple Puzzle

“Steve,anAmericanman,isveryshyandwithdrawn,invariablyhelpfulbutwithlittleinterestinpeopleorintheworldofreality.Ameekandtidysoul,hehasaneedfororderandstructure,andapassionfordetail.”

IsStevemorelikelytobealibrarian?afarmer?

Is Steve more likely to bea farmer or a librarian?

• It’s got to be farmer, because there are 18 times more male farm‐workers in the USA than male librarians as of 2010(source: US Bureau of Labor Statistics) and probably a much higher percentage worldwide.

• (By contrast, for females the numbers of librarians and farmers are roughly equal.)

• You must consider prior probability before answering questions like this.


3

Do You Know Hexadecimal?

• Without using a calculator or a reference, please give your answers to the following quiz on an anonymous (unsigned) index card.

1. Of the people in this room, how many do you think (or guess) know hexadecimal numbering?

2. What is 162 in hexadecimal notation?

3. What is the decimal equivalent of 0xB7?

4. What is the sum of 0x7C and 0x13 in hex?

5. What is the sum of 0x7C and 0x13 in decimal?

Wait, let’s try something really simple…

Can we agree?Can we share common ground?

“There are four geometric figures on this slide.”“There is one square among those figures.”

“The square is shaded in blue.”


4

Beware ofShallow Agreement!

Wait, let’s try something really simple…

Reflex is IMPORTANTBut Critical Thinking is About Reflection

REFLEX

REFLECTION

FasterLooser

SlowerSurer

get moredata

System 2

System 1See Thinking Fast and Slow, by Daniel Kahneman


5

The Nature of Critical Thinking

• “Critical thinking is purposeful, self‐regulatory judgment which results in interpretation, analysis, evaluation, and inference, as well as explanation of the evidential, conceptual, methodological, criteriological, or contextual considerations upon which that judgment is based.” ‐ Critical Thinking: A Statement of Expert Consensus for Purposes of Educational Assessment and Instruction, Dr. Peter Facione

(Critical thinking is, for the most part, about getting all the benefits of your “System 1” thinking reflexes while avoiding self‐deception and other mistakes.)

Bolton’s Definition of Critical Thinking

• Michael Bolton

Testing is enactment of critical thinking about software.Critical thinking must begin with our belief in the likelihood of errors in our thinking.


6

The Nature of Critical Thinking

• We call it critical thinking whenever we systematically doubt something that the “signs” tell us is probably true. Working through the doubt gives us a better foundation for our beliefs.

• Critical thinking is a kind of de‐focusing tactic, because it requires you to seek alternatives to what is already believed or what is being claimed.

• Critical thinking is also a kind of focusing tactic, because it requires you to analyze the specific reasoning behind beliefs and claims.

Why You Should Care

Technology is way more tricky than regular life.

But testersare not supposed

to get tricked.


7

Don’t Be A Turkey

• Every day the turkey adds one more data point to his analysis proving that the farmer LOVES turkeys.

• Hundreds of observationssupport his theory.

• Then, a few days beforeThanksgiving…

Based on a story told by Nassim Taleb, who stole it from Bertrand Russell, who stole it from David Hume.

Graph of My Fantastic Life! Page 25!(by the most intelligent Turkey in the world)

Well

Bein

g! DATAESTIMATED

POSTHUMOUSLYAFTER THANKSGIVING

“Corn meal a little off today!”

Don’t Be A Turkey

• No experience of the past can LOGICALLY be projected into the future, because we have no experience OF the future.

• No big deal in a world ofstable, simple patterns.

• BUT SOFTWARE IS NOTSTABLE OR SIMPLE.

• “PASSING” TESTS CANNOTPROVE SOFTWARE GOOD.

Based on a story told by Nassim Taleb, who stole it from Bertrand Russell, who stole it from David Hume.


8

Themes

• Technology consists of complex and ephemeral relationships that can seem simple, fixed, objective, and dependable even when they aren’t.

• Testers are people who ponder and probe complexity.

• Basic testing is a straightforward technical process.

• But, excellent testing is a difficult social and psychological process in addition to the technical stuff.

A tester is someone who knows thatthings can be different.

Jerry Weinberg


9

Critical Thinking About Testing:Call this “Checking” not Testing

Observe Evaluate Report

Interact with the product in specific ways to collect specific observations.

Apply algorithmicdecision rules tothose observations.

Report any failed checks.

means

operating a product to check specific facts about it…


10

Acquiring the competence, motivation, and credibility to…

Testing is…

create the conditions necessary to…

…so that you help your clients to make informed decisions about risk.

evaluate a product by learningabout it through experimentation, which includes to

some degree: questioning, study, modeling, observation and inference, including…

operating a product to check specific facts about it…


11

This is what people think testers do

described actual

“Compare the product to its specification”

This is more like what testers really do

imagined

actualdescribed

“Compare the ideaof the product toa description of it”

“Compare the actual productto a description of it”

“Compare the ideaof the product to

the actual product”


12

This is what you find…

The designer INTENDS the product to be Firefox compatible,

but never says so, and it actually is not.

The designer INTENDS the product to be Firefox compatible, SAYS SO IN

THE SPEC, but it actually is not.

The designer assumesthe product is not Firefox compatible, and it actually is not, but the ONLINE

HELP SAYS IT IS.

The designerINTENDS

the product to beFirefox compatible,

SAYS SO, and IT IS.

The designer assumesthe product is not Firefox compatible,

but it ACTUALLY IS, and the ONLINE HELP SAYS IT IS.

The designer INTENDS the productto be Firefox compatible,

MAKES IT FIREFOX COMPATIBLE, but forgets to say so in the spec.

The designer assumesthe product is not Firefoxcompatible, and no one

claims that it is, but it ACTUALLY IS.

ExerciseCalculator Test

“I was carrying a calculator.

I dropped it!

Perhaps it is damaged!

What might you do to test it?”


13

and dry it out before attempting to test it.

When did I drop it? Was I in the middle of a calculation? If so part of my testing might be to visually inspect the status of the display to determine whether the calculator appears to still be in the state it was at the time I dropped it. If so, I might continue the calculation from that point, unless I believe that the drop probably damaged the calculator.

Did I drop it on a hard surface with a force that makes me suspect internal damage? If so then I would expect possible hairline fractures. I imagine that would lead to intermittent or persistent short circuits or broken circuits. I also suspect damage to moving parts, battery or solar cell connections, or screen.

Did I drop it into a destructive chemical environment? If so, I might worry more about the progressive decay of the components.

Did I drop it into a dangerous biological or radiological environment? If so, the functions of the calculator maybe less concern than contaminants. I may have to test it with a Geiger counter.

Was the calculator connected to anything else whereby the connection (data cable or AC/cable or duct tape that fastened it to a Faberge egg) could have been damaged, or could have damaged the thing it was connected to?

Did I detect anything while it was dropping that leads me to suspect any damage in particular (e.g. an electrical flash, or maybe a loud popping sound)?

Am I aware of a history of "drop" related problems with this calculator? Have I ever dropped it before?

Is the calculator ruggedized? Is it designed to be dropped in this way?

What is my relationship to this calculator? Is it mine or someone else's? Maybe I'm just borrowing it.

What is the value of this calculator. I assume that this is not a precious artifact from a museum. The exercise as presented appears to be about a calculator as calculating machine, rather than as a precious Minoan urn that happens to have calculator functions built into it.

Assumptions vs. Inferences• An inference is something we treat as true based on evidence.

• An assumption is a something that we treat as true regardless of evidence.

• A premise is an assumption that begins a chain of reasoning. All logic is based on premises.

• Testers question assumptions & premises and gather data for better inferences.

Assumption

Inferen

ce

weakinference

plausible assumption


14

Levels of Assumptions

Reckless Assumptions that are too risky regardless of how they are managed. Obviously bad assumptions. Don’t make them.

Risky Assumptions that might be wrong or cause trouble, but can be okay with proper management. If you use them, declare them.

Safe Assumptions that are acceptable to make without any special management or declaration, but still might cause trouble.

Obvious Assumptions so safe that they cause trouble only IF you manage them, because people will think you are joking, crazy, or insulting.

It is silly to say “don’t make assumptions.”

Instead, say “let’s be careful about risky assumptions.”

ExerciseWhat makes an assumption more dangerous?

• Not “what specific assumptions are more dangerous?”…

• But “what factors would make one assumption more dangerous than another?”

• Or “what would make the same assumption more dangerous from one time to another?”


15

What makes an assumption more dangerous?1. Consequential: required to support critical plans and activities. (Changing the

assumption would change important behavior.)

2. Unlikely: may conflict with other assumptions or evidence that you have. (The assumption is counter‐intuitive, confusing, obsolete, or has a low probability of being true.)

3. Blind: regards a matter about which you have no evidence whatsoever.

4. Controversial: may conflict with assumptions or evidence held by others. (The assumption ignores controversy.)

5. Impolitic: expected to be declared, by social convention. (Failing to disclose the assumption violates law or local custom.)

6. Volatile: regards a matter that is subject to sudden or extreme change. (The assumption may be invalidated unexpectedly.)

7. Unsustainable: may be hard to maintain over a long period of time. (The assumption must be stable.)

8. Premature: regards a matter about which you don’t yet need to assume.

9. Narcotic: any assumption that comes packaged with assurances of its own safety.

10.Latent: Otherwise critical assumptions that we have not yet identified and dealt with. (The act of managing assumptions can make them less critical.)

What are We Seeing Here?

• Mental models and modeling are often dominated by unconscious factors.

• Familiar environments and technologies allow us to “get by” on memory and habit.

• Social conventions may cause us to value politeness over doing our disruptive job.

• Lack of pride and depth in our identity as testers saps our motivation to think better.


16

• You may not understand. (errors in interpreting and modeling a situation, communication errors)

• What you understand may not be true. (missing information, observations not made, tests not run)

• You may not know the whole story. (perhaps what you see is not all there is)

• The truth may not matter, or may matter much more than you think. (poor understanding of risk)

How to Think Critically:Introducing Pauses

Giving System 2 time to wake up!

Huh?

Really?

And?

So?

To What Do We Apply Critical Thinking?

• The Product• What it is• Descriptions of it is• Descriptions of what it does• Descriptions of what it's

supposed to be• Testing• Context• Procedures• Coverage• Oracles• Strategy

• The Project• Schedule• Infrastructure• Processes• Social orders

• Words• Language• Pictures• Problems• Biases• Logical fallacies• Evidence• Causation• Observations• Learning• Design• Behavior• Models• Measurement• Heuristics• Methods• …


17

“Huh?”Critical Thinking About Words

• Among other things, testers question premises.

• A suppressed premise is an unstated premise that an argument needs in order to be logical.

• A suppressed premise is something that should be there, but isn’t…

• (…or is there, but it’s invisible or implicit.)

• Among other things, testers bring suppressed premises to light and then question them.

• A diverse set of models can help us to see the things that “aren’t there.”

34

Example: Generating Interpretations

• Selectively emphasize each word in a statement; also consider alternative meanings.

MARY had a little lamb.

Mary HAD a little lamb.

Mary had A little lamb.

Mary had a LITTLE lamb.

Mary had a little LAMB.

35


18

“Really?”The Data Question

“And?”A vs. THE

• Example: “A problem…” instead of “THE problem…”

• Using “A” instead of “THE” helps us to avoid several kinds of critical thinking errors– single path of causation

– confusing correlation and causation

– single level of explanation

When trying to explain something,prefer "a" to "the".


19

“And?”Unless…

• When someone asks a question based on a false or incomplete premise, try adding “unless…” to the premise

• When someone offers a Grand Truth about testing, append “unless…” or “except in the case of…”

At the end of a statement,try adding "unless..."

“And?”Also…

• The product gives the correct result! Yay!

• …It also may be silently deleting system files.

• There may be more where that come from.

Whatever is happening,something else may ALSO be happening.


20

Whatever is true nowmay not be true for long.

“And?”“So far” & “Not yet”

• The product works… so far.

• We haven’t seen it fail… yet.

• No customer has complained… yet.

• Remember: There is no test for ALWAYS.

Whatever your theory about “what is”, other theories could describe “what is” as well, or better.

“And?”“What else could this be?”

• The Rule of Three: if you haven’t thought of at least three plausible and non‐trivial interpretations of what you’ve taken in, you probably haven’t thought enough.

– Jerry Weinberg

• See also How Doctors Think, Dr. Jerome Groopman


21

“So?”Critical Thinking About Risk

“The system shall operate at an input voltage range of nominal 100 ‐ 250 VAC.”

“Try it with an input voltage in the range of 100-250.”

Poor answer:

How do you test this?

Exercise:Use “Huh? Really? And? So?”

to critique this sentence:“"It is generally accepted that is more difficult for an author to find defects in their own work than it is for an independent tester to find the

same defects."


22

Treating absolute statements as a heuristic helps to defends you against critical thinking errors.

And yes, that’s a heuristic.

Heuristic Model:The Four‐Part Risk Story

• Victim. Someone that experiences the impact of a problem. Ultimately no bug can be important unless it victimizes a human.

• Problem: Something the product does that we wish it wouldn’t do.

• Vulnerability: Something about the product that causes or allows it to exhibit a problem, under certain conditions.

• Threat: Some condition or input external to the product that, were it to occur, would trigger a problem in a vulnerable product.

Some personmay be hurt or annoyedbecause of something that might go wrong

while operating the product, due to some vulnerability in the product

that is triggered by some threat.


23

How Do We Know What “Is”?

“We know what is because we see what is.”

We believe we know what is because we see what we interpret as signs that indicate what is based on our prior beliefs about the worldand our (un)awareness of things around us.

How Do We Know What “Is”?

“If I see X, then probably Y, because probably A, B, C, D, etc.”

• THIS CAN FAIL:– Ice cream that wasn’t

– Getting into a car– oops, not my car.

– Bad driving– Why?

– Bad work– Why?

– Ignored people at my going away party– Why?

– Couldn’t find soap dispenser in restroom– Why?

– Ordered orange juice at seafood restaurant– waitress misunderstood


24

Remember this, you testers!

Models Link Observation and Inference

• A model is an idea, activity, or object…

• …that represents another idea, activity, or object…

• …whereby understanding the model may help you understand or manipulate what it represents.

51

such as an idea in your mind, a diagram, a list of words, a spreadsheet, a person, a toy, an equation, a demonstration, or a program

such as something complex that you need to work with or study.

‐ A map helps navigate across a terrain.‐ 2+2=4 is a model for adding two apples to a basket that already has two apples.‐ Atmospheric models help predict where hurricanes will go.‐ A fashion model helps understand how clothing would look on actual humans.‐ Your beliefs about what you test are a model of what you test.


25

Models Link Observation & Inference

• Testers must distinguish observation from inference!

• Our mental models form the link between them

• Defocusing is lateral thinking.

• Focusing is logical (or “vertical”) thinking.

52

My modelof the world

“I see…”

“I believe…”

53

Modeling Bugs as Magic Tricks

• Our thinking is limited• We misunderstand probabilities

• We use the wrong heuristics

• We lack specialized knowledge

• We forget details

• We don’t pay attention to the right things

• The world is hidden• states

• sequences

• processes

• attributes

• variables

• identities

Magic tricks workfor the same reasons

that bugs exist

Studying magic canhelp you developthe imagination

to find better bugs.

Testing magic isindistinguishable fromtesting sufficiently

advanced technology


26

Observation vs. Inference

• Observation and inference are easily confused.

• Observation is direct sensory data, but on a very low level it is guided and manipulated by inferences and heuristics.

• You sense very little of what there is to sense.

• You remember little of what you actually sense.

• Some things you think you see in one instance may be confused with memories of other things you saw at other times.

• It’s easy to miss bugs that occur right in front of your eyes.

• It’s easy to think you “saw” a thing when in fact you merely inferred that you must have seen it.

54

Observation vs. Inference

• Accept that we’re all fallible, but that we can learn to be better observers by learning from mistakes.

• Pay special attention to incidents where someone notices something you could have noticed, but did not.

• Don’t strongly commit to a belief about any important evidence you’ve seen only once.

• Whenever you describe what you experienced, notice where you’re saying what you saw and heard, and where you are instead jumping to a conclusion about “what was really going on.”

• Where feasible, look at things in more than one way, and collect more than one kind of information about what happened (such as repeated testing, paired testing, loggers and log files, or video cameras).

55


27

How many test cases are needed to testthe product represented by this flowchart?

Critical Thinking About DiagramsAnalysis

• [pointing at a box]What if the function in this box fails?

• Can this function ever be invoked at the wrong time?

• [pointing at any part of the diagram] What error checking do you do here?

• [pointing at an arrow]What exactly does this arrow mean? What would happen if it was broken?

Web ServerDatabase

LayerApp Server

Browser


28

Guideword Heuristics for Diagram AnalysisWhat’s there? What happens? What could go wrong?

Boxes• Interfaces (testable)• Missing/Drop‐out• Extra/Interfering/Transient• Incorrect• Timing/Sequencing• Contents/Algorithms• Conditional behavior• Limitations• Error Handling

Lines• Missing/Drop‐out

• Extra/Forking

• Incorrect

• Timing/Sequencing

• Status Communication

• Data Structures

Web ServerDatabase

LayerApp Server

Browser

Paths• Simplest

• Popular

• Critical

• Complex

• Pathological

• Challenging

• Error Handling

• Periodic

Testability!

Visualizing Test Coverage: Annotation

59

Web Server

App Server

Browser

DatabaseLayer

BuildError Monitor

Survey

BuildError Monitor

Coverage analysis

Force fail

Force fail

Man-in-middle

Data generator

Build stressbotsServer stress

Performance data Inspect reports

Table consistency oracle

Datagen Oracle

Performance historyBuild history oracle

History oracle

History oracle

Review Error Output


29

Beware Visual Bias!

• setup

• browser type & version

• cookies

• security settings

• screen size

• review client-side scripts & applets

• usability

• specific functions 60

Web Server

App Server

Browser

DatabaseLayer

Testing against requirementsis all about modeling.

“The system shall operate at an input voltage range of nominal 100 ‐ 250 VAC.”

“Try it with an input voltage in the range of 100-250.”

Poor answer:

How do you test this?


30

“Pass Rate” is a Popular Metric

0%

10%

20%

30%

40%

50%

60%

70%

80%

90%

100%

2/1 2/8 2/15 2/22 3/1

Pass Rate

Pass Rate

Totally different situations…same exact graph!

Pass Rate Passed Failed Total

10% 100 900 1000

50% 500 500 1000

70% 700 300 1000

80% 800 200 1000

90% 900 100 1000

90% 900 100 1000


10% 1 9 10

50% 5 5 10

70% 7 3 10

80% 8 2 10

90% 9 1 10

90% 9 1 10


10% 1 9 10

50% 25 25 50

70% 70 30 100

80% 160 40 200

90% 450 50 500

90% 900 100 1000


10% 100 900 1000

50% 75 75 150

70% 70 30 100

80% 40 10 50

90% 36 4 40

90% 27 3 30


31

Critical Thinking About Measurementfrom an actual client

DER – Defect Escape RateThe Defect Escape Rate measures the number of undiscovered defects that escaped detection in the product development cycle and were released to customers. An escape is a defect found while using a released product. DER is defined as:

DER= (Defect Escapes /Total Defects)∗100DER is a lagging indicator of product quality. The number of escapes is always zero until after product is released. It is reported as a percentage and a low number is desired. Each business unit has a target DER percentage and an Escape Analysis should be performed on each defect to improve test coverage. It is desirable for the DER for a product line to decline over time. See appendix for calculation details.

Look! Measurement!What could possibly go wrong?

Construct Validity & External Validity

• Construct validity is (informally) the degree to which your attributes and measurements can justified within an experiment or observation– How do you demarcate the difference between one of something and not‐one of something?

– How do you know that you’re measuring what you think you’re measuring?

• External validity is the degree to which your experiment or observation can be generalized to the world outside– How do you know that your experiment or observation will be relevant at other times or in other places, with other people?


32

Kaner & Bond’s Tests for Construct Validityfrom http://www.kaner.com/pdfs/metrics2004.pdf

• What is the purpose of your measurement? The scope?• What is the attribute you are trying to measure?• What are the scale and variability of this attribute?• What is the instrument you’re using? What is its scale and

variability?• What function (metric) do you use to assign a value to the

attribute?• What’s the natural scale of the metric?• What is the relationship of the attribute to the metric’s value?• What are the natural, foreseeable side effects of using this

measure?

The essence of good measurement is a model that incorporates answers to questions like these.

If you don’t have solid answers, you aren’t doing measurement; you are just playing with numbers.

Test Framing

• Test framing is the set of logical connections that structure and inform a test and its result

• The framing of a test consists of

– premises; essentially ordinary statements

– logical “connectors”

• formal: if, then, else, and, or

• informal: although, maybe,

• A change in ONE BIT in the framing of the test can invert its result.


33

Exercise:Test Framing

• “I performed the tests. All my tests passed. Therefore, the product works.”

• “The programmer said he fixed the bug. I can’t reproduce it anymore. Therefore it must be fixed.”

• “Microsoft Word frequently crashes while I am using it. Therefore it’s a bad product.”

• “It’s way better to find bugs earlier than to find them later.”

68

Safety Language(aka “epistemic modalities”)

• “Safety language” in software testing, means to qualify or otherwise draft statements of fact so as to avoid false confidence.

• Examples:

So far…

The feature worked

It seems…

I think…It appears…

apparently…I infer…

I assumed…

I have not yet seen any failures in the feature…


34

Who Says?Critical Thinking About Research

• Research varies in quality• Research findings often contradict one another• Research findings do not prove conclusions• Researchers have biases• Writers and speakers may simplify or distort• “Facts” change over time• Research happens in specific environments• Human desires affect research outcomes

Asking the Right Questions: A Guide to Critical ThinkingM. Neil Browne & Stuart M. Keeley

72

ALL of these things apply to testing, too.

Critical Thinking AboutCommon Beliefs About Testing

• Every test must have an expected, predicted result.

• Effective testing requires complete, clear, consistent, and unambiguous specifications.

• Bugs found earlier cost less to fix than bugs found later.

• Testers are the quality gatekeepers for a product.

• Repeated tests are fundamentally more valuable.

• You can’t manage what you can’t measure.

• Testing at boundary values is the best way to find bugs.


35

Critical Thinking AboutCommon Beliefs About Testing

• Test documentation is needed to deflect legal liability.

• The more bugs testers find before release, the better the testing effort has been.

• Rigorous planning is essential for good testing.

• Exploratory testing is unstructured testing, and is therefore unreliable.

• Adopting best practices will guarantee that we do a good job of testing.

• Step by step instructions are necessary to make testing a repeatable process.

Critical thinking about practicesWhat does “best practice” mean?

• Someone:Who is it? What do they know?

• Believes:What specifically is the basis of their belief?

• You: Is their belief applicable to you?

• Might: How likely is the suffering to occur?

• Suffer: So what? Maybe it’s worth it.

• Unless: Really? There’s no alternative?

• You do this practice:What does it mean to “do” it? What does it cost? What are the side effects? What if you do it badly? What if you do something else really well?


36

Beware of…

• Numbers: “We cut test time by 94%.”

• Documentation: “You must have a written plan.”

• Judgments: “That project was chaotic. This project was a success.”

• Behavior Claims: “Our testers follow test plans.”

• Terminology: Exactly what is a “test plan?”

• Contempt for Current Practice: CMM Level 1 (initial) vs.

CMM level 2 (repeatable)

• Unqualified Claims: “A subjective and unquantifiable requirement

is not testable.”

Look For…

• Context: “This practice is useful when you want the power of creative testing but you need high accountability, too.”

• People: “The test manager must be enthusiastic and a real hands‐on leader or this won’t work very well.”

• Skill: “This practice requires the ability to tell a complete story about testing: coverage, techniques, and evaluation methods.”

• Learning Curve: “It took a good three months for the testers to get good at producing test session reports.”

• Caveats: “The metrics are useless unless the test manager holds daily debriefings.”

• Alternatives: “If you don’t need the metrics, you ditch the daily debriefings and the specifically formatted reports.”

• Agendas: “I run a testing business, specializing in exploratory testing.”


37

The Regression Testing Fantasy

“I rerun my old tests to ensure that nothing has broken.”

The Regression Testing Fantasy

“I rerun my old tests to ensure that nothing has broken.”


38

The Regression Testing Reality

“We run a smattering of old checks to ensure that they still find no bugs... And we assume that any bug not foundis also not important.”

Critical Thinking About Processes

• This is a description of a bug investigation process that a particular company uses. Does it make sense?

See James Bach, Investigating Bugs: A Testing Skills Study http://www.satisfice.com/articles/investigating‐bugs.pdf


39

Appendices

Critical Thinking About Projects

• “You will have five weeks to test the product”

5 weeks


40

ExerciseEvents Testing

• You want to test the interaction between two potentially overlapping events.

• How would you test this?

time

Event A

Event B

Some Common Thinking Errors

• Reification Error

– giving a name to a concept, and then believing it has an objective existence in the world

– ascribing material attributes to mental constructs—“that product has quality”

– mistaking relationships for things—“its purpose is…”

– purpose and quality are relationships, not attributes; they depend on the person

– how can we count ideas? how can we quantify relationships?


41


• Fundamental Attribution Error– “it always works that way”; “he’s a jerk”

– failure to recognize that circumstance and context play a part in behaviour and effects

• The Similarity‐Uniqueness Paradox– “all companies are like ours”; “no companies are like ours”

– failure to consider that everything incorporates similarities and differences

• Missing Multiple Paths of Causation– “A causes B” (even though C and D are also required)

Some Common Thinking Errors• Assuming that effects are linear with causes

– “If we have 20% more traffic, throughput will slow by 20%”

– this kind of error ignores non‐linearity and feedback loops—c.f. general systems

• Reactivity Bias– the act of observing affects the observed– a.k.a. “Heisenbugs”, the Hawthorne Effect

• The Probabilistic Fallacy– confusing unpredictability and randomness– after the third hurricane hits Florida, is it time to relax?


42


• Binary Thinking Error / False Dilemmas– “all manual tests are bad”; “that idea never works”

– failure to consider gray areas; belief that something is either entirely something or entirely not

• Unidirectional Thinking– expresses itself in testing as a belief that “the application works”

– failure to consider the opposite: what if the application fails?

– to find problems, we need to be able to imagine that they might exist


• Availability Bias– the tendency to favor prominent or vivid instances in making a decision or evaluation

– example: people are afraid to fly, yet automobiles are far more dangerous per passenger mile

– to a tech support person (or to some testers), the product always seems completely broken

– spectacular failures often get more attention than grinding little bugs

• Confusing concurrence with correlation– “A and B happen at the same time; they must be related”


43


• Nominal Fallacies– believing that we know something well because we can name it

• “equivalence classes”

– believing that we don’t know something because we don’t have a name for it at our fingertips

• “the principle of concomitant variation”; “inattentionalblindness”

• Evaluative Bias of Language– failure to recognize the spin of word choices– …or an attempt to game it– “our product is full‐featured; theirs is bloated”


• Selectivity Bias– choosing data (beforehand) that fits your preconceptions or mission

– ignoring data that doesn’t fit

• Assimilation Bias– modifying the data or observation (afterwards) to fit the model

– grouping distinct things under one conceptual umbrella– Jerry Weinberg refers to this as “lumping”– for testers, the risk is in identifying setup, pinpointing, investigating, reporting, and fixing as “testing”


44


• Narrative Bias– a.k.a “post hoc, ergo propter hoc”

– explaining causation after the facts are in

• The Ludic Fallacy– confusing complex human activities with random, roll‐of‐the‐dice games

– “Our project has a two‐in‐three chance of success”

• Confusing correlation with causation– “When I change A, B changes; therefore A must be causing B”


• Automation bias

– people have a tendency to believe in results from an automated process out of all proportion to validity

• Formatting bias

– It’s more credible when it’s on a nicely formatted spreadsheet or document

– (I made this one up)

• Survivorship bias

– we record and remember results from projects (or people) who survived

– the survivors prayed to Neptune, but so did the sailors who died

– What was the bug rate for projects that were cancelled?


45

Do you prefer A or B?

Imagine that the US is preparing for the outbreak of an unusual Asian disease, which is expected to kill 600 people. Two alternative programs to combat the disease have been proposed. Assume that the exact scientific estimates of the consequences of the programs are as follows.

Program A: If Program A is adopted, 200 people will be saved.

Program B: If Program B is adopted, there is 1/3 probability that 600 people will be saved, and 2/3 probability that no people will be saved.

See Daniel Kahneman, Thinking Fast and Slow

Do you prefer C or D?

Imagine two more programs to combat the disease are proposed:

Program C: If Program C is adopted, 400 people will die.

Program D: If Program D is adopted, there is 1/3 probability that 600 people will be saved, and 2/3 probability that no people will be saved.


46

A = C B = D A > B D > C

Program A: If Program A is adopted, 200 people will be saved.(3/4 surveyed prefer this to B)

Program B: If Program B is adopted, there is 1/3 probability that 600 people will be saved, and 2/3 probability that no people will be saved.

Program C: If Program A is adopted, 400 people will die.

Program D: If Program B is adopted, there is 1/3 probability that 600 people will be saved, and 2/3 probability that no people will be saved.(3/4 surveyed prefer this to C)

Isn’t it all “just semantics”?

• What does “semantics” mean?– Semantics is the branch of linguistics concerned with logic and meaning, so dismissing something as “semantics” is cool as long as logic and meaning don’t matter to you.

• How many “events” were there when the twin towers were hit? Does it matter?– To insurers and property owners (who were insured per event), it matters a great deal.

• What’s the difference between “ad hoc” testing and “exploratory testing”? Don’t they mean the same thing?– We can tell you what the structures and skills of exploratory testing are. Can you tell us what the structures skills of “ad hoc” testing are?


47

Tacit vs. Explicit Knowledge

• Explicit knowledge is knowledge that has been told

• Tacit knowledge comes in three forms– Relational (“weak”) tacit knowledge: knowledge, residing in a human mind, that could be told, but has not been for various reasons

– Somatic (“medium”) tacit knowledge: knowledge that comes from residing in a human body

– Collective (“strong”) tacit knowledge: knowledge that is embodied in society

See Harry Collins, Tacit and Explicit Knowledge

The Role of Tacit Knowledge

• Although people tend to favour the explicit (because we can talk about it relatively easily), much of what we do in testing is based on using and developing tacit knowledge.

• A key problem with test‐case‐focused testing is that much of the tacit knowledge necessary for excellent testing cannot be encoded.

What are the elements of tacit knowledge in your testing? What parts can be made explicit?What parts cannot?


48

Conditional ProbabilityReasoning about it is HARD.

Unhappy test result

Th

ese

peo

ple

hav

e th

e d

isea

se

These people are fine.Happy test result

These people are fine too.

Conditional ProbabilityUh-oh.

Yay

! B

ut…

Oo

op

s. M

isse

d it

.

No problemo.Yay!

Sorry we freaked you out.Bad

news.

Try considering the number of people in the overall population who have the disease BEFORE considering the test result.

critical thinking for software testers

Technology

testing software

trainer michael bolton

testing teamsboth

software testers

contact michael

years of experience

geometric figures

isver itsgottobe farmer