
Page 1:

Software Testing ETSN20
http://cs.lth.se/etsn20

Tools: Naik 12.10-12.16, Jonsson
Organization: Naik 16.1-16.4

Professor Per Runeson

Page 2:

Lecture

• Chapter 12.10-12.15 – Tools – Test automation

• Chapter 16.1-16.4 – Organization – Competences

Page 3:

Test tools – the tester’s workbench (Photo: CC avotius at Flickr)

Page 4:

Tools – the workbench

• Good at repeating tasks

• Good at organising data

• Requires training
• Introduced incrementally
• No “silver bullet”

Evaluation criteria
• Ease of use
• Power
• Robustness
• Functionality
• Ease of insertion
• Quality of support
• Cost
• Company policies and goals

Page 5:

Test tools – by process

[Figure: test tool categories – test management, defect management, test design, static analysis, coverage, debugging, test execution and comparison, and performance simulator tools – placed along the development and test phases: requirement specification, architectural design, detailed design, code, unit test, integration test, system test and acceptance test.]

[Redrawn from Fewster and Graham. Software Test Automation, 1999]

Page 6:

Test tools – by example

• Test execution tools – xUnit, Selenium
• Static analysis tools – lint
• Test management tools – HP Quality Center
• Defect management tools – Jira
• Test design tools – ACTS
• Performance simulator tools – LoadRunner
• Debugging tools – embedded in the IDE
• Coverage tools – EclEmma

Page 7:

Choosing a tool – test tool evaluation criteria
• Test development – language, scripts
• Test maintenance – version control, browse, tags
• Test execution – sequencing, control, integration
• Test results – logging, mapping to versions, analysis
• Test management – storage, authorization
• GUI testing – recording, modification
• Vendor – sustainability, responsiveness
• Price – life cycle costs vs value

Page 8:

Buy or share?

It is More Blessed to Give than to Receive –

Open Software Tools Enable Open Innovation

Per Runeson1, Hussan Munir1 and Krzysztof Wnuk2

1Lund University, 2Blekinge Institute of Technology

ABSTRACT

Open Innovation (OI) has attracted scholarly interest from a wide range of disciplines since introduced by Chesbrough [1], i.e. ”a paradigm that assumes that firms can and should use external ideas as well as internal ideas, and internal and external paths to market, as they look to advance their technology”. However, OI remains unexplored for software engineering (SE), although widespread in practice through Open Source Software (OSS). We studied the relation between SE and OI and in particular how OSS tools impact on software-intensive organizations’ innovation capability.

We surveyed the literature on SE and OI [3] and found that studies conclude that start-ups have a higher tendency to opt for OI compared to established companies. The literature also suggests that firms assimilating external knowledge into their internal R&D activities have a higher likelihood of gaining financial advantages.

In a case study, we observed how the OSS tools Jenkins and Gerrit enabled open innovation [2]. We mined software commits to identify major contributors, found them to be affiliated with Sony Mobile, and contacted five of them for interviews about their and their employer’s principles and practices with respect to OI and tools, of which they gave a consistent view.

Our findings indicate that the company’s transition to OI was part of a major paradigm shift towards OSS, while the adoption of open tools was driven bottom-up by engineers with support from management. By adopting OI, Sony Mobile achieved freed-up developer time, better quality assurance, inner source initiatives, a flexible development environment, and faster releases and upgrades. In particular, the introduction of a test framework was proposed by Sony Mobile but implemented by other contributors [2]. However, the benefits are gained through investing significant attention and resources in the OSS community, in terms of technical contributions and leadership.

BODY

Sharing software tools enables open innovation, brings faster upgrades and frees up resources, but demands investments in the open community.

REFERENCES

[1] H. W. Chesbrough. Open Innovation: The New Imperative for Creating and Profiting from Technology. Harvard Business School Press, Boston, Mass., 2003.
[2] H. Munir and P. Runeson. Software testing in open innovation: An exploratory case study of the acceptance test harness for Jenkins. In Proceedings of the 2015 International Conference on Software and System Process, ICSSP 2015, pages 187–191, New York, NY, USA, 2015. ACM.
[3] H. Munir, K. Wnuk, and P. Runeson. Open innovation in software engineering: A systematic mapping study. Empirical Software Engineering, DOI 10.1007/s10664-015-9380-x, 2015.

Volume 4 of Tiny Transactions on Computer Science

This content is released under the Creative Commons Attribution-NonCommercial-ShareAlike License. Permission to make digital or hard copies of all or part of this work is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. CC BY-NC-SA 3.0: http://creativecommons.org/licenses/by-nc-sa/3.0/.

Page 9:

Consequence of bad tools – cognitive load drivers

Cognitive Load Drivers in Large Scale Software Development – An industrial case study

Research questions
RQ1 Which types of cognitive load drivers can be observed in large-scale software engineering, primarily as a consequence of tool use?
RQ2 How do software engineers perceive the identified cognitive load drivers in their digital work environment?

Daniel Helgesson, Emelie Engström, Per Runeson, Elizabeth Bjarnason
Dept. of Computer Science, Lund University, Lund, Sweden, <firstname.lastname>@cs.lth.se

Research method
1. Literature review
2. Interview study (test manager, 2 testers)
3. Extended interview study (tool architect, SW developer)
4. Knowledge synthesis

Conclusions
• Little published beyond program comprehension
• Cognitive science relevant for software engineering
• Evidence of cognitive load in large-scale software engineering (RQ1)
• Cognitive load is indeed a problem for practitioners (RQ2)

Identified Cognitive Load Drivers

TABLE I – Cognitive load drivers in software engineering, structured according to the thematic analysis (main cluster / theme, item: description).

Work process
• Lack of automation: Absence of automated tool support (e.g. automated testing) forcing the user to do work manually.
• Wasted effort: Unnecessary or redundant work mandated by absence of tool support or by process.
• Ad hoc: Tool support (or processes) implemented differently in different parts of the organisation.
• Lack of understanding: Missing information or support on account of the shifting nature of a large organization.

Tool / Intrinsic
• Adaptation/Suitability: Use of tools that are not really suited for the purpose.
• Lack of functionality: Use of tools missing functionality needed to solve a task efficiently.
• Stability/Reliability: Use of tools that suffer from stability or reliability issues.
• Overlap: Use of several tools that can do the same thing, or almost the same thing, in parallel.
• Lack of Integration: Use of several tools, in parallel, that are not (or poorly) integrated, forcing the user to do redundant and/or manual work.
• Comprehension: Actually understanding what needs to be done in the tool in order to complete a task.

Tool / Delay
• Response (micro): Delays in response forcing the user to stay overly focused, putting a strain on short term memory.
• Downtime (macro): Tools or systems that are completely unresponsive for a longer period than a few seconds/minutes.

Tool / Interaction
• Unintuitive: Functionality (or interaction) is implemented in an unintuitive way.
• Inconsistent: Functionality (or interaction) is inconsistently implemented in two different tools or in two different views of the same tool.
• Cumbersome: Functionality (or interaction) is implemented in a way that users find clumsy.

Information / Integrity
• Incompleteness: Lack of complete information is causing the user to spend effort in asserting that information is complete.
• Reliability: Lack of reliable information is causing the user to spend effort in asserting that information is correct and up-to-date.
• Temporal traceability: The user needs to bridge a temporal gap in order to assess the current situation.

Information / Organisation
• Location: Where to find the information.
• Distribution: Where to store and whom to distribute the information to.
• Retrieval: How to access the information and retrieve it.
• Overview/zoom: How to navigate the information.
• Structure: How the information is organised.

TABLE II – Mapping of our main clusters from Table I to Gulliksen et al.'s cognitive work environment problems [4] (columns: Work process / Tools / Information).

Unnecessary cognitive load and interruption of thought process:  – / x / –
Unnecessary strain on working memory:                            – / x / –
Problems orientating and lack of overview:                       – / x / x
Identifying and interpreting information:                        – / x / x
Decision making/support:                                         – / x / x
Difficulties with time coordination of data:                     – / x / x
Work processes determined by tools:                              – / x / –
Many unintegrated information systems:                           x / x / x
Poor support for learning:                                       x / – / –
Lack of understanding automation:                                x / – / –
Difficulties with different system modes:                        N.A. / N.A. / N.A.


Software engineers handle a lot of information in their daily work. We explore how software engineers interact with information management systems/tools, and to what extent these systems expose users to increased cognitive load.


Cognitive Load Drivers in Large Scale Software Development

Daniel Helgesson, Dept. of Computer Science, Lund University, Lund, Sweden, [email protected]
Emelie Engström, Dept. of Computer Science, Lund University, Lund, Sweden, [email protected]
Per Runeson, Dept. of Computer Science, Lund University, Lund, Sweden, [email protected]
Elizabeth Bjarnason, Dept. of Computer Science, Lund University, Lund, Sweden, [email protected]

Abstract—Software engineers handle a lot of information in their daily work. We explore how software engineers interact with information management systems/tools, and to what extent these systems expose users to increased cognitive load. We reviewed the literature of cognitive aspects, relevant for software engineering, and performed an exploratory case study on how software engineers perceive information systems. Data was collected through five semi-structured interviews. We present empirical evidence of the presence of cognitive load drivers, as a consequence of tool use in large scale software engineering.

Index Terms—Cognition, Cognitive Load, Software Development, Software Engineering, Software Development Tools, Software Engineering Tools, Industrial Case Study

I. INTRODUCTION

Software engineering is a socio-technical endeavour where the technical side of the phenomena seems to be more studied than the social side [1], and as a consequence knowledge of a cognitive/ergonomic perspective of software development, and the tools associated with these activities, appears rather small. Further, we see no clear indications of a significant impression on the software engineering community in terms of understanding the cognitive work environment of software engineers [2] [3].

In a 2002 dissertation, Walenstein observed that there is a need for cognitive reasoning in the design process of software development tools, and further that there has been little research done in the area [2], a claim largely substantiated by Lenberg et al. [1].

More recently, in a 2015 report ’Digital Work Environment’, Gulliksen et al. made an effort to analyse the societal consequences of large-scale digitalisation of human labour, in general [4]. In the report the authors present a literature survey, providing updated insight into the research area. The survey found only 36 relevant articles. In addition, the authors also present a taxonomy of ’Cognitive work environment problems’.

In this study we aim to explore, and establish, a broader understanding of the cognitive work environment of software engineers and the cognitive dimensions of the tools used. Specifically, we aim to explore cognitive load, induced on users by information systems or tools. We present results from an exploratory industrial case study based on thematic analysis of interviews, as well as a literature overview. Our contribution lies in presenting in vivo observation of cognitive problems associated with tool use in large-scale software engineering.

II. RESEARCH QUESTIONS

The purpose of this study is to gain insight into the problem domain of cognitive load, primarily as a consequence of tool use, in large scale software development. Hence it is exploratory in nature, and focuses on two tools central for communication and knowledge management at the case company.

The overall exploratory purpose is refined into two research questions:
RQ1 Which types of cognitive load drivers can be observed in large-scale software engineering, primarily as a consequence of tool use?
RQ2 How do software engineers perceive the identified cognitive load drivers in their digital work environment?

The research questions are anchored in software engineering and cognitive science literature, and addressed by interviewing practitioners. The first question uses the cognitive literature as a lens, while presenting empirical observations from the interview material. The second question reports the interviewees’ perception of problems found in RQ1.

III. METHOD

We conducted a four stage case study, using a flexible design [5], consisting of literature review, interview study, extended interview study, and knowledge synthesis. To mature the knowledge, we iterated reviewing literature, conducting interviews, transcribing and analysing data. Figure 1 describes the study¹.

The case company is an international corporation; the studied division develops consumer products in the Android ecosystem. The software is embedded in handheld devices. The studied development site of the company has 1000 employees and developers work in cross-functional teams using an agile development process. The development environment is primarily based on the toolchain associated with the Android ecosystem.

¹ Please see preprint for details of the study, including interview guide, table of result and table of validation: https://lucris.lub.lu.se/admin/files/61811452/cog load manuscript.pdf

Page 10:

Test execution – what to automate?

Test cases that are
• Less volatile
• Repeatable
• High risk
• Easy to automate
• Manually difficult
• Boring and time consuming


The general guideline shown in Figure 12.4 may be used in evaluating the suitability of test cases to be automated as follows:

Less Volatile: A test case is stable and is unlikely to change over time. The test case should have been executed manually before. It is expected that the test steps and the pass–fail criteria are not likely to change any more.

Repeatability: Test cases that are going to be executed several times should be automated. However, one-time test cases should not be considered for automation. Poorly designed test cases which tend to be difficult to reuse are not economical for automation.

High Risk: High-risk test cases are those that are routinely rerun after every new software build. The objectives of these test cases are so important that one cannot afford to not reexecute them. In some cases the propensity of the test cases to break is very high. These test cases are likely to be fruitful in the long run and are the right candidates for automation.

Easy to Automate: Test cases that are easy to automate using automation tools should be automated. Some features of the system are easier to test than other features, based on the characteristics of a particular tool. Custom objects with graphic and sound features are likely to be more expensive to automate.

Manually Difficult: Test cases that are very hard to execute manually should be automated. Manual test executions are a big problem, for example, causing eye strain from having to look at too many screens for too long in a GUI test. It is strenuous to look at transient results in real-time applications. These nasty, unpleasant test cases are good candidates for automation.

Boring and Time Consuming: Test cases that are repetitive in nature and need to be executed for longer periods of time should be automated. The tester’s time should be utilized in the development of more creative and effective test cases.

[Figure 12.4 – Test selection guideline for automation: less volatile, repeatability, high risk, easy to automate, manually difficult, boring and time consuming.]

Page 11:

Evolution of test automation

1. Recorded Scripts

2. Engineered Scripts

3. Data-driven Testing

4. Keyword-driven Testing

5. Model-based Testing

First Last Data

Pekka Pukaro 1244515

Teemu Tekno 587245

Page 12:

Recorded Scripts
• Unstructured
• Scripts generated using capture and replay tools
• Scripts non-maintainable in practice
  – If the system changes they need to be captured again
• Relatively quick to set up

Capture Replay Tools
• Record the user’s actions to a script (keyboard, mouse)
  – Tool-specific scripting language
• Scripts access the (user) interface of the software
  – Input fields, buttons and other widgets
• Simple checks can be created in the scripts
  – Existence of texts and objects in the UI
  – Data of GUI objects
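To make the capture-and-replay idea concrete, the sketch below shows roughly what a recorded-style script amounts to, written by hand against the Selenium WebDriver API rather than generated by a capture tool. The URL, element ids, and credentials are made up for illustration; the point is the linear sequence of UI actions plus a simple existence/content check, which breaks as soon as the UI changes and then has to be re-recorded.

```java
import org.openqa.selenium.By;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.chrome.ChromeDriver;

// A "recorded-style" GUI script: a linear sequence of UI actions with a simple
// check, similar in shape to what a capture-and-replay tool would generate.
// URL and element ids are hypothetical.
public class RecordedLoginScript {
    public static void main(String[] args) {
        WebDriver driver = new ChromeDriver();
        try {
            driver.get("https://example.org/login");
            driver.findElement(By.id("username")).sendKeys("admin");
            driver.findElement(By.id("password")).sendKeys("t5t56y");
            driver.findElement(By.id("loginButton")).click();
            // Simple check: a text/object exists in the UI after the action
            String heading = driver.findElement(By.id("welcomeMessage")).getText();
            if (!heading.contains("Welcome")) {
                throw new AssertionError("Expected welcome message, got: " + heading);
            }
        } finally {
            driver.quit();
        }
    }
}
```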

Page 13:

Engineered Scripts
• Scripts are well-designed, modular, robust, documented, and maintainable
• Separation of common tasks
  – E.g. setup, teardown, and error detection
• Test data is still embedded in the scripts
• Code is mostly written manually
• Implementation and maintenance require programming skills
• “Just like any other software development project”
• Most well-known example: JUnit
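A minimal JUnit 4 sketch of an engineered script is shown below: common setup and teardown are separated from the test itself, while the test data stays embedded in the code. The Account class is a hypothetical system under test, inlined only so the example is self-contained.

```java
import org.junit.After;
import org.junit.Before;
import org.junit.Test;
import static org.junit.Assert.assertEquals;

// An "engineered" test script in JUnit 4 style: setup/teardown separated from
// the test logic, test data still embedded in the script.
public class AccountTest {

    static class Account {                 // hypothetical SUT, inlined for self-containment
        private int balance;
        Account(int initial) { balance = initial; }
        void withdraw(int amount) { balance -= amount; }
        int getBalance() { return balance; }
    }

    private Account account;

    @Before
    public void setUp() {                  // common setup, shared by all tests
        account = new Account(100);
    }

    @After
    public void tearDown() {               // common teardown / clean-up
        account = null;
    }

    @Test
    public void withdrawReducesBalance() { // test data (100, 40, 60) embedded in the script
        account.withdraw(40);
        assertEquals(60, account.getBalance());
    }
}
```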

Page 14:

Data-Driven Testing

• Test inputs and expected outcomes stored as data
  – Normally in a tabular format
  – Test data are read from an external data source
• One driver script can execute all of the designed test cases
• External test data can be edited without programming skills
  – Test data can now come from business people, customers, …
• Avoids the problem of embedded test data
  – The tests are hard to understand in the middle of all the scripting details
• Updating tests or creating similar tests with slightly different test data always requires programming
  – Leads to copy-paste scripting

First Last Data

Pekka Pukaro 1244515

Teemu Tekno 587245
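The sketch below illustrates the data-driven idea with JUnit 5 parameterized tests: one driver test method is executed once per row of test data. The rows are inlined with @CsvSource for brevity; with @CsvFileSource(resources = "/customers.csv") the same rows would be read from an external file that non-programmers can edit. CustomerRegistry is a hypothetical system under test, stubbed here so the example compiles.

```java
import java.util.Map;

import org.junit.jupiter.params.ParameterizedTest;
import org.junit.jupiter.params.provider.CsvSource;
import static org.junit.jupiter.api.Assertions.assertEquals;

// Data-driven testing: one driver test method, executed once per row of test data.
class CustomerLookupTest {

    // Hypothetical SUT, stubbed with a fixed map so the sketch is self-contained.
    static class CustomerRegistry {
        private static final Map<String, Long> DATA =
            Map.of("Pekka Pukaro", 1244515L, "Teemu Tekno", 587245L);
        static long lookup(String first, String last) {
            return DATA.get(first + " " + last);
        }
    }

    @ParameterizedTest
    @CsvSource({                       // the test data, kept as data (inlined here for brevity)
        "Pekka, Pukaro, 1244515",
        "Teemu, Tekno, 587245"
    })
    void lookupReturnsExpectedData(String first, String last, long expected) {
        assertEquals(expected, CustomerRegistry.lookup(first, last));
    }
}
```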

Page 15:

Keyword-Driven Testing
• Keywords (= action words) abstract the navigation and actions from the script
• Keywords and test data are read from an external data source
• When test cases are executed, keywords are interpreted by a test library, which is called by a test automation framework (the test library = the test scripts)
• Keyword-driven testing improves on data-driven testing
• Example:
  Login: admin, t5t56y
  AddCustomers: newCustomers.txt
  RemoveCustomer: Pekka Pukaro
• Notice: the “dirty” details of GUI testing are missing
• Keyword-driven testing ~= domain specific languages (DSL)
• Tool: http://code.google.com/p/robotframework/
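The sketch below is a toy interpreter for the keyword format shown above (it is not Robot Framework): each line of an external test case file such as "Login: admin, t5t56y" is mapped to a call in a small test library, so the test case itself stays free of GUI-level details. The file name testcase.txt and the TestLibrary actions are assumptions for illustration; a real test library would drive the application, for example through Selenium.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

// Minimal keyword-driven sketch: lines of "Keyword: arg1, arg2" are read from an
// external file and dispatched to a test library that implements the actions.
public class KeywordRunner {

    static class TestLibrary {                       // stands in for the real test scripts
        void login(String user, String password)     { System.out.println("login as " + user); }
        void addCustomersFromFile(String file)       { System.out.println("add customers from " + file); }
        void removeCustomer(String name)             { System.out.println("remove customer " + name); }
    }

    public static void main(String[] args) throws Exception {
        TestLibrary lib = new TestLibrary();
        List<String> testCase = Files.readAllLines(Path.of("testcase.txt")); // hypothetical test case file
        for (String line : testCase) {
            if (line.isBlank()) continue;
            String[] parts = line.split(":", 2);
            String keyword = parts[0].trim();
            String[] a = parts.length > 1 ? parts[1].split(",") : new String[0];
            switch (keyword) {
                case "Login"          -> lib.login(a[0].trim(), a[1].trim());
                case "AddCustomers"   -> lib.addCustomersFromFile(a[0].trim());
                case "RemoveCustomer" -> lib.removeCustomer(a[0].trim());
                default -> throw new IllegalArgumentException("Unknown keyword: " + keyword);
            }
        }
    }
}
```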

Page 16:

Model-Based Testing
• The system under test is modelled
  – UML state machines, domain specific languages (DSL)
• Test cases are automatically generated from the model
  – More accurate model -> better tests
• Generate a large number of tests that cover the model
  – Many different criteria for covering the model
  – Execution time of test cases might be a factor
• The model can also provide the expected results for the generated tests
• Challenges:
  – Personnel competencies
  – Data-intensive systems (cannot be modelled as a state machine)
• Simple MBT tool: http://sourceforge.net/projects/graphwalker/
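As a minimal illustration of the idea, the sketch below models a login dialog as a three-transition state machine and generates test steps by a random walk until every transition in the model is covered. The model, states, and actions are made up; real MBT tools such as GraphWalker work on richer graph models with configurable coverage criteria and connect the generated steps to executable test code and expected results.

```java
import java.util.*;

// Model-based testing sketch: a tiny state-machine model and a random walk that
// generates test steps until all transitions (a simple coverage criterion) are covered.
public class SimpleMbt {
    record Transition(String from, String action, String to) {}

    public static void main(String[] args) {
        List<Transition> model = List.of(
            new Transition("LoggedOut", "loginOk",   "LoggedIn"),
            new Transition("LoggedOut", "loginFail", "LoggedOut"),
            new Transition("LoggedIn",  "logout",    "LoggedOut"));

        Set<Transition> covered = new HashSet<>();
        Random rnd = new Random(42);
        String state = "LoggedOut";
        while (covered.size() < model.size()) {
            String current = state;
            List<Transition> options = model.stream()
                .filter(t -> t.from().equals(current)).toList();
            Transition step = options.get(rnd.nextInt(options.size()));
            // The model also provides the expected result (the target state) for each step.
            System.out.println("Execute: " + step.action() + " -> expect state " + step.to());
            covered.add(step);
            state = step.to();
        }
    }
}
```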

Page 17:

Evolution of system testing approaches

1. Recorded Scripts – cheap to set up, quick & dirty
2. Engineered Scripts – structured
3. Data-driven Testing – data separation
4. Keyword-driven Testing – action separation, DSL
5. Model-based Testing – modeling & automatic test case generation

First Last Data

Pekka Pukaro 1244515

Teemu Tekno 587245

Page 18:

Automation and oracles
• Automated testing depends on the ability to programmatically detect when the software fails
• An automated test is not equivalent to a similar manual test
  – Automatic comparison is typically more precise
  – Automatic comparison will be tripped by irrelevant discrepancies
  – A skilled human comparison will sample a wider range of dimensions, noting oddities that one wouldn’t program the computer to detect
• “Our ability to automate testing is fundamentally constrained by our ability to create and use oracles”

Cem Kaner
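One common way to obtain an oracle is to compare the system under test against a trusted reference implementation, as in the sketch below. Here mySort stands in for a hypothetical implementation under test and Arrays.sort acts as the oracle over random inputs; the hard part in practice is finding any such programmable pass/fail criterion at all, which is what the quote above is about.

```java
import java.util.Arrays;
import java.util.Random;

// Oracle sketch: a trusted reference implementation (Arrays.sort) decides
// pass/fail for many random inputs fed to a hypothetical implementation under test.
public class SortOracleCheck {

    // Hypothetical implementation under test (a trivial stand-in here).
    static int[] mySort(int[] input) {
        int[] copy = input.clone();
        Arrays.sort(copy);
        return copy;
    }

    public static void main(String[] args) {
        Random rnd = new Random(1);
        for (int run = 0; run < 1000; run++) {
            int[] input = rnd.ints(20, -100, 100).toArray();
            int[] expected = input.clone();
            Arrays.sort(expected);                    // the oracle
            int[] actual = mySort(input);
            if (!Arrays.equals(expected, actual)) {
                throw new AssertionError("Oracle mismatch for " + Arrays.toString(input));
            }
        }
        System.out.println("1000 random inputs checked against the oracle");
    }
}
```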

Page 19:

Types of outcome to compare

Screen-based
• Character-based applications
• GUI applications
  – Correct message, display attributes, displayed correctly
  – GUI components and their attributes
• Graphical images
  – Avoid bitmap comparisons

Disk-based
• Comparing text files
• Comparing non-textual forms of data
• Comparing databases and binary files

Others
• Multimedia applications
  – Sounds, video clips, animated pictures
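For disk-based comparison of text output, a recurring trick is to compare against a stored expected ("golden") file while masking volatile fields, so the automatic comparison is not tripped by irrelevant discrepancies. The sketch below assumes a timestamp format of yyyy-MM-dd HH:mm:ss and uses in-memory line lists instead of reading real files.

```java
import java.util.List;
import java.util.regex.Pattern;

// Text-file comparison sketch: compare actual output against expected output
// line by line, masking volatile timestamps before comparing.
public class TextOutputComparison {

    private static final Pattern TIMESTAMP =
        Pattern.compile("\\d{4}-\\d{2}-\\d{2} \\d{2}:\\d{2}:\\d{2}");

    static String mask(String line) {
        return TIMESTAMP.matcher(line).replaceAll("<TIMESTAMP>");
    }

    static boolean sameContent(List<String> expected, List<String> actual) {
        if (expected.size() != actual.size()) return false;
        for (int i = 0; i < expected.size(); i++) {
            if (!mask(expected.get(i)).equals(mask(actual.get(i)))) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        List<String> expected = List.of("2019-01-01 10:00:00 order accepted", "total: 42");
        List<String> actual   = List.of("2019-11-05 09:31:17 order accepted", "total: 42");
        System.out.println(sameContent(expected, actual));   // true: timestamps are masked
    }
}
```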

Page 20:

Test automation code quality

Less important
• Only in-house – not delivered to customers
• Only temporary

More important
• Used to assess production code quality
• Maintain with the code
• Volume at least 1:1

Page 21:

Avoid in test code

• Bad assertions
• Retry (timing issues)
• Sleeps and delays
• Hard-coded values and configurations
• Wrong level of abstraction
• Lack of atomicity (too big tests)

https://dev.to/juperala/review-your-test-automation-code-271a
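Two of the anti-patterns above, sleeps and hard-coded values, often show up together as a fixed Thread.sleep with a guessed duration. The sketch below contrasts that with an explicit wait that polls a condition until a timeout; the condition, poll interval, and timeout are illustrative values, not recommendations.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.function.BooleanSupplier;

// Contrasting a fixed sleep (flaky and slow) with an explicit, condition-based wait.
public class WaitExamples {

    // Anti-pattern: hard-coded sleep - either wastes time or is too short on a slow machine.
    static void waitWithSleep() throws InterruptedException {
        Thread.sleep(5000);
    }

    // Better: poll the actual condition until it holds or a timeout expires.
    static void waitFor(BooleanSupplier condition, Duration timeout) throws InterruptedException {
        Instant deadline = Instant.now().plus(timeout);
        while (!condition.getAsBoolean()) {
            if (Instant.now().isAfter(deadline)) {
                throw new AssertionError("Condition not met within " + timeout);
            }
            Thread.sleep(100);    // short poll interval instead of one long blind sleep
        }
    }

    public static void main(String[] args) throws InterruptedException {
        long start = System.currentTimeMillis();
        waitFor(() -> System.currentTimeMillis() - start > 300, Duration.ofSeconds(2));
        System.out.println("condition met without a fixed 5-second sleep");
    }
}
```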

Page 22:

Automating different steps

Manual process:
• Select/identify test cases to run
• Set up test environment: create test environment, load test data
• Repeat for each test case: set up test prerequisites, execute, compare results, log results, analyze test failures, report defect(s), clear up after test case
• Clear up test environment: delete unwanted data, save important data
• Summarize results

Automated process (the figure distinguishes “automated tests” from “automated testing”):
• Select/identify test cases to run
• Set up test environment: create test environment, load test data
• Repeat for each test case: set up test prerequisites, execute, compare results, log results, clear up after test case
• Clear up test environment: delete unwanted data, save important data
• Summarize results
• Analyze test failures; report defects

[Redrawn from Fewster et al. Software Test Automation, 1999]
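The per-test-case loop in the automated process can be pictured as a small driver, sketched below with a hypothetical TestCase interface: set up prerequisites, execute, compare, log and clear up for each case, then summarize, while failure analysis and defect reporting stay with the human tester.

```java
import java.util.List;

// Sketch of an automated test-run driver; TestCase and its methods are hypothetical.
public class TestRunDriver {

    interface TestCase {
        String name();
        void setUpPrerequisites();
        String execute();            // returns the actual outcome
        String expectedOutcome();
        void clearUp();
    }

    public static void main(String[] args) {
        List<TestCase> selected = List.of();   // selected test cases to run (empty placeholder)
        int passed = 0, failed = 0;
        // Set up test environment: create environment, load test data (omitted in this sketch).
        for (TestCase tc : selected) {
            tc.setUpPrerequisites();
            try {
                String actual = tc.execute();
                boolean ok = actual.equals(tc.expectedOutcome());            // compare results
                System.out.println(tc.name() + ": " + (ok ? "PASS" : "FAIL")); // log results
                if (ok) passed++; else failed++;
            } finally {
                tc.clearUp();                                                // clear up after test case
            }
        }
        // Clear up test environment and summarize results; failure analysis stays manual.
        System.out.printf("Summary: %d passed, %d failed%n", passed, failed);
    }
}
```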

Page 23:

Relationship of testing activities

[Figure, redrawn from Fewster et al. Software Test Automation, 1999: time spent on editing tests (maintenance), set up, execute, analyze failures, and clear up, compared across manual testing, the same tests automated, and more mature automation.]

Page 24:

Automated failure clustering – Master thesis @ Qlik

Navigating Information Overload Caused by Automated Testing – A Clustering Approach in Multi-Branch Development

Nicklas Erman, Qlik, Lund, Sweden, [email protected]
Vanja Tufvesson, Accenture, Copenhagen, Denmark, vanja.tufvesson@accenture.com
Markus Borg, Anders Ardö and Per Runeson, Lund University, Sweden (markus.borg, [email protected], @eit.lth.se)

Abstract—Background. Test automation is a widely used technique to increase the efficiency of software testing. However, executing more test cases increases the effort required to analyze test results. At Qlik, automated tests run nightly for up to 20 development branches, each containing thousands of test cases, resulting in information overload. Aim. We therefore develop a tool that supports the analysis of test results. Method. We create NIOCAT, a tool that clusters similar test case failures, to help the analyst identify underlying causes. To evaluate the tool, experiments on manually created subsets of failed test cases representing different use cases are conducted, and a focus group meeting is held with test analysts at Qlik. Results. The case study shows that NIOCAT creates accurate clusters, in line with analyses performed by human analysts. Further, the potential time-savings of our approach is confirmed by the participants in the focus group. Conclusions. NIOCAT provides a feasible complement to current automated testing practices at Qlik by reducing information overload.

Index Terms—Software testing, test automation, test result analysis, clustering, case study.

I. INTRODUCTION

When trying to improve software test efficiency, test automation is often brought forward as a key solution [1], [2]. However, while automated testing (auto testing) provides the benefits of reducing manual testing, minimizing human error, and enabling a higher testing frequency [3, p. 466], new challenges are introduced. With higher testing frequency the volume of test results increases drastically. Consequently, there is a need for tools to navigate the potential information overload [4].

Qlik¹, a software company in the business intelligence domain, has adopted auto testing to save time, improve test coverage and to enable development of new features in parallel, while assuring a high quality product. At Qlik, automated tests (autotests) run every night on multiple source code branches using Bamboo (see Section III-B). However, Bamboo simply groups test results based on the test case (TC) names, and it is both difficult and time consuming to manually analyze the large amount of test results.

¹http://www.qlik.com

To support the analysis of test results, Qlik developed PasCal, a tool that clusters TC failures based on the error message generated by the failed TCs (see Section III-B). However, PasCal still uses a naïve clustering approach: exact matching of the error messages. Moreover, PasCal was not mainly developed to provide an overview of the auto testing, but to automatically generate new bug reports based on TC failures on the main development branch.

Although PasCal is an important first step toward improved analysis of results from autotests, there are still several open challenges. First, there is no efficient way to determine that specific TCs fail on multiple branches. Second, intermittent failures due to variations in the testing environment make the results unreliable, thus triggering re-execution of autotests. Third, concluding that multiple TCs fail because of the same root cause is difficult. All three challenges are amplified by the information overload caused by the auto testing, and the test analysts request support.

To improve the overview of test results, we developed NIOCAT, a tool that clusters TC failures from auto testing in multi-branch environments. The clustering goes beyond exact string matching by calculating relative similarities of textual content using the vector space model [5]. Furthermore, we complement the TC name and error message by execution information, in line with previous work [6].

We evaluate the accuracy of NIOCAT in a case study on the development of a web application, using three manually constructed scenarios representative for the work of test analysts at Qlik. In the study, we also explore different weighting of textual information and execution information using space-filling experimental design [7]. Also, we qualitatively evaluate NIOCAT through a focus group interview, following the case study methodology proposed by Runeson et al. [8].

In this paper, we briefly summarize related work in Section II and describe the case company in Section III. Section IV jointly presents NIOCAT and its evaluation. We present the results in Section V, and finally conclude the paper in Section VI.

Fig. 3. Overview presented by Bamboo showing results from nine consecutive autotest executions on the same branch. Neither the SUT nor the TCs changed between the test runs, indicating an intermittent failure.

…ft-personal”, “Ver12.00-dev-ft-ratatosk” and “Ver12.00-dev-ft-ratatosk-responsive-grid”. Since Bamboo does not provide a way to cross-reference TC failures between the branches, the test analyst must manually navigate into each branch to determine if the same TC has failed on all three branches.

Intermittent Failures (“Is this really a problem?”): Qlik refers to TCs that irregularly fail because of variations in the testing environment (e.g. timing issues caused by unbalanced load of test servers) as “intermittent failures”. Figure 3, also a screen shot from Bamboo, shows that consecutive execution of autotests for the branch “main for stability testing” yields different results. Note that neither the SUT nor the TCs have changed between the different runs, but still the test results vary. To determine whether a TC failure is due to a “real” problem in the SUT, or to an intermittent failure, typically the autotests are executed several times. If an overview of all branches with a particular TC failure was available, the time spent re-executing the autotests could be saved.

Root Cause Analysis (“Do these TCs fail for the same reason?”): The same TCs can fail in different ways, i.e. the same TC may fail in different ways in different branches. For example, a six-step TC could fail at any step, but still the same TC name would be presented by Bamboo. To identify differences between the two TC failures, additional information about the failure, such as the error message, has to be taken into account. Similarly, two different TCs might fail in a step that both TCs have in common, e.g. an initial setup step. These problems should not be treated as two different issues, as the common trigger is the setup phase. Again, a naïve comparison using only the TC name would not identify this common root cause. A clustering of all TCs that fail during the same step, i.e. share a common root cause, would support a timely resolution of the issue.

Fig. 4. NIOCAT generates a clustering of TC failures based on user selected autotest results.

IV. STUDY DESIGN AND SOLUTION APPROACH

As the development and evaluation of NIOCAT were tightly connected, this section contains a joint presentation. First, based on the background and challenges described in Section III, we state two research questions. Then, the rest of this section presents the details of NIOCAT, the corresponding evaluation, and the major threats to validity.

RQ1 How can clustering of test case failures help test analysts at Qlik navigate the information overload caused by automated testing?
RQ2 Can execution data be used in addition to textual information to improve the clustering of test case failures?

A. Solution Approach – NIOCAT

Our approach to support the test analysts at Qlik is to introduce a tool for high-level analysis of autotest results. We name the tool NIOCAT – Navigating Information Overload Caused by Automated Testing. The output from NIOCAT is a clustering of TC failures from a user selected set of autotest results. NIOCAT aims to group similar TC failures, i.e. each cluster should represent a unique issue in the SUT, containing one or several TC failures.

Figure 4 illustrates how NIOCAT processes autotest results from multiple branches to generate clusters of TC failures. The small circles represent TC failures, whereas larger circles depict TC failures that have been grouped together. To support interactive navigation of the NIOCAT output, we use QlikView (see Section III) to present the results.

Clustering TC failures provides the test analysts a starting point for further investigation. Test analysts can use NIOCAT in different ways; what use case is supported depends on the analyst’s choice of input autotest results. The use case for a development team leader might be to analyze data from test runs within the last seven days, for the team’s branch only. A configuration manager, on the other hand, might look for a…

Page 25:

Automated bug assignment


…sent down to the second layer. If the customer support organization believes a CSR to be a fault in the product, they file a bug report based on the CSR in the second layer BTS. In this way, the second layer organization can focus on issues that are likely to be faults in the software. In spite of this approach, some bug reports can be configuration issues or other problems not directly related to faults in the code. In this study, we have only used data from the second layer BTS, but there is nothing in principle that prevents the same approach to be used on the first layer CSRs. The BTS is the central point in the bug handling process and there are several process descriptions for the various employee roles. Tracking of analysis, implementation proposals, testing, and verification are all coordinated through the BTS.

4.3 State-of-Practice Bug Assignment: A Manual Process

The bug handling processes of both Company Automation and Company Telecom are substantially more complex than the standard process described by Bugzilla (Mozilla 2013). The two processes are characterized by the development contexts of the organizations. Company Automation develops safety-critical systems, and the bug handling process must therefore adhere to safety standards as described in Section 4.1. The standards put strict requirements on how software is allowed to be modified, including rigorous change impact analyses with focus on traceability. In Company Telecom, on the other hand, the sheer size of both the system under development and the organization itself are reflected in the bug handling process. The resource allocation in Company Telecom is complex and involves advanced routing in a hierarchical organization to a number of development teams.

We generalize the bug handling processes in the two case companies and present an overview model of the currently manual process in Fig. 4. In general, three actors can file bug reports: i) the developers of the systems, ii) the internal testing organization, and iii) customers that file bug reports via helpdesk functions. A submitted bug report starts in a bug triaging stage. As the next step, the Change Control Board (CCB) assigns the bug report to a development team for investigation. The leader of the receiving team then assigns the bug report to an individual developer. Unfortunately, the bug reports often end up with the wrong developer, thus bug tossing (i.e., bug report re-assignment) is common, especially between teams. The longer history the BTS stores about the bug tossing that takes place, the more detailed an estimate of the savings of our approach can be made. With only the last entry saved in the BTS one can estimate the prediction accuracy of the system, but to calculate the full saving, the full bug tossing history is needed.

[Fig. 4 – A simplified model of bug assignment in a proprietary context: new bug reports from development, test, and customer support enter the Bug Tracking System (BTS); the CCB assigns each report to one of the teams (Team 1 … Team N), whose team leader assigns it to a developer (Developer 1 … Developer N); bug tossing (re-assignment) occurs, and automatic assignment is the proposed alternative.]

Jonsson, 2016

Page 26:

Test automation promises

1. Efficient regression test
2. Run tests more often
3. Perform difficult tests (e.g. load, outcome check)
4. Better use of resources
5. Consistency and repeatability
6. Reuse of tests
7. Earlier time to market
8. Increased confidence

Page 27:

Common problems

1. Unrealistic expectations
2. Poor testing practice
   ”Automatic chaos just gives faster chaos”
3. Expected effectiveness
4. False sense of security
5. Maintenance of automatic tests
6. Technical problems (e.g. interoperability)
7. Organizational issues

Page 28:

What can be automated?

[Figure: the test activities 1. Identify, 2. Design, 3. Build, 4. Execute, 5. Check plotted along two dimensions – intellectual vs. clerical, and performed once vs. repeated.]

Page 29:

Limits of automated testing

• Does not replace manual testing
• Manual tests find more defects than automated tests
  – Does not improve effectiveness
• Greater reliance on the quality of tests
  – Oracle problem
• Test automation may limit the software development
  – Costs of maintaining automated tests

Page 30:

What is ET?

Exploratory software testing (ET) is a style of software testing that emphasizes the personal freedom and responsibility of the individual tester to continually optimize the value of her work by treating test-related learning, test design, test execution, and test result interpretation as mutually supportive activities that run in parallel throughout the project.

Cem Kaner

Page 31:

Sounds promising…

…but…
– impossible to automate
– highly dependent on tester skills
– hard to replicate failures (if testing is not traced)

And, do we really know?

Page 32:

Variations of Exploratory Testing

[Figure: a scale of test specification detail, from freestyle exploratory testing to pure scripted testing – from specifying the test object only, via test goals and constraints, to a fully specified test object, test steps and test data.]

Page 33:

Formal Training in Exploratory Testing

• Experiment with 20 professionals [Micallef 2016]
  – with/without formal test training
  – 20 injected faults in an e-commerce system
  – up to 40-minute session with an eye-tracking device

TABLE III – Participant demographics and background information (G: gender, A: age, E: years of experience in a testing role)

       Carmen                          George
ID     Domain       G    A     E       Domain      G    A     E
1      Design       M    36    0       E-comm      F    31    3
2      E-comm       F    51    1       Payments    M    25    3.5
3      Content      M    30    0       E-comm      M    39    13
4      E-comm       M    34    0       Payments    M    27    1.5
5      E-comm       F    37    0       Telco       M    32    3
6      Gaming       M    26    1       Payments    F    23    3
7      E-comm       F    26    0       E-comm      F    27    7
8      E-comm       F    29    0       Payments    M    31    2
9      E-comm       F    24    0       Networks    M    28    6
10     Virtualiz.   M    24    2       Payments    M    21    1
Median                   29.5  0                        27.5  3
Mean                     31.7  0.4                      28.4  4.3

…were allowed to test the system (exploratory) for up to 40 minutes whilst a researcher observed their activities from a remote monitor (and taking notes unobtrusively). The participant’s terminal was equipped with an eye-tracking device (Tobii X-120), a camera (with face-tracking capabilities) as well as a microphone (see Figure 3). Participants were encouraged to think aloud during the test session, commenting on specific bugs being discovered as well as general issues with the system. Tobii Studio (Eye Tracking Software for Analysis) was used to capture eye-gaze data together with mouse activity, audio and video streams. Most participants suggested improvements to the e-commerce site while others found bugs that were not intentionally injected as part of the study. Think-aloud allows participants to focus on their primary task without the need to interrupt their workflow to log bugs (manually or online). A researcher took note of any points of interest (POIs) that may have occurred during the test session (e.g., the participant’s gaze was fixed for a long time on a specific element), in which case the participant was invited for a reflective think-aloud (RTA) session right after the test session. Here the researcher plays back portions of the session (POIs) to the participant, upon which a discussion ensues. This adds a second layer of understanding to the otherwise sterile gaze data. Following the test session (and RTA, if applicable), a short debriefing exercise is conducted whereby the researcher concludes the session through a short semi-structured interview (capturing participants’ biographic information, knowledge of exploratory testing strategies as well as insights into their professional experience).

Data processing

1) Reviewing raw data: Over 13 hours of gaze-point data together with the corresponding audio and video streams were reviewed systematically using a set of predefined scoring sheets (see Figure 4). For each session the researcher took note of a) predefined observable behavioural patterns (e.g., the tester hovered around the same area where a bug was previously found), b) bugs reported (if at all) and c) corresponding timestamps (down to a minute-by-minute granularity).

Fig. 3. Participant eye-tracking terminal (left) and remote monitoring station (right).

Fig. 4. Scoring sheet used during the data processing stage. This helped the researcher to systematically annotate observed behavioural patterns (from video, audio and gaze data) together with bugs reported by each participant.

2) Data staging: Strategies were abstracted as a sequential set of behavioural patterns (e.g., ES2 = [BP21, BP5, BP22, BP20]), therefore allowing the researcher to map a series of observed participant behavioural patterns to a probability that a particular strategy was being used during a specific timeframe (knowingly or unknowingly). Furthermore, the exploratory strategies selected for this study were grouped into three broad categories: guided strategies, semi-guided strategies and unguided strategies, in descending order of rigour and technical know-how required for each strategy. Given that we were monitoring the use of seven strategies by twenty participants in the study, it was decided that it would be more manageable to group strategies together based on where they lay on the guided/unguided scale as depicted in Figure 5. Based on this scale, ES1 and ES3 were classified as unguided, ES2, ES4 and ES7 were classified as guided, whilst ES6 and ES5 were classified as semi-guided because they exhibit a balanced amount of characteristics from both extremes. The resulting information was synthesised in a tabular format which represented a minute-by-minute log of which tester type was using which strategy type and…

Do Exploratory Testers Need Formal Training? An Investigation Using HCI Techniques

Mark Micallef, Faculty of ICT, University of Malta, [email protected]
Chris Porter, Faculty of ICT, University of Malta, [email protected]
Andrea Borg, Faculty of ICT, University of Malta, [email protected]

Abstract—Exploratory software testing is an activity which can be carried out by both untrained and formally trained testers. We personify the former as Carmen and the latter as George. In this paper, we outline a joint research exercise between industry and academia that contributes to the body of knowledge by (1) proposing a data gathering and processing methodology which leverages HCI techniques to characterise the differences in strategies utilised by Carmen and George when approaching an exploratory testing task; and (2) present the findings of an initial study amongst twenty participants, ten formally trained testers and another ten with no formal training. Our results shed light on the types of strategies used by each type of tester, how they are used, the effectiveness of each type of strategy in terms of finding bugs, and the types of bugs each tester/strategy combination uncovers. We also demonstrate how our methodology can be used to help assemble and manage exploratory testing teams in the real world.

I. INTRODUCTION

Carmen and George are both employed as software testers. However, whilst George is formally trained and certified, Carmen has no training but got the job because she is ‘good at finding bugs’. A heated debate about whether or not software testers should be formally trained and certified was sparked by the announcement in 2011 of the development of a new ISO standard on software testing [1] and further fuelled by the perception that unlike other roles in software engineering, testing can be carried out by anyone, providing they understand an application’s domain. One side argues that certified testers will approach testing with discipline and consistency whilst the other argues that much like a driving license does not make one a good driver, a testing certification does not guarantee a good tester. Furthermore, end-users regularly find and report bugs in systems even though they are not trained as testing professionals. Testing encompasses a wide range of skills (e.g., planning, design, automation, exploratory testing), proficiency in most of which requires a certain level of formal training.

However, an interesting opportunity for joint research between academia and the industry presents itself when one considers that exploratory testing can be carried out by both trained and untrained testers. Exploratory testing involves a software tester interacting with a system in an unscripted manner guided mainly by her intuition and experience. Although it is a recognised approach, the technique is frequently referred to as ad-hoc testing and suffers from the reputation of delivering inconsistent results depending on the tester executing it. Despite the fact that there are documented exploratory testing strategies which can be utilised [2], effectiveness has been shown to be dependent on the tester’s knowledge [3], learning style [4] and even personality [5]. The debate as to whether or not formal training is a positive or negative influence on the quality of exploratory testing motivates our hypothesis.

A. Hypothesis and Research Questions

If one partitions testers into two broad groups such that one group consists of testers with a formal qualification in software testing, and the second group consists of testers with no such formal training, then we hypothesise that the two groups of testers intuitively use different yet complementary exploratory testing strategies. In order to explore this hypothesis, we propose to investigate three research questions:

(RQ1) Which types of exploratory testing strategies are utilised by testers in each group?
(RQ2) Which types of bugs are found by testers in each group?
(RQ3) Is there a link between the bugs found and the testing strategies adopted by the tester groups?

B. Research Challenge

The main research challenge here is instigated mostly by the fact that our subjects of interest (software testers) probably do not possess explicit and standardised knowledge of the strategies that they utilise to do their testing. To use a medical metaphor, whereas two doctors are highly likely to refer to any number of medical procedures by their technical names and understand each other perfectly well, very little such standardisation exists in exploratory software testing. This is further compounded by the fact that one of the groups we want to study is not even formally trained and would therefore be even less likely to know any jargon. Therefore, we needed to design a methodology which non-intrusively extracts information about which strategies are being used by participants at any point in time.

C. How can HCI Help?

Human Computer Interaction (HCI) is an interdisciplinary area of research and practice which has evolved throughout the

2016 IEEE International Conference on Software Testing, Verification and Validation Workshops, © 2016 IEEE, DOI 10.1109/ICSTW.2016.31

Page 34


Do Exploratory Testers Need Formal Training?

Fig. 5. The scale of test strategies - from unguided to guided

The data staging phase produced the necessary level of detail required to conduct an in-depth analysis as discussed in the following step.
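To make the data-staging step above concrete, the following is a minimal illustrative sketch (not taken from the paper) of how strategies expressed as sequences of behavioural patterns could be scored against an observed window of annotations. Only ES2 = [BP21, BP5, BP22, BP20] follows the paper's notation; the other strategy definitions and the simple overlap-based scoring rule are assumptions made purely for illustration.

```python
# Illustrative sketch: score how well an observed window of behavioural
# patterns matches each exploratory strategy, defined as a pattern sequence.
# Only ES2 follows the paper's example; the rest are hypothetical, and the
# scoring rule (fraction of a strategy's patterns seen in the window) is assumed.

STRATEGIES = {
    "ES2": ["BP21", "BP5", "BP22", "BP20"],   # example given in the paper
    "ES1": ["BP1", "BP3"],                    # hypothetical definitions
    "ES3": ["BP7", "BP8", "BP9"],
}

def strategy_scores(observed_window):
    """Return, per strategy, the fraction of its patterns observed in the window."""
    observed = set(observed_window)
    return {
        name: sum(bp in observed for bp in patterns) / len(patterns)
        for name, patterns in STRATEGIES.items()
    }

if __name__ == "__main__":
    # One minute of annotated observations for a participant.
    window = ["BP21", "BP5", "BP20", "BP4"]
    for name, score in sorted(strategy_scores(window).items()):
        print(f"{name}: {score:.2f}")   # e.g. ES2: 0.75
```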

Evaluation and Conclusions

1) Data analysis: Here, we leveraged preprocessed data in order to produce outcomes and recommendations. Quantitative tests on scoring data allowed us to identify diverging a) behavioural patterns between tester categories, b) rigour in the use of test strategies and c) effectiveness of strategy use in terms of bugs discovered (and their respective categories). For this purpose pivot charts were adopted so as to stay as close to the data as possible while being able to transform the observed data to uncover potentially hidden patterns and correlations (a minimal illustrative sketch of such a pivot follows this list).

2) Expert evaluation: A questionnaire was sent out to around 40 software quality assurance professionals to obtain a measure of perceived importance for the various bug categories covered in our study (i.e., which bugs, in order of importance, should be reported and fixed prior to release?). Finally we briefed a group of testing professionals with our main results and interpretation thereof, and these in turn contributed with their own practice-based opinions, interpretations and recommendations. These two feedback loops were considered while formulating our discussion (see Section V) as well as in our final remarks (see Section VI) and potential future research avenues (see Section VI-A).

3) Conclusions: A set of recommendations was finally produced with respect to the assembly and management of exploratory testing teams, which could in turn improve efficiency and return on investment.
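As flagged in the data-analysis item above, here is a minimal sketch of such a pivot, assuming a hypothetical minute-by-minute log with tester type, strategy type and bugs reported per minute. Pandas is assumed; the column names and sample rows are illustrative and are not the paper's data.

```python
# Illustrative sketch of pivoting a minute-by-minute log to compare
# strategy use and bugs found across tester types. Column names and
# the sample rows are hypothetical; only the analysis idea follows the paper.
import pandas as pd

log = pd.DataFrame([
    {"minute": 1, "tester": "Carmen", "strategy": "unguided",    "bugs": 1},
    {"minute": 2, "tester": "Carmen", "strategy": "unguided",    "bugs": 0},
    {"minute": 1, "tester": "George", "strategy": "guided",      "bugs": 0},
    {"minute": 2, "tester": "George", "strategy": "semi-guided", "bugs": 2},
])

# Minutes spent per strategy type, by tester type.
usage = pd.pivot_table(log, index="tester", columns="strategy",
                       values="minute", aggfunc="count", fill_value=0)

# Bugs reported per strategy type, by tester type.
bugs = pd.pivot_table(log, index="tester", columns="strategy",
                      values="bugs", aggfunc="sum", fill_value=0)

print(usage)
print(bugs)
```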

IV. RESULTS

As outlined in Section I-A, this work was driven by three research questions which were designed to collectively characterise the differences between how formally trained testers (George) and untrained testers (Carmen) approach an exploratory testing task. In this section, we present an outline of the empirical results relating to each of these questions. We then go on to a synthesised discussion of these results in Section V.

TABLE IV
DISTRIBUTION OF BUGS FOUND ACCORDING TO CATEGORY AND THE TYPE OF TESTER THAT FOUND THEM

Category                 Carmen      George      Total
Content Bugs             35 (54%)    30 (46%)     65
Input Validation Bugs     6 (21%)    23 (79%)     29
Logical Bugs              5 (50%)     5 (50%)     10
Functional UI Bugs       10 (48%)    11 (52%)     21
Nonfunctional UI Bugs     1 (11%)     8 (89%)      9
Total                    57          77          134
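As a quick cross-check (not part of the paper), the row percentages and column totals in Table IV can be reproduced from the raw counts; a sketch:

```python
# Reproduce the row percentages and totals in Table IV from the raw bug counts.
counts = {
    "Content Bugs":          (35, 30),
    "Input Validation Bugs": (6, 23),
    "Logical Bugs":          (5, 5),
    "Functional UI Bugs":    (10, 11),
    "Nonfunctional UI Bugs": (1, 8),
}

for category, (carmen, george) in counts.items():
    total = carmen + george
    print(f"{category}: Carmen {carmen} ({carmen / total:.0%}), "
          f"George {george} ({george / total:.0%}), total {total}")

print("Grand totals:", sum(c for c, _ in counts.values()),
      sum(g for _, g in counts.values()))   # 57 and 77, i.e. 134 bugs in all
```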

A. RQ1: Which types of strategies are used by testers in each group?

1) Testers with no formal training (Carmen) overwhelmingly rely on unguided strategies (65%), using guided strategies only 31% of the time and semi-guided strategies for 4% of the time.

2) Formally trained testers (George) split their use of guided and unguided strategies evenly (45% and 46% respectively) whilst making twice as much use of semi-guided strategies as untrained testers (9%).

3) Interestingly, whilst Carmen seems to always prefer unguided strategies, George tends to alternate between them, using unguided strategies to find opportunities for more guided testing (see Figure 6 and Figure 7).

Fig. 6. Strategy types collectively used by untrained testers (Carmen) throughout the 40-minute test sessions

Fig. 7. Strategy types collectively used by formally trained testers (George) throughout the 40-minute test sessions

B. RQ2: Which bugs are found by testers of each group?

1) We split bugs into five categories (see Section III-D).


[Chart legend: w training / w/o training]

[Micallef 2016]

Page 35


When to use exploratory vs scripted?

SPECIAL SECTION ON SOFTWARE STANDARDS AND THEIR IMPACT IN REDUCING SOFTWARE FAILURES
Received March 15, 2018, accepted May 6, 2018, date of publication May 10, 2018, date of current version June 5, 2018.
Digital Object Identifier 10.1109/ACCESS.2018.2834957

Levels of Exploration in Exploratory Testing: From Freestyle to Fully Scripted
AHMAD NAUMAN GHAZI 1, KAI PETERSEN 1, ELIZABETH BJARNASON 2, AND PER RUNESON 2, (Member, IEEE)
1 Department of Software Engineering, Blekinge Institute of Technology, 371 79 Karlskrona, Sweden
2 Department of Computer Science, Lund University, 221 00 Lund, Sweden
Corresponding author: Ahmad Nauman Ghazi ([email protected])
This work was partly funded by the EASE Industrial Excellence Center for Embedded Applications Software Engineering (http://ease.cs.lth.se).

ABSTRACT Exploratory testing (ET) is a powerful and efficient way of testing software by integrating design, execution, and analysis of tests during a testing session. ET is often contrasted with scripted testing and seen as a choice of either exploratory testing or not. In contrast, we pose that exploratory testing can be of varying degrees of exploration from fully exploratory to fully scripted. In line with this, we propose a scale for the degree of exploration and define five levels. In our classification, these levels of exploration correspond to the way test charters are defined. We have evaluated this classification through focus groups at four companies and identified factors that influence the choice of exploration level. The results show that the proposed levels of exploration are influenced by different factors such as ease to reproduce defects, better learning, and verification of requirements and that the levels can be used as a guide to structure test charters. Our study also indicates that applying a combination of exploration levels can be beneficial in achieving effective testing.

exploration, exploratory testing classification, software testing.I. INTRODUCTIONAdvocates of exploratory testing (ET) stress the benefits of

providing the tester with freedom to act based on his/her

skills, paired with the reduced effort for test script design and

maintenance. ET can be very effective in detecting critical

defects [1]. We have found that exploratory testing can be

more effective in practice than traditional software testing

approaches, such as scripted testing [1], [2]. ET supports

testers in learning about the system while testing [1], [3].

The ET approach also enables a tester to explore areas of

the software that were overlooked while designing test cases

based on system requirements [4]. However, ET does come

with some shortcomings and challenges. In particular, ET can

be performed inmany different ways, and thus there is no one-

way of training someone to be an exploratory tester. Also,

exploratory testing tends to be considered an ad-hoc way of

testing and some argue that defects detected using ET are

difficult to reproduce [5].The benefits of exploratory testing are discussed both

within industry and academia, but only little work relates to

how to perform this kind of testing [6]. Bach introduced a

technique named Session Based Test Management (SBTM)

[7] that provides a basic structure and guidelines for ET using

test missions. In the context of SBTM, a test mission is an

objective to provide focus on what to test or what problems

to identify within a test session [7]. SBTM provides a strong

focus on designing test charters to scope exploration to the

test missions assigned to exploratory testers. A test charter

provides a clear goal and scopes the test session, and can

be seen as a high level test plan [8] thus the level of detail

provided in the test charter influences the degree of explo-

ration in the testing.However, little guidance exists on how to define test char-

ters in order to achieve different, or combine various degrees

of exploration. Though, there is a need in the industry to have

support for choosing the ‘‘right’’ degree of exploration see

e.g. [9]. In order to make an informed decision there is a need

to define what is meant by ‘‘degree of exploration.’’We pose that testing can be performed at varying degrees

of exploration from freestyle ET to fully scripted, and

propose a scale for the degree of exploration defined by

five distinct levels of exploration. In this paper, we present

a classification consisting of five levels of exploratory

testing (ET) ranging from free style testing to fully scripted
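Purely as an illustration of how a test charter might encode a chosen degree of exploration, a charter in the spirit of session-based test management could be captured as a small structured record. The five level names below are placeholders spanning a freestyle-to-fully-scripted scale and are not the labels defined in the paper; the fields and example values are likewise assumptions.

```python
# Hypothetical sketch of a test charter carrying an explicit exploration level,
# in the spirit of session-based test management. The enum labels are
# placeholders spanning a freestyle-to-fully-scripted scale, not the
# paper's level names; all field names and example values are illustrative.
from dataclasses import dataclass, field
from enum import Enum

class ExplorationLevel(Enum):
    FREESTYLE = 1        # no charter constraints beyond a broad mission
    HIGH = 2
    MEDIUM = 3
    LOW = 4
    FULLY_SCRIPTED = 5   # predefined steps and expected results

@dataclass
class TestCharter:
    mission: str                      # what to test / which problems to look for
    level: ExplorationLevel
    session_minutes: int = 60
    areas: list[str] = field(default_factory=list)
    notes: str = ""

charter = TestCharter(
    mission="Explore the checkout flow for input-validation problems",
    level=ExplorationLevel.MEDIUM,
    session_minutes=90,
    areas=["payment form", "address form"],
)
print(charter)
```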


Page 36


Organization

CC BY 2.0 Theater der Künste

Page 37


Organization

• Support for decision making
• Enhance teamwork
• Independence
• Balance testing – quality
• Assist test management
• Ownership of test technology
• Resource utilization
• Career path

Page 38


Hierarchical approach [Naik Ch. 16, Test Team Organization]

[Figure 16.1 Structure of test groups — organisational chart: executive management oversees a software group and a hardware group; the software group comprises the software developers together with system, integration, development, performance, scalability, automation, and sustaining test groups.]

value of the ratio depends upon the nature of the software under development. A value of the ratio is estimated during the development of a test planning document, and it has been discussed in Chapter 12.

Development Test Group: The focus of this group is on the testing of new features in a particular release. This includes basic tests, functionality tests, robustness tests, interoperability tests, stress tests, load and stability tests, regression tests, documentation tests, and business acceptance tests.

Performance Test Group: This group puts emphasis on system performance. Tests are conducted to identify system bottlenecks and recommendations are made to the developers for improving system performance. The group uses test, measurement, and analysis tools to carry out its tasks. This group may take up additional responsibilities such as reliability testing.

Scalability Test Group: The focus of this group is on determining whether or not the system can scale up to its engineering limits. For example, a cellular phone network might have been designed with certain engineering limits in mind, such as the maximum number of base stations it can support, the maximum number of simultaneous calls it can handle, and so on. The group tests whether the designed system can reach those limits. This group may take up additional responsibilities such as load and stability testing.

Automation Test Group: The responsibility of this group is to develop test automation infrastructure, test libraries, and test tools. This group assists other groups in the development of automated test suites.

Sustaining Test Group: This group maintains the software quality throughout the product’s market life. This team is responsible for maintaining the corrective aspect of software maintenance. The group works very closely with customers and conducts regression testing of patch software.

Page 39


Team approach

[Diagram: two product-developer team configurations, each with a manager (M), programmers (P) and a tester (T); in the second, the tester is part of the team and one member holds a combined programmer/tester (PT) role.]

Page 40


Test Organisation [Kit p 166 ff]

Degrees of freedom
– Tall or flat
– Market or product
– Centralized or decentralized
– Hierarchical or diffused
– Line or staff
– Functional or project

Page 41


7 approaches to test organisation
1. Each person’s responsibility
2. Each unit’s responsibility
3. Dedicated resource
4. Test organisation in QA
5. Test organisation in development
6. Centralized test organisation
7. Test technology centre

[Kit, Software Testing in the Real World Ch 13, 1995]

Page 42


Stuckenbruck, L. C. (1979). The matrix organization. Project Management Quarterly, 10(3), 21–33.

Matrix organization

Page 43


Scaled Agile (LeSS, SAFe, Scrum@Scale)

Page 44


Which organization should we choose?

• Depending on
– size
– maturity
– focus
– localization

• Testing dimensions and testing organization?

• The solution is often a mixture of different approaches

[Diagram: testing dimensions — Level of detail: unit (module), integration, system; Accessibility: white box, black box; Characteristics: functionality, reliability, usability, efficiency, maintainability, portability.]

Page 45


High performing testers – Interview Study

• Experience and skills
– With the product
– In the domain
– In programming
– In testing techniques
– Writes good defect reports
• Motivation
– Has a mission
– Knows the importance of testing
• Reflection
– Maintains the big picture
– Understands the effects of defects in the production environment
– Independent, knows own skill and limits
– Criticizes product and process
• Personal characteristics
– Thoroughness, patience or persistency, accurateness

[Joonas Iivonen, Mika Mäntylä, Juha Itkonen: Characteristics of high performing testers: a case study. ESEM 2010]

Page 46


Test certification
http://istqb.org

Page 47


Recommended exercises

• Chapter 12 – 11, 12, 13

• Chapter 16 – 2, 3, 4, 9, 10