Lund University / Faculty of Engineering/ Department of Computer Science / Software Engineering Research Group
Software Testing (ETSN20), http://cs.lth.se/etsn20
Tools: Naik 12.10-12.16, Jonsson. Organization: Naik 16.1-16.4
Professor Per Runeson
Lecture
• Chapter 12.10-12.15: Tools, Test automation
• Chapter 16.1-16.4: Organization, Competences
Test tools – the tester’s workbench (Photo: CC avotius at Flickr)
Tools – the workbench
• Good at repeating tasks
• Good at organising data
• Require training
• Introduced incrementally
• No “silver bullet”

Evaluation criteria:
• Ease of use
• Power
• Robustness
• Functionality
• Ease of insertion
• Quality of support
• Cost
• Company policies and goals
Test tools – by process
[Diagram mapping tool types onto development phases (requirement specification, architectural design, detailed design, code) and test levels (unit test, integration test, system test, acceptance test): test management tools and defect management tools span the whole process; test design tools support requirements and design; static analysis and coverage tools support code and unit test; debugging tools support unit and integration test; test execution and comparison tools span the test levels; performance simulator tools support system test. Redrawn from Fewster and Graham, Software Test Automation, 1999.]
Test tools – by example
• Test execution tools: xUnit, Selenium
• Static analysis tools: lint
• Test management tools: HP Quality Center
• Defect management tools: Jira
• Test design tools: ACTS
• Performance simulator tools: LoadRunner
• Debugging tools: embedded in IDE
• Coverage tools: EclEmma
Choosing a tool – test tool evaluation criteria
• Test development: language, scripts
• Test maintenance: version control, browse, tags
• Test execution: sequencing, control, integration
• Test results: logging, mapping to versions, analysis
• Test management: storage, authorization
• GUI testing: recording, modification
• Vendor: sustainability, responsiveness
• Price: life cycle costs vs value
Buy or share?

It is More Blessed to Give than to Receive – Open Software Tools Enable Open Innovation
Per Runeson (1), Hussan Munir (1) and Krzysztof Wnuk (2)
(1) Lund University, (2) Blekinge Institute of Technology
ABSTRACT
Open Innovation (OI) has attracted scholarly interest from a wide range of disciplines since introduced by Chesbrough [1], i.e. ”a paradigm that assumes that firms can and should use external ideas as well as internal ideas, and internal and external paths to market, as they look to advance their technology”. However, OI remains unexplored for software engineering (SE), although widespread in practice through Open Source Software (OSS). We studied the relation between SE and OI, and in particular how OSS tools impact a software-intensive organization’s innovation capability.

We surveyed the literature on SE and OI [3] and found that studies conclude that start-ups have a higher tendency to opt for OI compared to established companies. The literature also suggests that firms assimilating external knowledge into their internal R&D activities have a higher likelihood of gaining financial advantages.

In a case study, we observed how the OSS tools Jenkins and Gerrit enabled open innovation [2]. We mined software commits to identify major contributors, found them to be affiliated with Sony Mobile, and contacted five of them for interviews about their and their employer’s principles and practices with respect to OI and tools, of which they gave a consistent view.

Our findings indicate that the company’s transition to OI was part of a major paradigm shift towards OSS, while the adoption of open tools was driven bottom-up by engineers with support from management. By adopting OI, Sony Mobile achieved freed-up developer time, better quality assurance, inner source initiatives, a flexible development environment, and faster releases and upgrades. In particular, the introduction of a test framework was proposed by Sony Mobile but implemented by other contributors [2]. However, the benefits are gained through investing significant attention and resources in the OSS community in terms of technical contributions and leadership.
BODY
Sharing software tools enables open innovation, brings faster upgrades andfrees up resources, but demands investments in the open community
REFERENCES
[1] H. W. Chesbrough. Open Innovation: The New Imperative for Creating and Profiting from Technology. Harvard Business School Press, Boston, Mass., 2003.
[2] H. Munir and P. Runeson. Software testing in open innovation: An exploratory case study of the acceptance test harness for Jenkins. In Proceedings of the 2015 International Conference on Software and System Process, ICSSP 2015, pages 187–191, New York, NY, USA, 2015. ACM.
[3] H. Munir, K. Wnuk, and P. Runeson. Open innovation in software engineering: A systematic mapping study. Empirical Software Engineering, DOI 10.1007/s10664-015-9380-x, 2015.
Volume 4 of Tiny Transactions on Computer Science
This content is released under the Creative Commons Attribution-NonCommercial ShareAlike License. Permission to make digital or hard copies of all or part of this work is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. CC BY-NC-SA 3.0: http://creativecommons.org/licenses/by-nc-sa/3.0/.
Consequence of bad tools: cognitive load drivers
Cognitive Load Drivers in Large Scale Software Development – An industrial case study
Research Questions
RQ1: Which types of cognitive load drivers can be observed in large-scale software engineering, primarily as a consequence of tool use?
RQ2: How do software engineers perceive the identified cognitive load drivers in their digital work environment?

Daniel Helgesson, Emelie Engström, Per Runeson, Elizabeth Bjarnason
Dept. of Computer Science, Lund University, Lund, Sweden, <firstname.lastname>@cs.lth.se

Research Method
1. Literature review
2. Interview study (test manager, 2 testers)
3. Extended interview study (tool architect, SW developer)
4. Knowledge synthesis
Conclusions
• Little published beyond program comprehension
• Cognitive science relevant for software engineering
• Evidence of cognitive load in large-scale software engineering (RQ1)
• Cognitive load is indeed a problem for practitioners (RQ2)
Identified Cognitive Load Drivers

TABLE I. Cognitive load drivers in software engineering, structured according to the thematic analysis (main cluster / theme / item – description).

Work process (no theme):
• Lack of automation – Absence of automated tool support (e.g. automated testing) forcing the user to do work manually.
• Wasted effort – Unnecessary or redundant work mandated by absence of tool support or by process.
• Ad hoc – Tool support (or processes) implemented differently in different parts of the organisation.
• Lack of understanding – Missing information or support on account of the shifting nature of a large organization.

Tool / Intrinsic:
• Adaptation/Suitability – Use of tools that are not really suited for the purpose.
• Lack of functionality – Use of tools missing functionality needed to solve a task efficiently.
• Stability/Reliability – Use of tools that suffer from stability or reliability issues.
• Overlap – Use of several tools that can do the same thing, or almost the same thing, in parallel.
• Lack of Integration – Use of several tools, in parallel, that are not (or poorly) integrated, forcing the user to do redundant and/or manual work.
• Comprehension – Actually understanding what needs to be done in the tool in order to complete a task.

Tool / Delay:
• Response (micro) – Delays in response forcing the user to stay overly focused, putting a strain on short-term memory.
• Downtime (macro) – Tools or systems that are completely unresponsive for a longer period than a few seconds/minutes.

Tool / Interaction:
• Unintuitive – Functionality (or interaction) is implemented in an unintuitive way.
• Inconsistent – Functionality (or interaction) is inconsistently implemented in two different tools or in two different views of the same tool.
• Cumbersome – Functionality (or interaction) is implemented in a way that users find clumsy.

Information / Integrity:
• Incompleteness – Lack of complete information is causing the user to spend effort in asserting that information is complete.
• Reliability – Lack of reliable information is causing the user to spend effort in asserting that information is correct and up-to-date.
• Temporal traceability – The user needs to bridge a temporal gap in order to assess the current situation.

Information / Organisation:
• Location – Where to find the information.
• Distribution – Where to store and whom to distribute the information to.
• Retrieval – How to access the information and retrieve it.
• Overview/zoom – How to navigate the information.
• Structure – How the information is organised.
TABLE II. Mapping of the main clusters from Table I to Gulliksen et al.’s cognitive work environment problems [4] (columns: Work process / Tools / Information):
• Unnecessary cognitive load and interruption of thought process: – / x / –
• Unnecessary strain on working memory: – / x / –
• Problems orienting and lack of overview: – / x / x
• Identifying and interpreting information: – / x / x
• Decision making/support: – / x / x
• Difficulties with time coordination of data: – / x / x
• Work processes determined by tools: – / x / –
• Many unintegrated information systems: x / x / x
• Poor support for learning: x / – / –
• Lack of understanding automation: x / – / –
• Difficulties with different system modes: N.A.
Software engineers handle a lot of information in their daily work. We explore how software engineers interact with information management systems/tools, and to what extent these systems expose users to increased cognitive load.
Cognitive Load Drivers in Large Scale Software Development
Daniel Helgesson, Emelie Engström, Per Runeson, Elizabeth Bjarnason
Dept. of Computer Science, Lund University, Lund, Sweden
Abstract—Software engineers handle a lot of information in their daily work. We explore how software engineers interact with information management systems/tools, and to what extent these systems expose users to increased cognitive load. We reviewed the literature of cognitive aspects relevant for software engineering, and performed an exploratory case study on how software engineers perceive information systems. Data was collected through five semi-structured interviews. We present empirical evidence of the presence of cognitive load drivers, as a consequence of tool use in large scale software engineering.

Index Terms—Cognition, Cognitive Load, Software Development, Software Engineering, Software Development Tools, Software Engineering Tools, Industrial Case Study

I. INTRODUCTION
Software engineering is a socio-technical endeavour where the technical side of the phenomena seems to be more studied than the social side [1], and as a consequence knowledge of a cognitive/ergonomic perspective of software development, and the tools associated with these activities, appears rather small. Further, we see no clear indications of a significant impression on the software engineering community in terms of understanding the cognitive work environment of software engineers [2] [3].

In a 2002 dissertation, Walenstein observed that there is a need for cognitive reasoning in the design process of software development tools, and further that there has been little research done in the area [2], a claim largely substantiated by Lenberg et al. [1].

More recently, in a 2015 report ’Digital Work Environment’, Gulliksen et al. made an effort to analyse the societal consequences of large-scale digitalisation of human labour, in general [4]. In the report the authors present a literature survey, providing updated insight into the research area. The survey found only 36 relevant articles. In addition, the authors also present a taxonomy of ’Cognitive work environment problems’.

In this study we aim to explore, and establish, a broader understanding of the cognitive work environment of software engineers and the cognitive dimensions of the tools used. Specifically, we aim to explore cognitive load, induced on users by information systems or tools. We present results from an exploratory industrial case study based on thematic analysis of interviews, as well as a literature overview. Our contribution lies in presenting in vivo observation of cognitive problems associated with tool use in large-scale software engineering.
II. RESEARCH QUESTIONS
The purpose of this study is to gain insight into the problem domain of cognitive load, primarily as a consequence of tool use, in large scale software development. Hence it is exploratory in nature, and focuses on two tools central for communication and knowledge management at the case company.

The overall exploratory purpose is refined into two research questions:
RQ1: Which types of cognitive load drivers can be observed in large-scale software engineering, primarily as a consequence of tool use?
RQ2: How do software engineers perceive the identified cognitive load drivers in their digital work environment?

The research questions are anchored in software engineering and cognitive science literature, and addressed by interviewing practitioners. The first question uses the cognitive literature as a lens, while presenting empirical observations from the interview material. The second question reports the interviewees’ perception of problems found in RQ1.

III. METHOD
We conducted a four stage case study, using a flexible design [5], consisting of literature review, interview study, extended interview study, and knowledge synthesis. To mature the knowledge, we iterated reviewing literature, conducting interviews, transcribing and analysing data. Figure 1 describes the study (1).

The case company is an international corporation; the studied division develops consumer products in the Android ecosystem. The software is embedded in handheld devices. The studied development site of the company has 1000 employees and developers work in cross-functional teams using an agile development process. The development environment is primarily based on the toolchain associated with the Android ecosystem.

(1) Please see the preprint for details of the study, including interview guide, table of results and table of validation: https://lucris.lub.lu.se/admin/files/61811452/cog load manuscript.pdf
Test execution: What to automate?
Test cases that are:
• Less volatile
• Repeatable
• High risk
• Easy to automate
• Manually difficult
• Boring and time consuming
The general guideline shown in Figure 12.4 may be used in evaluating the suitability of test cases to be automated as follows:

Less Volatile: A test case is stable and is unlikely to change over time. The test case should have been executed manually before. It is expected that the test steps and the pass–fail criteria are not likely to change any more.

Repeatability: Test cases that are going to be executed several times should be automated. However, one-time test cases should not be considered for automation. Poorly designed test cases which tend to be difficult to reuse are not economical for automation.

High Risk: High-risk test cases are those that are routinely rerun after every new software build. The objectives of these test cases are so important that one cannot afford to not reexecute them. In some cases the propensity of the test cases to break is very high. These test cases are likely to be fruitful in the long run and are the right candidates for automation.

Easy to Automate: Test cases that are easy to automate using automation tools should be automated. Some features of the system are easier to test than other features, based on the characteristics of a particular tool. Custom objects with graphic and sound features are likely to be more expensive to automate.

Manually Difficult: Test cases that are very hard to execute manually should be automated. Manual test executions are a big problem, for example, causing eye strain from having to look at too many screens for too long in a GUI test. It is strenuous to look at transient results in real-time applications. These nasty, unpleasant test cases are good candidates for automation.

Boring and Time Consuming: Test cases that are repetitive in nature and need to be executed for longer periods of time should be automated. The tester’s time should be utilized in the development of more creative and effective test cases.
[Figure 12.4: Test selection guideline for automation – less volatile, repeatability, high risk, easy to automate, manually difficult, boring and time consuming.]
Evolution of test automation
1. Recorded Scripts
2. Engineered Scripts
3. Data-driven Testing
4. Keyword-driven Testing
5. Model-based Testing
Recorded Scripts
• Unstructured
• Scripts generated using capture and replay tools
• Scripts non-maintainable in practice: if the system changes, they need to be captured again
• Relatively quick to set up

Capture and Replay Tools
• Record the user’s actions (keyboard, mouse) to a script, in a tool-specific scripting language
• Scripts access the (user) interface of the software: input fields, buttons and other widgets
• Simple checks can be created in the scripts: existence of texts and objects in the UI, data of GUI objects
Engineered Scripts
• Scripts are well-designed, modular, robust, documented, and maintainable
• Separation of common tasks, e.g. setup, teardown, and error detection
• Test data is still embedded in the scripts
• Code is mostly written manually
• Implementation and maintenance require programming skills
• “Just like any other software development project”
• Most well-known example: JUnit
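Since the slide names JUnit as the best-known example, here is a minimal sketch in the same xUnit style using Python’s unittest. The `Account` class is a made-up system under test; note how setup is separated from the test logic while the test data (100, 50, 150) is still embedded in the script.

```python
# Minimal engineered script in the xUnit style (Python's unittest standing in
# for JUnit). Common setup is factored out; test data remains embedded.
import unittest

class Account:
    """Trivial system under test, used only for illustration."""
    def __init__(self, balance=0):
        self.balance = balance
    def deposit(self, amount):
        if amount <= 0:
            raise ValueError("amount must be positive")
        self.balance += amount

class AccountTest(unittest.TestCase):
    def setUp(self):
        # Common setup, run before every test method (the separated common task).
        self.account = Account(balance=100)

    def test_deposit_increases_balance(self):
        self.account.deposit(50)                     # test data embedded in the script
        self.assertEqual(self.account.balance, 150)

    def test_deposit_rejects_negative_amount(self):
        with self.assertRaises(ValueError):
            self.account.deposit(-1)

result = unittest.TextTestRunner(verbosity=0).run(
    unittest.defaultTestLoader.loadTestsFromTestCase(AccountTest))
print(result.wasSuccessful())  # True
```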
Data-Driven Testing
• Test inputs and expected outcomes stored as data, normally in a tabular format; test data are read from an external data source
• One driver script can execute all of the designed test cases
• External test data can be edited without programming skills; test data can now come from business people, customers, …
• Avoids the problems of embedded test data, where the tests are hard to understand in the middle of all the scripting details
• Updating tests or creating similar tests with slightly different test data always requires programming, which leads to copy-paste scripting
Example test data:
First   Last    Data
Pekka   Pukaro  1244515
Teemu   Tekno   587245
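The driver-script idea can be sketched as follows, reusing the slide’s example table (inlined here as CSV to keep the sketch self-contained; the `register_customer` function is a hypothetical stand-in for the system under test):

```python
# Data-driven testing sketch: one driver script executes every row of an
# external data table. In practice the table would live in a CSV file or
# spreadsheet that non-programmers can edit.
import csv, io

# External data source (inlined for a self-contained example).
TEST_DATA = """first,last,data
Pekka,Pukaro,1244515
Teemu,Tekno,587245
"""

def register_customer(first, last):
    """Hypothetical system under test: returns a customer id."""
    return {"Pekka Pukaro": 1244515, "Teemu Tekno": 587245}[f"{first} {last}"]

def run_data_driven_tests(data_source):
    """The single driver script: read each row, execute, compare to expected."""
    results = []
    for row in csv.DictReader(io.StringIO(data_source)):
        actual = register_customer(row["first"], row["last"])
        expected = int(row["data"])
        results.append((row["first"], actual == expected))
    return results

print(run_data_driven_tests(TEST_DATA))  # [('Pekka', True), ('Teemu', True)]
```

Adding a test case is now a matter of adding a row, but changing *how* the rows are exercised still requires editing the driver script.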
Keyword-Driven Testing
• Keywords (= action words) abstract the navigation and actions from the script
• Keywords and test data are read from an external data source
• When test cases are executed, keywords are interpreted by a test library, which is called by a test automation framework (the test library = the test scripts)
• Keyword-driven testing improves on data-driven testing
• Example:
  Login: admin, t5t56y
  AddCustomers: newCustomers.txt
  RemoveCustomer: Pekka Pukaro
• Notice: the “dirty” details of GUI testing are missing
• Keyword-driven testing ~= domain specific languages (DSL)
• Tool: http://code.google.com/p/robotframework/
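The mechanics can be sketched as a toy interpreter, loosely modelled on the Login/AddCustomers/RemoveCustomer example above; this is not Robot Framework’s actual API, and all keyword and library names are made up:

```python
# Keyword-driven testing sketch: the framework dispatches each keyword to a
# test-library function; the library hides the "dirty" navigation details.

customers = set()

# The test library: one function per keyword.
def login(user, password):
    return user == "admin"          # placeholder for real login navigation

def add_customer(name):
    customers.add(name)             # placeholder for real GUI interaction

def remove_customer(name):
    customers.discard(name)

KEYWORDS = {"Login": login, "AddCustomer": add_customer,
            "RemoveCustomer": remove_customer}

# A test case as it might appear in an external data source: keyword + arguments.
test_case = [
    ("Login", ["admin", "t5t56y"]),
    ("AddCustomer", ["Pekka Pukaro"]),
    ("AddCustomer", ["Teemu Tekno"]),
    ("RemoveCustomer", ["Pekka Pukaro"]),
]

def run(test_case):
    """The framework: interpret each keyword by dispatching to the library."""
    for keyword, args in test_case:
        KEYWORDS[keyword](*args)

run(test_case)
print(customers)  # {'Teemu Tekno'}
```

The test case reads like a small domain-specific language, and non-programmers can write new test cases as long as the keywords they need exist in the library.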
Model-based testing
• The system under test is modelled, e.g. as UML state machines or in domain specific languages (DSL)
• Test cases are automatically generated from the model; a more accurate model gives better tests
• Generate a large number of tests that cover the model; there are many different criteria for covering the model, and the execution time of test cases might be a factor
• The model can also provide the expected results for the generated tests
• Challenges: personnel competencies; data-intensive systems (cannot be modelled as a state machine)
• Simple MBT tool: http://sourceforge.net/projects/graphwalker/
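A minimal sketch of generating tests from a state-machine model, using transition coverage as the criterion; the login/logout model is hypothetical:

```python
# Model-based testing sketch: the system under test is modelled as a state
# machine, and test cases (event sequences) are generated to cover every
# transition, each reached via a BFS-shortest prefix from the start state.
from collections import deque

# Model: state -> {event: next_state}
MODEL = {
    "logged_out": {"login_ok": "logged_in", "login_fail": "logged_out"},
    "logged_in": {"logout": "logged_out", "view_report": "logged_in"},
}

def shortest_path(model, start, goal):
    """Shortest event sequence from start state to goal state (BFS)."""
    queue, seen = deque([(start, [])]), {start}
    while queue:
        state, path = queue.popleft()
        if state == goal:
            return path
        for event, nxt in model[state].items():
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, path + [event]))
    raise ValueError(f"{goal} unreachable")

def generate_transition_cover(model, start):
    """Generate one test per transition in the model."""
    tests = []
    for src, events in model.items():
        for event in events:
            tests.append(shortest_path(model, start, src) + [event])
    return tests

for test in generate_transition_cover(MODEL, "logged_out"):
    print(test)
```

With four transitions in the model, this yields four tests, e.g. `['login_ok', 'logout']`; richer coverage criteria (all paths of length n, random walks) generate many more.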
Evolution of system testing approaches
1. Recorded Scripts – cheap to set up, quick & dirty
2. Engineered Scripts – structured
3. Data-driven Testing – data separation
4. Keyword-driven Testing – action separation, DSL
5. Model-based Testing – modeling & automatic test case generation
Automation and oracles
• Automated testing depends on the ability to programmatically detect when the software fails
• An automated test is not equivalent to a similar manual test:
  – Automatic comparison is typically more precise
  – Automatic comparison will be tripped by irrelevant discrepancies
  – The skilled human comparison will sample a wider range of dimensions, noting oddities that one wouldn’t program the computer to detect
• “Our ability to automate testing is fundamentally constrained by our ability to create and use oracles” – Cem Kaner
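To illustrate how an automated comparison gets tripped by irrelevant discrepancies, here is a small sketch: an exact string comparison fails on a volatile timestamp, so the automated oracle must be told explicitly what to ignore. The log format is made up.

```python
# Oracle sketch: exact comparison is precise but brittle; masking volatile
# fields (here, timestamps) makes the automated comparison usable.
import re

def normalize(output):
    """Mask volatile fields (hh:mm:ss timestamps) before comparing."""
    return re.sub(r"\d{2}:\d{2}:\d{2}", "<TIME>", output)

expected = "12:00:01 order 42 accepted"
actual   = "12:00:03 order 42 accepted"

print(expected == actual)                        # False: exact comparison trips
print(normalize(expected) == normalize(actual))  # True: irrelevant field masked
```

The catch is exactly Kaner’s point: someone has to decide, per output, which discrepancies are irrelevant, and a human tester would also have noticed oddities no mask was written for.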
Types of outcome to compare
Screen-based:
• Character-based applications: correct message, display attributes, displayed correctly
• GUI applications: GUI components and their attributes
• Graphical images: avoid bitmap comparisons
Disk-based:
• Comparing text files
• Comparing non-textual forms of data
• Comparing databases and binary files
Others:
• Multimedia applications: sounds, video clips, animated pictures
Test automation code quality
Arguments that it is less important:
• Only in-house, not delivered to customers
• Only temporary
Arguments that it is more important:
• Used to assess production code quality
• Maintained together with the code
• Volume at least 1:1 with production code
Avoid in test code
• Bad assertions
• Retry (timing issues)
• Sleeps and delays
• Hard-coded values and configurations
• Wrong level of abstraction
• Lack of atomicity (too big tests)
https://dev.to/juperala/review-your-test-automation-code-271a
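As an example of why sleeps and delays belong on the avoid-list, a common remedy is to replace a fixed delay with a bounded polling wait: a hard-coded sleep is either too short (flaky) or too long (slow), while polling returns as soon as the condition holds. All names here are illustrative.

```python
# Replacing a fixed sleep with a bounded polling wait (illustrative sketch).
import time

def wait_until(condition, timeout=5.0, interval=0.05):
    """Poll `condition` until it returns True or `timeout` seconds elapse."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if condition():
            return True
        time.sleep(interval)
    return False

# Instead of: time.sleep(3); assert job_done   (still flaky, and always 3 s slow)
start = time.monotonic()
job_done_at = start + 0.2          # simulate a job that completes after 200 ms
assert wait_until(lambda: time.monotonic() >= job_done_at)
print(f"waited {time.monotonic() - start:.2f}s")  # roughly 0.2 s, not a fixed 3 s
```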
Automating different steps
[Diagram, redrawn from Fewster et al., Software Test Automation, 1999, contrasting “automated tests” (manual process around automated execution) with “automated testing” (automated process). Both flows comprise: select/identify test cases to run; set up test environment (create test environment, load test data); repeat for each test case (set up test prerequisites, execute, compare results, log results, clear up after test case); clear up test environment (delete unwanted data, save important data); summarize results. With automated tests, analyzing test failures and reporting defects happen inside the per-test-case loop; with automated testing, they are separate steps at the end.]
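The per-test-case steps in the flows above can be sketched as a harness loop: set up the environment once, run each test case with its own setup, execution, comparison, logging, and cleanup, then summarize, leaving failure analysis as a separate step. All functions and names are illustrative placeholders.

```python
# Harness-loop sketch of the automated-testing flow (illustrative names).
def run_suite(test_cases, set_up_env, clear_up_env):
    set_up_env()                               # create environment, load test data
    log = []
    try:
        for tc in test_cases:
            tc["set_up"]()                     # set up test prerequisites
            actual = tc["execute"]()           # execute
            passed = actual == tc["expected"]  # compare results
            log.append((tc["name"], passed))   # log results
            tc["clear_up"]()                   # clear up after test case
    finally:
        clear_up_env()                         # delete unwanted data, save important data
    failed = [name for name, ok in log if not ok]
    return {"run": len(log), "failed": failed}  # summary; failure analysis is separate

summary = run_suite(
    [{"name": "t1", "set_up": lambda: None, "execute": lambda: 2 + 2,
      "expected": 4, "clear_up": lambda: None},
     {"name": "t2", "set_up": lambda: None, "execute": lambda: 2 * 2,
      "expected": 5, "clear_up": lambda: None}],
    set_up_env=lambda: None, clear_up_env=lambda: None)
print(summary)  # {'run': 2, 'failed': ['t2']}
```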
Relationship of testing activities
[Diagram, redrawn from Fewster et al., Software Test Automation, 1999: for manual testing, the same tests automated, and more mature automation, it shows how total time divides among set up, execute, analyze failures, clear up, and editing tests (maintenance).]
Automated failure clustering – Master thesis @ Qlik
Navigating Information Overload Caused by Automated Testing – A Clustering Approach in Multi-Branch Development
Nicklas Erman (Qlik, Lund, Sweden), Vanja Tufvesson (Accenture, Copenhagen, Denmark, [email protected]), Markus Borg, Anders Ardo and Per Runeson (Lund University, Sweden, {markus.borg, per.runeson}@cs.lth.se, [email protected])
Abstract—Background. Test automation is a widely used technique to increase the efficiency of software testing. However, executing more test cases increases the effort required to analyze test results. At Qlik, automated tests run nightly for up to 20 development branches, each containing thousands of test cases, resulting in information overload. Aim. We therefore develop a tool that supports the analysis of test results. Method. We create NIOCAT, a tool that clusters similar test case failures, to help the analyst identify underlying causes. To evaluate the tool, experiments on manually created subsets of failed test cases representing different use cases are conducted, and a focus group meeting is held with test analysts at Qlik. Results. The case study shows that NIOCAT creates accurate clusters, in line with analyses performed by human analysts. Further, the potential time-savings of our approach is confirmed by the participants in the focus group. Conclusions. NIOCAT provides a feasible complement to current automated testing practices at Qlik by reducing information overload.

Index Terms—Software testing, test automation, test result analysis, clustering, case study.
I. INTRODUCTION
When trying to improve software test efficiency, test au-tomation is often brought forward as a key solution [1],[2]. However, while automated testing (auto testing) providesthe benefits of reducing manual testing, minimizing humanerror, and enabling a higher testing frequency [3, p. 466],new challenges are introduced. With higher testing frequencythe volume of test results increases drastically. Consequently,there is a need for tools to navigate the potential informationoverload [4].
Qlik1, a software company in the business intelligencedomain, has adopted auto testing to save time, improvetest coverage and to enable development of new featuresin parallel, while assuring a high quality product. At Qlik,automated tests (autotests) run every night on multiple sourcecode branches using Bamboo (see Section III-B). However,Bamboo simply groups test results based on the test case (TC)names, and it is both difficult and time consuming to manuallyanalyze the large amount of test results.
1http://www.qlik.com
To support the analysis of test results, Qlik developedPasCal, a tool that clusters TC failures based on the errormessage generated by the failed TCs (see Section III-B).However, PasCal still uses a naıve clustering approach: exactmatching of the error messages. Moreover, PasCal was notmainly developed to provide an overview of the auto testing,but to automatically generate new bug reports based on TCfailures on the main development branch.
Although PasCal is an important first step toward improved analysis of results from autotests, there are still several open challenges. First, there is no efficient way to determine that specific TCs fail on multiple branches. Second, intermittent failures due to variations in the testing environment make the results unreliable, thus triggering re-execution of autotests. Third, concluding that multiple TCs fail because of the same root cause is difficult. All three challenges are amplified by the information overload caused by the auto testing, and the test analysts request support.
To improve the overview of test results, we developed NIOCAT, a tool that clusters TC failures from auto testing in multi-branch environments. The clustering goes beyond exact string matching by calculating relative similarities of textual content using the vector space model [5]. Furthermore, we complement the TC name and error message by execution information, in line with previous work [6].
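The paragraph above names the key idea: similarity of textual content in a vector space model, rather than exact string matching. As a rough illustration of that idea (this is not NIOCAT's implementation; the tokenization, the greedy cluster assignment, and the 0.6 threshold are assumptions made for the sketch), TC failures can be clustered by cosine similarity over bag-of-words vectors built from the TC name and error message:

```python
import math
import re
from collections import Counter

def vectorize(text):
    # Bag-of-words term-frequency vector (a crude vector space model)
    return Counter(re.findall(r"\w+", text.lower()))

def cosine(a, b):
    # Cosine similarity between two sparse term-frequency vectors
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) \
         * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def cluster_failures(failures, threshold=0.6):
    # Greedy clustering: join the first cluster whose representative
    # is similar enough, otherwise open a new cluster.
    clusters = []  # list of (representative vector, member failures)
    for failure in failures:
        vec = vectorize(failure["name"] + " " + failure["error"])
        for rep, members in clusters:
            if cosine(rep, vec) >= threshold:
                members.append(failure)
                break
        else:
            clusters.append((vec, [failure]))
    return [members for _, members in clusters]

failures = [
    {"name": "login_test", "error": "Timeout waiting for element #submit"},
    {"name": "login_test", "error": "Timeout waiting for element #submit"},
    {"name": "chart_test", "error": "AssertionError: expected 42 got 41"},
]
print([len(c) for c in cluster_failures(failures)])  # [2, 1]
```

The same scheme extends to execution information by appending, e.g., a failing-step identifier to the text before vectorization, in the spirit of the weighting the paper explores.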
We evaluate the accuracy of NIOCAT in a case study on the development of a web application, using three manually constructed scenarios representative for the work of test analysts at Qlik. In the study, we also explore different weighting of textual information and execution information using space-filling experimental design [7]. Also, we qualitatively evaluate NIOCAT through a focus group interview, following the case study methodology proposed by Runeson et al. [8].
In this paper, we briefly summarize related work in Section II and describe the case company in Section III. Section IV jointly presents NIOCAT and its evaluation. We present the results in Section V, and finally conclude the paper in Section VI.
Fig. 3. Overview presented by Bamboo showing results from nine consecutive autotest executions on the same branch. Neither the SUT nor the TCs changed between the test runs, indicating an intermittent failure.
ft-personal”, “Ver12.00-dev-ft-ratatosk” and “Ver12.00-dev-ft-ratatosk-responsive-grid”. Since Bamboo does not provide a way to cross reference TC failures between the branches, the test analyst must manually navigate into each branch to determine if the same TC has failed on all three branches.
Intermittent Failures (“Is this really a problem?”): Qlik refers to TCs that irregularly fail because of variations in the testing environment (e.g. timing issues caused by unbalanced load of test servers) as “intermittent failures”. Figure 3, also a screen shot from Bamboo, shows that consecutive execution of autotests for the branch “main for stability testing” yields different results. Note that neither the SUT nor the TCs have changed between the different runs, but still the test results vary. To determine whether a TC failure is due to a “real” problem in the SUT, or to an intermittent failure, typically the autotests are executed several times. If an overview of all branches with a particular TC failure was available, the time spent re-executing the autotests could be saved.
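The re-execution heuristic described above can be stated compactly: a TC whose verdict varies across repeated runs of an unchanged SUT is a suspected intermittent failure. A minimal sketch (illustrative only, not Qlik's tooling):

```python
def classify_verdicts(verdicts):
    """Classify a TC from repeated runs against the SAME, unchanged SUT.

    verdicts: list of booleans, True = pass, one entry per run.
    """
    if all(verdicts):
        return "pass"
    if not any(verdicts):
        return "consistent failure"    # likely a real problem in the SUT
    return "intermittent failure"      # verdict varies with no code change

print(classify_verdicts([True, False, True]))    # intermittent failure
print(classify_verdicts([False, False, False]))  # consistent failure
```

The catch, of course, is that each extra run costs machine time, which is why the paper argues for a cross-branch overview instead of blind re-execution.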
Root Cause Analysis (“Do these TCs fail for the same reason?”): The same TCs can fail in different ways, i.e. the same TC may fail in different ways in different branches. For example, a six-step TC could fail at any step, but still the same TC name would be presented by Bamboo. To identify differences between the two TC failures, additional information about the failure, such as the error message, has to be taken into account. Similarly, two different TCs might fail in a step that both TCs have in common, e.g. an initial setup
Fig. 4. NIOCAT generates a clustering of TC failures based on user selectedautotest results.
step. These problems should not be treated as two different issues, as the common trigger is the setup phase. Again, a naïve comparison using only the TC name would not identify this common root cause. A clustering of all TCs that fail during the same step, i.e. share a common root cause, would support a timely resolution of the issue.
IV. STUDY DESIGN AND SOLUTION APPROACH
As the development and evaluation of NIOCAT were tightly connected, this section contains a joint presentation. First, based on the background and challenges described in Section III, we state two research questions. Then, the rest of this section presents the details of NIOCAT, the corresponding evaluation, and the major threats to validity.
RQ1 How can clustering of test case failures help test analysts at Qlik navigate the information overload caused by automated testing?
RQ2 Can execution data be used in addition to textual information to improve the clustering of test case failures?
A. Solution Approach – NIOCAT
Our approach to support the test analysts at Qlik is to introduce a tool for high-level analysis of autotest results. We name the tool NIOCAT – Navigating Information Overload Caused by Automated Testing. The output from NIOCAT is a clustering of TC failures from a user selected set of autotest results. NIOCAT aims to group similar TC failures, i.e. each cluster should represent a unique issue in the SUT, containing one or several TC failures.
Figure 4 illustrates how NIOCAT processes autotest results from multiple branches to generate clusters of TC failures. The small circles represent TC failures, whereas larger circles depict TC failures that have been grouped together. To support interactive navigation of the NIOCAT output, we use QlikView (see Section III) to present the results.
Clustering TC failures provides the test analysts a starting point for further investigation. Test analysts can use NIOCAT in different ways; which use case is supported depends on the analyst’s choice of input autotest results. The use case for a development team leader might be to analyze data from test runs within the last seven days, for the team’s branch only. A configuration manager on the other hand, might look for a
Lund University / Faculty of Engineering/ Department of Computer Science / Software Engineering Research Group
Automated bug assignment
Empir Software Eng
sent down to the second layer. If the customer support organization believes a CSR to be a fault in the product, they file a bug report based on the CSR in the second layer BTS. In this way, the second layer organization can focus on issues that are likely to be faults in the software. In spite of this approach, some bug reports can be configuration issues or other problems not directly related to faults in the code. In this study, we have only used data from the second layer BTS, but there is nothing in principle that prevents the same approach from being used on the first layer CSRs. The BTS is the central point in the bug handling process and there are several process descriptions for the various employee roles. Tracking of analysis, implementation proposals, testing, and verification are all coordinated through the BTS.
4.3 State-of-Practice Bug Assignment: A Manual Process
The bug handling processes of both Company Automation and Company Telecom are substantially more complex than the standard process described by Bugzilla (Mozilla 2013). The two processes are characterized by the development contexts of the organizations. Company Automation develops safety-critical systems, and the bug handling process must therefore adhere to safety standards as described in Section 4.1. The standards put strict requirements on how software is allowed to be modified, including rigorous change impact analyses with focus on traceability. In Company Telecom on the other hand, the sheer size of both the system under development and the organization itself are reflected in the bug handling process. The resource allocation in Company Telecom is complex and involves advanced routing in a hierarchical organization to a number of development teams.
We generalize the bug handling processes in the two case companies and present an overview model of the currently manual process in Fig. 4. In general, three actors can file bug reports: i) the developers of the systems, ii) the internal testing organization, and iii) customers that file bug reports via helpdesk functions. A submitted bug report starts in a bug triaging stage. As the next step, the Change Control Board (CCB) assigns the bug report to a development team for investigation. The leader of the receiving team then assigns the bug report to an individual developer. Unfortunately, the bug reports often end up with the wrong developer, thus bug tossing (i.e., bug report re-assignment) is common, especially between teams. The longer the bug tossing history stored in the BTS, the more detailed an estimate of the savings of our approach can be made. With only the last entry saved in the BTS one can estimate the prediction accuracy of the system, but to calculate the full saving, the full bug tossing history is needed.
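The manual CCB-to-team routing described above is what Jonsson's work automates with machine learning over bug-report text. The actual study uses ensemble classifiers on industrial data; the tiny multinomial naive Bayes below, including the team names and report texts, is only a toy illustration of the principle:

```python
import math
import re
from collections import Counter, defaultdict

def tokens(text):
    return re.findall(r"\w+", text.lower())

class BugRouter:
    # Multinomial naive Bayes over bug-report words, routing to teams
    def __init__(self):
        self.word_counts = defaultdict(Counter)  # team -> word frequencies
        self.team_counts = Counter()             # team -> #reports seen
        self.vocab = set()

    def train(self, report, team):
        words = tokens(report)
        self.word_counts[team].update(words)
        self.team_counts[team] += 1
        self.vocab.update(words)

    def assign(self, report):
        def log_score(team):
            total = sum(self.word_counts[team].values())
            score = math.log(self.team_counts[team])  # log prior
            for w in tokens(report):  # Laplace-smoothed log likelihoods
                score += math.log((self.word_counts[team][w] + 1)
                                  / (total + len(self.vocab)))
            return score
        return max(self.team_counts, key=log_score)

router = BugRouter()
router.train("kernel panic on driver load", "platform")
router.train("button misaligned in settings dialog", "ui")
print(router.assign("dialog button renders off screen"))  # ui
```

A real deployment would be trained on the BTS history, which is why the paragraph above notes that a fuller bug tossing history allows a better estimate of the savings.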
Fig. 4 A simplified model of bug assignment in a proprietary context. [Figure: new bug reports from development, test, and customer support enter the Bug Tracking System (BTS); the CCB assigns them to teams (Team 1 … Team N), team leaders assign them to developers (Developer 1 … Developer N), with bug tossing in between; automatic assignment replaces the manual routing.]
Jonsson, 2016
Test automation promises
1. Efficient regression test
2. Run tests more often
3. Perform difficult tests (e.g. load, outcome check)
4. Better use of resources
5. Consistency and repeatability
6. Reuse of tests
7. Earlier time to market
8. Increased confidence
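Promises 1, 2 and 5 are what xUnit-style frameworks (mentioned on the tools slide) deliver: once a behaviour is pinned down in a test class, it can be re-run identically at every change. A minimal sketch with Python's standard unittest; the discount function is just a stand-in for the code under test:

```python
import unittest

def discount(price, percent):
    # Stand-in for production code under regression test
    if not 0 <= percent <= 100:
        raise ValueError("percent out of range")
    return price * (100 - percent) / 100

class DiscountRegressionTest(unittest.TestCase):
    # Re-run unchanged at every commit: consistency and repeatability
    def test_typical_value(self):
        self.assertEqual(discount(200, 25), 150)

    def test_boundaries(self):
        self.assertEqual(discount(99, 0), 99)
        self.assertEqual(discount(99, 100), 0)

    def test_invalid_input_rejected(self):
        with self.assertRaises(ValueError):
            discount(100, 150)
```

Run with `python -m unittest` as part of the nightly build; the suite then serves as the regression net the promises above refer to.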
Common problems
1. Unrealistic expectations
2. Poor testing practice
   ”Automatic chaos just gives faster chaos”
3. Expected effectiveness
4. False sense of security
5. Maintenance of automatic tests
6. Technical problems (e.g. interoperability)
7. Organizational issues
What can be automated?
[Diagram: test activities placed on a scale from intellectual, performed-once work to clerical, repeated work:]
1. Identify
2. Design
3. Build
4. Execute
5. Check
The early, intellectual activities (identify, design) are performed once; the later, clerical activities (execute, check) are repeated and are thus the prime candidates for automation.
Limits of automated testing
• Does not replace manual testing
• Manual tests find more defects than automated tests
  – Does not improve effectiveness
• Greater reliance on quality of tests
  – Oracle problem
• Test automation may limit the software development
  – Costs of maintaining automated tests
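The oracle problem above becomes concrete in any automated outcome check: the script must decide what counts as "the same" result. A sketch of a tolerant comparator (the field names are invented for illustration); deciding which fields are legitimately volatile is exactly the judgment that cannot be automated away:

```python
def tolerant_match(expected, actual, volatile=("timestamp", "session_id")):
    # Oracle sketch: compare result records while ignoring fields
    # that legitimately differ between runs.
    keys = set(expected) | set(actual)
    return all(expected.get(k) == actual.get(k)
               for k in keys if k not in volatile)

run1 = {"total": 42, "timestamp": "2019-02-01T10:00"}
run2 = {"total": 42, "timestamp": "2019-02-01T11:30"}
print(tolerant_match(run1, run2))  # True
```

Too strict an oracle yields intermittent false alarms; too lenient an oracle yields the false sense of security listed among the common problems.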
What is ET?
Exploratory software testing (ET) is a style of software testing that emphasizes the personal freedom and responsibility of the individual tester to continually optimize the value of her work by treating test-related learning, test design, test execution, and test result interpretation as mutually supportive activities that run in parallel throughout the project.
Cem Kaner
Sounds promising…
…but…
– impossible to automate
– highly dependent on tester skills
– hard to replicate failures (if testing is not traced)
And, do we really know?
Variations of Exploratory Testing
Freestyle → Test object only → Test goals, constraints → Test object, test steps, test data → Pure scripted
[Scale of increasing detail in the test charter, from fully exploratory to fully scripted]
Formal Training in Exploratory Testing
• Experiment with 20 professionals [Micallef 2016]
– with/without formal test training
– 20 injected faults in e-commerce system
– up to 40 minute session with eye-tracking device
TABLE III
PARTICIPANT DEMOGRAPHICS AND BACKGROUND INFORMATION
(G: gender, A: age, E: years of experience in a testing role)

           Carmen                  George
ID    Domain      G  A    E    Domain    G  A    E
1     Design      M  36   0    E-comm    F  31   3
2     E-comm      F  51   1    Payments  M  25   3.5
3     Content     M  30   0    E-comm    M  39   13
4     E-comm      M  34   0    Payments  M  27   1.5
5     E-comm      F  37   0    Telco     M  32   3
6     Gaming      M  26   1    Payments  F  23   3
7     E-comm      F  26   0    E-comm    F  27   7
8     E-comm      F  29   0    Payments  M  31   2
9     E-comm      F  24   0    Networks  M  28   6
10    Virtualiz.  M  24   2    Payments  M  21   1
Median               29.5 0                 27.5 3
Mean                 31.7 0.4               28.4 4.3
were allowed to test the system (exploratory) for up to 40 minutes whilst a researcher observed their activities from a remote monitor (and taking notes unobtrusively). The participant’s terminal was equipped with an eye-tracking device (Tobii X-120), a camera (with face-tracking capabilities) as well as a microphone (see Figure 3). Participants were encouraged to think-aloud during the test session, commenting on specific bugs being discovered as well as general issues with the system. Tobii Studio (Eye Tracking Software for Analysis) was used to capture eye-gaze data together with mouse activity, audio and video streams. Most participants suggested improvements to the e-commerce site while others found bugs that were not intentionally injected as part of the study. Think-aloud allows participants to focus on their primary task without the need to interrupt their workflow to log bugs (manually or online). A researcher took note of any points of interest (POIs) that may have occurred during the test session (e.g., participant’s gaze was fixed for a long time on a specific element) in which case the participant was invited for a reflective think-aloud (RTA) session right after the test session. Here the researcher plays back portions of the session (POIs) to the participant upon which a discussion ensues. This adds a second layer of understanding to the otherwise sterile gaze-data. Following the test session (and RTA, if applicable), a short debriefing exercise is conducted whereby the researcher concludes the session through a short semi-structured interview (capturing participants’ biographic information, knowledge of exploratory testing strategies as well as insights into their professional experience).
Data processing
1) Reviewing raw data: Over 13 hours of gaze-point data together with the corresponding audio and video streams were reviewed systematically using a set of predefined scoring sheets (see Figure 4). For each session the researcher took note of a) predefined observable behavioural patterns (e.g., tester hovered around the same
Fig. 3. Participant eye-tracking terminal (left) and remote monitoring station (right)
area where a bug was previously found), b) bugs reported (if at all) and c) corresponding timestamps (down to a minute by minute granularity).
Fig. 4. Scoring sheet used during the data processing stage. This helped the researcher to systematically annotate observed behavioural patterns (from video, audio and gaze-data) together with bugs reported by each participant.
2) Data staging: Strategies were abstracted as a sequential set of behavioural patterns (e.g., ES2 = [BP21, BP5, BP22, BP20]), therefore allowing the researcher to map a series of observed participant behavioural patterns to a probability that a particular strategy was being used during a specific timeframe (knowingly or unknowingly). Furthermore the exploratory strategies selected for this study were grouped into three broad categories: guided strategies, semi-guided strategies and unguided strategies in descending order of rigour and technical know-how required for each strategy. Given that we were monitoring the use of seven strategies by twenty participants in the study, it was decided that it would be more manageable to group strategies together based on where they lay on the guided/unguided scale as depicted in Figure 5. Based on this scale, ES1 and ES3 were classified as unguided, ES2, ES4 and ES7 were classified as guided whilst ES6 and ES5 were classified as semi-guided because they exhibit a balanced amount of characteristics from both extremes. The resulting information was synthesised in a tabular format which represented a minute by minute log of which tester type was using which strategy type and
Do Exploratory Testers Need Formal Training? An Investigation Using HCI Techniques
Mark Micallef, Faculty of ICT, University of Malta, [email protected]
Chris Porter, Faculty of ICT, University of Malta, [email protected]
Andrea Borg, Faculty of ICT, University of Malta, [email protected]
Abstract—Exploratory software testing is an activity which can be carried out by both untrained and formally trained testers. We personify the former as Carmen and the latter as George. In this paper, we outline a joint research exercise between industry and academia that contributes to the body of knowledge by (1) proposing a data gathering and processing methodology which leverages HCI techniques to characterise the differences in strategies utilised by Carmen and George when approaching an exploratory testing task; and (2) present the findings of an initial study amongst twenty participants, ten formally trained testers and another ten with no formal training. Our results shed light on the types of strategies used by each type of tester, how they are used, the effectiveness of each type of strategy in terms of finding bugs, and the types of bugs each tester/strategy combination uncovers. We also demonstrate how our methodology can be used to help assemble and manage exploratory testing teams in the real world.
I. INTRODUCTION
Carmen and George are both employed as software testers. However, whilst George is formally trained and certified, Carmen has no training but got the job because she is ‘good at finding bugs’. A heated debate about whether or not software testers should be formally trained and certified was sparked by the announcement in 2011 of the development of a new ISO standard on software testing [1] and further fuelled by the perception that unlike other roles in software engineering, testing can be carried out by anyone, providing they understand an application’s domain. One side argues that certified testers will approach testing with discipline and consistency whilst the other argues that much like a driving license does not make one a good driver, a testing certification does not guarantee a good tester. Furthermore, end-users regularly find and report bugs in systems even though they are not trained as testing professionals. Testing encompasses a wide range of skills (e.g., planning, design, automation, exploratory testing), proficiency in most of which requires a certain level of formal training.

However, an interesting opportunity for joint research between academia and the industry presents itself when one considers that exploratory testing can be carried out by both trained and untrained testers. Exploratory testing involves a software tester interacting with a system in an unscripted manner guided mainly by her intuition and experience. Although it is a recognised approach, the technique is frequently referred to as ad-hoc testing and suffers from the reputation of delivering inconsistent results depending on the tester executing it. Despite the fact that there are documented exploratory testing strategies which can be utilised [2], effectiveness has been shown to be dependent on the tester’s knowledge [3], learning style [4] and even personality [5]. The debate as to whether or not formal training is a positive or negative influence on the quality of exploratory testing motivates our hypothesis.
A. Hypothesis and Research Questions

If one partitions testers into two broad groups such that one group consists of testers with a formal qualification in software testing, and the second group consists of testers with no such formal training, then we hypothesise that the two groups of testers intuitively use different yet complementary exploratory testing strategies. In order to explore this hypothesis, we propose to investigate three research questions:

(RQ1) Which types of exploratory testing strategies are utilised by testers in each group?
(RQ2) Which types of bugs are found by testers in each group?
(RQ3) Is there a link between the bugs found and the testing strategies adopted by the tester groups?
B. Research Challenge

The main research challenge here is instigated mostly by the fact that our subjects of interest (software testers) probably do not possess explicit and standardised knowledge of the strategies that they utilise to do their testing. To use a medical metaphor, whereas two doctors are highly likely to refer to any number of medical procedures by their technical names and understand each other perfectly well, very little such standardisation exists in exploratory software testing. This is further compounded by the fact that one of the groups we want to study is not even formally trained and would therefore be even less likely to know any jargon. Therefore, we needed to design a methodology which non-intrusively extracts information about which strategies are being used by participants at any point in time.

C. How can HCI Help?

Human Computer Interaction (HCI) is an interdisciplinary area of research and practice which has evolved throughout the
2016 IEEE International Conference on Software Testing, Verification and Validation Workshops. DOI 10.1109/ICSTW.2016.31
Do Exploratory Testers Need Formal Training?
Fig. 5. The scale of test strategies - from unguided to guided
whether any bugs (by type) were reported. The data staging phase produced the necessary level of detail required to conduct an in-depth analysis as discussed in the following step.
Evaluation and Conclusions
1) Data analysis: Here, we leveraged preprocessed data in order to produce outcomes and recommendations. Quantitative tests on scoring data allowed us to identify diverging a) behavioural patterns between tester categories, b) rigour in the use of test strategies and c) effectiveness of strategy use in terms of bugs discovered (and their respective categories). For this purpose pivot charts were adopted so as to stay as close to the data as possible while being able to transform the observed data to uncover potentially hidden patterns and correlations.
2) Expert evaluation: A questionnaire was sent out to around 40 software quality assurance professionals to obtain a measure of perceived importance for the various bug categories covered in our study (i.e., which bugs, in order of importance, should be reported and fixed prior to release?). Finally we briefed a group of testing professionals with our main results and interpretation thereof, and these in turn contributed with their own practice-based opinions, interpretations and recommendations. These two feedback loops were considered while formulating our discussion (see Section V) as well as in our final remarks (see Section VI) and potential future research avenues (see Section VI-A).
3) Conclusions: A set of recommendations were finally produced with respect to the assembly and management of exploratory testing teams, which could in turn improve efficiency and return on investment.
IV. RESULTS
As outlined in Section I-A, this work was driven by three research questions which were designed to collectively characterise the differences between how formally trained testers (George) and untrained testers (Carmen) approach an exploratory testing task. In this section, we present an outline of the empirical results relating to each of these questions. We then go on to a synthesised discussion of these results in Section V.
TABLE IV
DISTRIBUTION OF BUGS FOUND ACCORDING TO CATEGORY AND THE TYPE OF TESTER THAT FOUND THEM

Category                Carmen     George     Total
Content Bugs            35 (54%)   30 (46%)   65
Input Validation Bugs   6 (21%)    23 (79%)   29
Logical Bugs            5 (50%)    5 (50%)    10
Functional UI Bugs      10 (48%)   11 (52%)   21
Nonfunctional UI Bugs   1 (11%)    8 (89%)    9
Total                   57         77         134
A. RQ1: Which types of strategies are used by testers in each group?
1) Testers with no formal training (Carmen) overwhelmingly rely on unguided strategies (65%), using guided strategies only 31% of the time and semi-guided strategies for 4% of the time.
2) Formally trained testers (George) split their use of guided and unguided strategies evenly (45% and 46% respectively) whilst making double the use of semi-guided strategies as untrained testers (9%).
3) Interestingly, whilst Carmen seems to always prefer unguided strategies, George tends to alternate between them, using unguided strategies to find opportunities for more guided testing (see Figure 6 and Figure 7).
Fig. 6. Strategy types collectively used by untrained testers (Carmen) throughout the 40 minute test sessions
Fig. 7. Strategy types collectively used by formally trained testers (George) throughout the 40 minute test sessions
B. RQ2: Which bugs are found by testers of each group?
1) We split bugs into five categories (see Section III-D).
(legend: with training / without training)
[Micallef 2016]
When to use exploratory vs scripted?
SPECIAL SECTION ON SOFTWARE STANDARDS AND THEIR IMPACT IN REDUCING SOFTWARE FAILURES
Received March 15, 2018, accepted May 6, 2018, date of publication May 10, 2018, date of current version June 5, 2018.
Digital Object Identifier 10.1109/ACCESS.2018.2834957
Levels of Exploration in Exploratory Testing: From Freestyle to Fully Scripted
AHMAD NAUMAN GHAZI¹, KAI PETERSEN¹, ELIZABETH BJARNASON², AND PER RUNESON², (Member, IEEE)
¹Department of Software Engineering, Blekinge Institute of Technology, 371 79 Karlskrona, Sweden
²Department of Computer Science, Lund University, 221 00 Lund, Sweden
Corresponding author: Ahmad Nauman Ghazi ([email protected])
This work was partly funded by the EASE Industrial Excellence Center for Embedded Applications Software Engineering (http://ease.cs.lth.se).
ABSTRACT Exploratory testing (ET) is a powerful and efficient way of testing software by integrating design, execution, and analysis of tests during a testing session. ET is often contrasted with scripted testing and seen as a choice of either exploratory testing or not. In contrast, we pose that exploratory testing can be of varying degrees of exploration from fully exploratory to fully scripted. In line with this, we propose a scale for the degree of exploration and define five levels. In our classification, these levels of exploration correspond to the way test charters are defined. We have evaluated this classification through focus groups at four companies and identified factors that influence the choice of exploration level. The results show that the proposed levels of exploration are influenced by different factors such as ease to reproduce defects, better learning, and verification of requirements and that the levels can be used as a guide to structure test charters. Our study also indicates that applying a combination of exploration levels can be beneficial in achieving effective testing.
exploration, exploratory testing classification, software testing.I. INTRODUCTIONAdvocates of exploratory testing (ET) stress the benefits of
providing the tester with freedom to act based on his/her
skills, paired with the reduced effort for test script design and
maintenance. ET can be very effective in detecting critical
defects [1]. We have found that exploratory testing can be
more effective in practice than traditional software testing
approaches, such as scripted testing [1], [2]. ET supports
testers in learning about the system while testing [1], [3].
The ET approach also enables a tester to explore areas of
the software that were overlooked while designing test cases
based on system requirements [4]. However, ET does come
with some shortcomings and challenges. In particular, ET can
be performed inmany different ways, and thus there is no one-
way of training someone to be an exploratory tester. Also,
exploratory testing tends to be considered an ad-hoc way of
testing and some argue that defects detected using ET are
difficult to reproduce [5].The benefits of exploratory testing are discussed both
within industry and academia, but only little work relates to how to perform this kind of testing [6]. Bach introduced a technique named Session Based Test Management (SBTM) [7] that provides a basic structure and guidelines for ET using test missions. In the context of SBTM, a test mission is an objective to provide focus on what to test or what problems to identify within a test session [7]. SBTM provides a strong focus on designing test charters to scope exploration to the test missions assigned to exploratory testers. A test charter provides a clear goal and scopes the test session, and can be seen as a high level test plan [8]; thus the level of detail provided in the test charter influences the degree of exploration in the testing.

However, little guidance exists on how to define test charters in order to achieve different, or combine various, degrees of exploration. Though, there is a need in the industry to have support for choosing the ‘‘right’’ degree of exploration, see e.g. [9]. In order to make an informed decision there is a need to define what is meant by ‘‘degree of exploration.’’

We pose that testing can be performed at varying degrees
of exploration from freestyle ET to fully scripted, and propose a scale for the degree of exploration defined by five distinct levels of exploration. In this paper, we present a classification consisting of five levels of exploratory testing (ET) ranging from free style testing to fully scripted
Organization
CC BY 2.0 Theater der Künste
Organization
• Support for decision making
• Enhance teamwork
• Independence
• Balance testing – quality
• Assist test management
• Ownership of test technology
• Resources utilization
• Career path
Hierarchical approach
Figure 16.1 Structure of test groups. [Org chart: executive management over a software group and a hardware group; the software side comprises software developers and the system, integration, development, performance, scalability, automation, and sustaining test groups.]
value of the ratio depends upon the nature of the software under development. A value of the ratio is estimated during the development of a test planning document, and it has been discussed in Chapter 12.
Development Test Group: The focus of this group is on the testing of new features in a particular release. This includes basic tests, functionality tests, robustness tests, interoperability tests, stress tests, load and stability tests, regression tests, documentation tests, and business acceptance tests.
Performance Test Group: This group puts emphasis on system performance. Tests are conducted to identify system bottlenecks and recommendations are made to the developers for improving system performance. The group uses test, measurement, and analysis tools to carry out its tasks. This group may take up additional responsibilities such as reliability testing.
Scalability Test Group: The focus of this group is on determining whether or not the system can scale up to its engineering limits. For example, a cellular phone network might have been designed with certain engineering limits in mind, such as the maximum number of base stations it can support, the maximum number of simultaneous calls it can handle, and so on. The group tests whether the designed system can reach those limits. This group may take up additional responsibilities such as load and stability testing.
Automation Test Group The responsibility of this group is to develop test automation infrastructure, test libraries, and test tools. This group assists other groups in the development of automated test suites.
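A test library in this sense is typically a package of small reusable helpers shared across suites. As an illustrative sketch (the helper name is invented), a polling utility for asynchronous state changes is a classic example:

```python
import time


def eventually(check, timeout=2.0, interval=0.05):
    """Reusable test-library helper: poll `check` until it returns True
    or the timeout expires. Helpers like this are what an automation
    group packages for the other test groups to build suites on."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if check():
            return True
        time.sleep(interval)
    return False


# Usage: wait for an asynchronous state change (simulated synchronously here).
state = {"ready": False}
state["ready"] = True  # in a real test, the system under test flips this

assert eventually(lambda: state["ready"])
print("helper ok")
```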
Sustaining Test Group This group maintains the software quality throughout the product's market life. This team is responsible for maintaining the corrective aspect of software maintenance. The group works very closely with customers and conducts regression testing of patch software.
Team approach

[Diagram: two product development teams, each with a manager (M), product developers (P), and testers (T) — in one team the tester is a separate role alongside the developers; in the other, testing responsibility is mixed into the developer roles (P/T).]
Test organisation [Kit p 166 ff]

Degrees of freedom:
– Tall or flat
– Market or product
– Centralized or decentralized
– Hierarchical or diffused
– Line or staff
– Functional or project
7 approaches to test organisation
1. Each person's responsibility
2. Each unit's responsibility
3. Dedicated resource
4. Test organisation in QA
5. Test organisation in development
6. Centralized test organisation
7. Test technology centre
[Kit, Software Testing in the Real World Ch 13, 1995]
Matrix organization
[Stuckenbruck, L. C. (1979). The matrix organization. Project Management Quarterly, 10(3), 21–33.]
Scaled Agile (LeSS, SAFe, Scrum@Scale)
Which organization should we choose?
• Depending on
– size
– maturity
– focus
– localization
• Testing dimensions and testing organization?
• The solution is often a mixture of different approaches
[Diagram: three dimensions of testing — level of detail (unit (module), integration, system), accessibility (white box, black box), and quality characteristics (functionality, reliability, usability, efficiency, maintainability, portability).]
High performing testers – interview study
• Experience and skills
– With the product
– In the domain
– In programming
– In testing techniques
– Writes good defect reports
• Motivation
– Has a mission
– Knows the importance of testing
• Reflection
– Maintains the big picture
– Understands the effects of defects in the production environment
– Independent; knows own skills and limits
– Criticizes product and process
• Personal characteristics
– Thoroughness, patience/persistence, accuracy
[Joonas Iivonen, Mika Mäntylä, Juha Itkonen: Characteristics of high performing testers: a case study. ESEM 2010]
Test certification
http://istqb.org
Recommended exercises
• Chapter 12 – 11, 12, 13
• Chapter 16 – 2, 3, 4, 9, 10