Post on 20-Dec-2015
Empirical Usability Testing in a Component-Based Environment:
Improving Test Efficiency with Component-Specific Usability Measures
Willem-Paul Brinkman Brunel University, London
Reinder Haakma Philips Research Laboratories Eindhoven
Don Bouwhuis Eindhoven University of Technology
Topics
• Research Motivation
• Testing Method
• Experimental Evaluation of the Testing Method
• Conclusions
Research Motivation
Studying the usability of a system:
• External comparison: relating differences in usability to differences in the systems
• Internal comparison: trying to link usability problems with parts of the systems
Component-Based Software Engineering
[Diagram: the CBSE lifecycle. Create new components; reuse components from the repository; manage product requirements and existing software; support products. Feedback flows between the processes.]
• Multiple-versions testing paradigm (external comparison)
• Single-version testing paradigm (internal comparison)
Research Motivation
PROBLEM
1. Only empirical analysis of the overall system (task time, keystrokes, questionnaires, etc.): not powerful
2. Usability tests, heuristic evaluations, and cognitive walkthroughs in which experts identify problems: unreliable
SOLUTION
• Component-specific usability measures: more powerful and reliable
Testing Method
Procedure: the normal procedure of a usability test
• A user task that requires interaction with the components under investigation
• Users must complete the task successfully
Component-specific measures
• Perceived ease-of-use
• Perceived satisfaction
• Objective performance
A component-specific questionnaire helps users remember their interaction experience with a particular component.
Component-specific measures: perceived ease-of-use
Perceived Usefulness and Ease-of-Use questionnaire (Davis, 1989), 6 questions, e.g.:
• Learning to operate [name] would be easy for me.
• I would find it easy to get [name] to do what I want it to do.
Scale: Unlikely to Likely
Component-specific measures: perceived satisfaction
Post-Study System Usability Questionnaire (Lewis, 1995), e.g.:
• The interface of [name] was pleasant.
• I like using the interface of [name].
Scale: Strongly disagree to Strongly agree
Component-specific measures: objective performance
• Number of messages received directly, or indirectly from lower-level components
• Reflects the effort users put into the interaction
[Diagram: component as a control process]
Control loop: each message is a cycle of the control loop
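The message count could be collected with a small piece of instrumentation code inside each component. The sketch below is illustrative (the class and method names are invented, not from the paper); it assumes every incoming message, whether from the user or a lower-level component, counts as one control-loop cycle.

```python
# Hypothetical instrumentation sketch: count each message an interaction
# component receives as one cycle of its control loop. More cycles mean
# more user effort to reach the reference value.

class InstrumentedComponent:
    def __init__(self, name):
        self.name = name
        self.messages_received = 0   # the objective performance measure

    def receive(self, message, source="user"):
        # One incoming message = one control-loop cycle.
        self.messages_received += 1
        return message

keypad = InstrumentedComponent("Keypad")
for key in "15+23=":
    keypad.receive(key)
print(keypad.messages_received)  # 6 messages for entering "15+23="
```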
Architectural Element
Interaction component: the elementary unit of an interactive system on which behaviour-based evaluation is possible.
• A unit within an application that can be represented as a finite state machine and that receives signals from the user directly, or indirectly via other components.
• Users must be able to perceive or infer the state of the interaction component.
Examples of suitable agent models: Interactor, CNUCE model, MVC, PAC.
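The definition above can be read as a small finite state machine whose state the user can perceive. A minimal sketch, assuming a hypothetical mode key that toggles between two input modes (a Mode key does appear in the mobile-phone architecture later; the state names here are invented):

```python
# Minimal finite-state-machine reading of an interaction component:
# a mode toggle with two user-perceivable states. Illustrative only.

class ModeComponent:
    def __init__(self):
        self.state = "numeric"   # the state users can perceive or infer

    def receive(self, signal):
        # Only the "mode" signal changes state; other signals leave it alone.
        if signal == "mode":
            self.state = "alpha" if self.state == "numeric" else "numeric"
        return self.state

m = ModeComponent()
print(m.receive("mode"))  # alpha
print(m.receive("mode"))  # numeric
```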
Interaction layers
[Diagram: calculator example for "15 + 23 = 38". The Editor (lower layer) receives the individual keystrokes "15", "+", "23", "="; the Processor (higher layer) receives the complete equation in binary (01111 + 10111 = 100110) and produces the result, 38. Each layer forms its own control loop between user and calculator: control of the equation and control of the results.]
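The two-layer calculator example can be sketched in code, assuming an illustrative Editor (lower layer, one message per keystroke) that forwards complete equations to a Processor (higher layer, one message per equation). All names are invented for illustration; the point is that the same task produces many lower-level messages but few higher-level ones.

```python
# Two interaction layers for the calculator example: the Editor counts
# keystroke messages, the Processor counts whole-equation messages.

class Processor:
    """Higher-level component: receives one message per complete equation."""
    def __init__(self):
        self.messages_received = 0
        self.result = None

    def evaluate(self, expression):
        self.messages_received += 1          # one higher-level control cycle
        a, b = expression.split("+")
        self.result = int(a) + int(b)

class Editor:
    """Lower-level component: receives one message per keystroke."""
    def __init__(self, processor):
        self.processor = processor
        self.buffer = ""
        self.messages_received = 0

    def press(self, key):
        self.messages_received += 1          # one lower-level control cycle
        if key == "=":
            self.processor.evaluate(self.buffer)  # forward one message upward
            self.buffer = ""
        else:
            self.buffer += key

proc = Processor()
editor = Editor(proc)
for key in "15+23=":
    editor.press(key)
print(editor.messages_received, proc.messages_received, proc.result)  # 6 1 38
```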
Control Loop
[Diagram: the user holds a reference value, evaluates the feedback received from the system, and sends a user message to the component; the component returns feedback, closing the loop.]
Lower-Level Control Loop
[Diagram: control loop between user and calculator at the lower, keystroke level]
Higher-Level Control Loop
[Diagram: control loop between user and calculator at the higher, equation level]
Experimental Evaluation of the Testing Method
• 80 users
• 8 mobile telephones
• 3 components manipulated according to Cognitive Complexity Theory (Kieras & Polson, 1985):
  1. Function Selector
  2. Keypad
  3. Short Text Messages
Architecture: Mobile Telephone
[Diagram: component architecture. Functions: Voice Mail, Telephone Router, Send Text Message, Read Text Message, Read Address List, Edit Address List, Read Diary, Edit Diary, Stand-by Call. Interaction components: Keypad (function keys: left, right, menu, ok, cancel; 0..9, *, #, backspace, and Mode keys), Function Selector (Menu Screen, Main Screen), and Mode Screen. Messages exchanged include function requests, Ok, Cancel, letters, numbers, cursor moves, mode restrictions, menu directions, menu icons, mode symbols, characters, cursor position, STM menu icons, function results, and flow redirection.]
Send Text Message
[Diagram: the Send Text Message function together with the Function Selector and Keypad components]
Evaluation study – Function Selector
Versions:
• Broad/shallow
• Narrow/deep
Evaluation study – Keypad
Versions:
• Repeated-Key method ("L")
• Modified-Model-Position method ("J")
Evaluation study – Send Text Message
Versions:
• Simple
• Complex
Statistical Tests
[Figure: sample distributions of the number of keystrokes and the task time]
x̄ = sample mean (estimator of μ)
s = estimate of the standard deviation σ
s_x̄ = estimate of the standard error of the mean: s_x̄² = s²/n
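These estimators can be computed directly with the standard library. The task times below are made-up values for illustration, not data from the study.

```python
# Sample mean, standard deviation, and standard error of the mean,
# computed for a small hypothetical sample of task times (seconds).
import math
import statistics

times = [947, 1020, 880, 1105, 990]     # illustrative values only
n = len(times)
x_bar = statistics.mean(times)          # estimator of mu
s = statistics.stdev(times)             # estimator of sigma (n-1 denominator)
sem = s / math.sqrt(n)                  # standard error of the mean: s^2/n
print(round(x_bar, 1), round(sem, 1))   # 988.4 37.5
```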
Statistical Tests
p-value: the probability of making a Type I, or α, error; that is, wrongly rejecting the hypothesis that the underlying distributions are the same.
Results – Function Selector
Results of two multivariate analyses and related univariate analyses of variance with the version of the Function Selector as independent between-subjects variable.

|            | Measure                      | Mean (Broad) | Mean (Deep) | df (Hyp.) | df (Er.) | F      | p      | η²   |
|------------|------------------------------|--------------|-------------|-----------|----------|--------|--------|------|
| Normal     | Joint measure                | —            | —           | 7         | 66       | 34.47  | <0.001 | 0.80 |
|            | Time in seconds              | 947          | 1394        | 1         | 72       | 29.56  | <0.001 | 0.29 |
|            | Number of keystrokes         | 461          | 686         | 1         | 72       | 37.72  | <0.001 | 0.34 |
|            | Number of messages received  | 67           | 265         | 1         | 72       | 155.34 | <0.001 | 0.68 |
|            | Ease of use mobile phone     | 5.5          | 4.8         | 1         | 72       | 11.86  | 0.001  | 0.14 |
|            | Ease of use menu             | 5.6          | 4.5         | 1         | 72       | 22.33  | <0.001 | 0.24 |
|            | Satisfaction of mobile phone | 4.4          | 3.8         | 1         | 72       | 4.25   | 0.043  | 0.06 |
|            | Satisfaction of menu         | 4.6          | 3.5         | 1         | 72       | 15.96  | <0.001 | 0.18 |
| Correctedᵃ | Joint measure                | —            | —           | 2         | 71       | 60.96  | <0.001 | 0.63 |
|            | Number of keystrokes         | 437          | 602         | 1         | 72       | 20.27  | <0.001 | 0.22 |
|            | Number of messages received  | 52           | 190         | 1         | 72       | 75.36  | <0.001 | 0.51 |

ᵃ Corrected for all a-priori differences between versions of the components.
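The univariate effect sizes in the results tables can be reproduced from F and the degrees of freedom using the standard partial eta-squared identity η² = F·df_hyp / (F·df_hyp + df_error); the input values below come from the Function Selector table (the multivariate joint measures follow a different computation and are not covered by this identity).

```python
# Recompute partial eta squared from F and the degrees of freedom.

def eta_squared(f, df_hyp, df_err):
    """Partial eta squared for a univariate F test."""
    return (f * df_hyp) / (f * df_hyp + df_err)

# Time in seconds: F = 29.56, df = 1, 72
print(round(eta_squared(29.56, 1, 72), 2))    # 0.29, matching the table
# Number of messages received: F = 155.34, df = 1, 72
print(round(eta_squared(155.34, 1, 72), 2))   # 0.68, matching the table
```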
Results – Keypad
Results of multivariate and related univariate analyses of variance with the version of the Keypad as independent between-subjects variable.

|        | Measure                      | Mean (RK) | Mean (MMP) | df (Hyp.) | df (Er.) | F     | p      | η²   |
|--------|------------------------------|-----------|------------|-----------|----------|-------|--------|------|
| Normal | Joint measure                | —         | —          | 7         | 66       | 4.05  | 0.001  | 0.30 |
|        | Time in seconds              | 872       | 1083       | 1         | 72       | 9.44  | 0.003  | 0.12 |
|        | Number of keystrokes         | 438       | 537        | 1         | 72       | 10.34 | 0.002  | 0.13 |
|        | Number of messages received  | 233       | 271        | 1         | 72       | 13.92 | <0.001 | 0.16 |
|        | Ease of use mobile phone     | 5.3       | 5.0        | 1         | 72       | 1.07  | 0.305  | 0.02 |
|        | Ease of use keyboard         | 5.6       | 4.9        | 1         | 72       | 11.13 | 0.001  | 0.13 |
|        | Satisfaction of mobile phone | 4.3       | 3.9        | 1         | 72       | 1.76  | 0.188  | 0.02 |
|        | Satisfaction of keyboard     | 4.6       | 3.8        | 1         | 72       | 8.97  | 0.004  | 0.11 |
Results – Send Text Message
Results of two multivariate analyses and related univariate analyses of variance with the version of the STM component as independent between-subjects variable.

|            | Measure                      | Mean (Simple) | Mean (Complex) | df (Hyp.) | df (Er.) | F     | p      | η²   |
|------------|------------------------------|---------------|----------------|-----------|----------|-------|--------|------|
| Normal     | Joint measure                | —             | —              | 7         | 66       | 18.16 | <0.001 | 0.66 |
|            | Time in seconds              | 523           | 672            | 1         | 72       | 8.15  | 0.006  | 0.10 |
|            | Number of keystrokes         | 269           | 320            | 1         | 72       | 4.56  | 0.036  | 0.06 |
|            | Number of messages received  | 12            | 49             | 1         | 72       | 74.18 | <0.001 | 0.51 |
|            | Ease of use mobile phone     | 5.0           | 5.3            | 1         | 72       | 1.15  | 0.288  | 0.02 |
|            | Ease of use STM function     | 5.1           | 4.9            | 1         | 72       | 0.35  | 0.555  | 0.01 |
|            | Satisfaction of mobile phone | 3.9           | 4.2            | 1         | 72       | 0.93  | 0.339  | 0.01 |
|            | Satisfaction of STM function | 3.9           | 3.8            | 1         | 72       | 0.26  | 0.614  | 0.01 |
| Correctedᵃ | Joint measure                | —             | —              | 2         | 71       | 20.85 | <0.001 | 0.37 |
|            | Number of keystrokes         | 249           | 289            | 1         | 72       | 2.30  | 0.134  | 0.03 |
|            | Number of messages received  | 12            | 34             | 1         | 72       | 26.23 | <0.001 | 0.27 |

ᵃ Corrected for all a-priori differences between versions of the components.
Power of component-specific measures
Statistical power: 1 − β
Type II, or β, error: failing to reject the hypothesis when it is false.
Power of component-specific measures
Component-specific measures are less affected by usability problems users may or may not encounter with other parts of the system. This lowers the variance estimate s², and with it the standard error of the mean, so the same underlying difference is easier to detect: statistical power (1 − β) increases.
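Why less noise means more power can be illustrated with a small Monte Carlo sketch. It assumes the same true difference between two versions, with extra variance added to the overall measure to stand in for usability problems elsewhere in the system; the effect size, noise levels, and the fixed |t| criterion of 2.0 are all illustrative, not the study's analysis.

```python
# Monte Carlo sketch: a measure with less extraneous variance detects the
# same true effect more often, i.e. has higher statistical power.
import random
import statistics

random.seed(1)

def detection_rate(noise_sd, n=20, effect=1.0, trials=2000, crit=2.0):
    """Fraction of simulated experiments where |t| exceeds a fixed criterion."""
    hits = 0
    for _ in range(trials):
        a = [random.gauss(0.0, noise_sd) for _ in range(n)]     # version A
        b = [random.gauss(effect, noise_sd) for _ in range(n)]  # version B
        se = ((statistics.variance(a) + statistics.variance(b)) / n) ** 0.5
        if abs((statistics.mean(b) - statistics.mean(a)) / se) > crit:
            hits += 1
    return hits / trials

component_specific = detection_rate(noise_sd=1.0)  # noise from this component only
overall = detection_rate(noise_sd=2.0)             # extra noise from other components
print(component_specific > overall)                # less noise, more power
```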
Results – Power Analysis
Average probability that a measure finds a significant (α = 0.05) effect for the usability difference between the two versions of the Function Selector, STM, or Keypad components.
[Figure: power (0 to 1) versus number of subjects (0 to 80) for nine measures: 1. messages received; 2. corrected messages received; 3. task duration; 4. keystrokes; 5. corrected keystrokes; 6. component-specific ease-of-use; 7. component-specific satisfaction; 8. overall ease-of-use; 9. overall satisfaction.]
Conclusions
Component-specific measures can be used to test the difference in usability between different versions of an interaction component:
1. Objective performance measure: number of messages received directly, or indirectly via lower-level components
2. Subjective usability measures: ease-of-use and satisfaction questionnaires
Component-specific measures are potentially more powerful than overall usability measures.
Questions / Discussion
Thanks for your attention
Layered Protocol Theory (Taylor, 1988)
Component-Based Interactive Systems
Reflection: Limitations
1. Different lower-level versions involve different amounts of effort per message sent.
2. The usability of a component can affect the interaction users have with other components; might an overall measure then be more powerful?
3. Can instrumentation code be inserted?
Reflection: Other Evaluation Methods
1. Unit testing lacks the context of a real task.
2. Sequential data analysis lacks a direct link with higher layers.
3. Event-based usability evaluation lacks a direct link with the component.
Reflection: Exploitation of the Testing Method
1. Creation process: reduces the need to test a component anew each time it is deployed.
2. Re-use process: a final usability test is still needed.
Testing Method
Aim: to evaluate the difference in usability between two or more versions of an interaction component.