Post on 20-Dec-2015
Empirical Usability Testing in a Component-Based Environment:
Improving Test Efficiency with Component-Specific Usability Measures
Willem-Paul Brinkman Brunel University, London
Reinder Haakma Philips Research Laboratories Eindhoven
Don Bouwhuis Eindhoven University of Technology
Topics
• Research Motivation
• Testing Method
• Experimental Evaluation of the Testing Method
• Conclusions
Research Motivation
Studying the usability of a system:
• External comparison: relating differences in usability to differences in the systems
• Internal comparison: trying to link usability problems with parts of the systems
Component-Based Software Engineering
[Diagram: the CBSE lifecycle. Create new components; reuse components from the repository; manage product requirements and existing software; support products. Feedback flows between the processes.]
• Multiple-versions testing paradigm (external comparison)
• Single-version testing paradigm (internal comparison)
Research Motivation
PROBLEM
1. Only empirical analysis of the overall system (task time, keystrokes, questionnaires, etc.): not powerful
2. Usability tests, heuristic evaluations, and cognitive walkthroughs in which experts identify problems: unreliable
SOLUTION
• Component-specific usability measures: more powerful and reliable
Testing Method
Procedure: the normal procedure of a usability test
• A user task that requires interaction with the components under investigation
• Users must complete the task successfully
Component-specific measures
• Perceived ease-of-use
• Perceived satisfaction
• Objective performance
A component-specific questionnaire helps users remember their interaction experience with a particular component.
Component-specific measures: perceived ease-of-use
Perceived Usefulness and Ease-of-Use questionnaire (Davis, 1989), 6 questions, e.g.:
• Learning to operate [name] would be easy for me.
• I would find it easy to get [name] to do what I want it to do.
Scale: Unlikely to Likely
Component-specific measures: perceived satisfaction
Post-Study System Usability Questionnaire (Lewis, 1995), e.g.:
• The interface of [name] was pleasant.
• I like using the interface of [name].
Scale: Strongly disagree to Strongly agree
Component-specific measures: objective performance
• Number of messages received directly, or indirectly from lower-level components
• Reflects the effort users put into the interaction
[Diagram: component as a control process]
Control loop: each message is a cycle of the control loop
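The message count could be collected with a small piece of instrumentation code inside each component. The sketch below is illustrative (the class and method names are invented, not from the paper); it assumes every incoming message, whether from the user or a lower-level component, counts as one control-loop cycle.

```python
# Hypothetical instrumentation sketch: count each message an interaction
# component receives as one cycle of its control loop. More cycles mean
# more user effort to reach the reference value.

class InstrumentedComponent:
    def __init__(self, name):
        self.name = name
        self.messages_received = 0   # the objective performance measure

    def receive(self, message, source="user"):
        # One incoming message = one control-loop cycle.
        self.messages_received += 1
        return message

keypad = InstrumentedComponent("Keypad")
for key in "15+23=":
    keypad.receive(key)
print(keypad.messages_received)  # 6 messages for entering "15+23="
```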
Architectural Element
Interaction component: the elementary unit of an interactive system on which behaviour-based evaluation is possible.
• A unit within an application that can be represented as a finite state machine and that receives signals from the user directly, or indirectly via other components.
• Users must be able to perceive or infer the state of the interaction component.
Examples of suitable agent models: Interactor, CNUCE model, MVC, PAC.
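The definition above can be read as a small finite state machine whose state the user can perceive. A minimal sketch, assuming a hypothetical mode key that toggles between two input modes (a Mode key does appear in the mobile-phone architecture later; the state names here are invented):

```python
# Minimal finite-state-machine reading of an interaction component:
# a mode toggle with two user-perceivable states. Illustrative only.

class ModeComponent:
    def __init__(self):
        self.state = "numeric"   # the state users can perceive or infer

    def receive(self, signal):
        # Only the "mode" signal changes state; other signals leave it alone.
        if signal == "mode":
            self.state = "alpha" if self.state == "numeric" else "numeric"
        return self.state

m = ModeComponent()
print(m.receive("mode"))  # alpha
print(m.receive("mode"))  # numeric
```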
Interaction layers
[Diagram: calculator example for "15 + 23 = 38". The Editor (lower layer) receives the individual keystrokes "15", "+", "23", "="; the Processor (higher layer) receives the complete equation in binary (01111 + 10111 = 100110) and produces the result, 38. Each layer forms its own control loop between user and calculator: control of the equation and control of the results.]
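The two-layer calculator example can be sketched in code, assuming an illustrative Editor (lower layer, one message per keystroke) that forwards complete equations to a Processor (higher layer, one message per equation). All names are invented for illustration; the point is that the same task produces many lower-level messages but few higher-level ones.

```python
# Two interaction layers for the calculator example: the Editor counts
# keystroke messages, the Processor counts whole-equation messages.

class Processor:
    """Higher-level component: receives one message per complete equation."""
    def __init__(self):
        self.messages_received = 0
        self.result = None

    def evaluate(self, expression):
        self.messages_received += 1          # one higher-level control cycle
        a, b = expression.split("+")
        self.result = int(a) + int(b)

class Editor:
    """Lower-level component: receives one message per keystroke."""
    def __init__(self, processor):
        self.processor = processor
        self.buffer = ""
        self.messages_received = 0

    def press(self, key):
        self.messages_received += 1          # one lower-level control cycle
        if key == "=":
            self.processor.evaluate(self.buffer)  # forward one message upward
            self.buffer = ""
        else:
            self.buffer += key

proc = Processor()
editor = Editor(proc)
for key in "15+23=":
    editor.press(key)
print(editor.messages_received, proc.messages_received, proc.result)  # 6 1 38
```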
Control Loop
[Diagram: the user holds a reference value, evaluates the feedback received from the system, and sends a user message to the component; the component returns feedback, closing the loop.]
Lower-Level Control Loop
[Diagram: control loop between user and calculator at the lower, keystroke level]
Higher-Level Control Loop
[Diagram: control loop between user and calculator at the higher, equation level]
Experimental Evaluation of the Testing Method
• 80 users
• 8 mobile telephones
• 3 components manipulated according to Cognitive Complexity Theory (Kieras & Polson, 1985):
  1. Function Selector
  2. Keypad
  3. Short Text Messages
Architecture: Mobile Telephone
[Diagram: component architecture. Functions: Voice Mail, Telephone Router, Send Text Message, Read Text Message, Read Address List, Edit Address List, Read Diary, Edit Diary, Stand-by Call. Interaction components: Keypad (function keys: left, right, menu, ok, cancel; 0..9, *, #, backspace, and Mode keys), Function Selector (Menu Screen, Main Screen), and Mode Screen. Messages exchanged include function requests, Ok, Cancel, letters, numbers, cursor moves, mode restrictions, menu directions, menu icons, mode symbols, characters, cursor position, STM menu icons, function results, and flow redirection.]
Send Text Message
[Diagram: the Send Text Message function together with the Function Selector and Keypad components]
Evaluation study – Function Selector
Versions:
• Broad/shallow
• Narrow/deep
Evaluation study – Keypad
Versions:
• Repeated-Key method ("L")
• Modified-Model-Position method ("J")
Evaluation study – Send Text Message
Versions:
• Simple
• Complex
Statistical Tests
[Figure: sample distributions of the number of keystrokes and the task time]
x̄ = sample mean (estimator of μ)
s = estimate of the standard deviation σ
s_x̄ = estimate of the standard error of the mean: s_x̄² = s²/n
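These estimators can be computed directly with the standard library. The task times below are made-up values for illustration, not data from the study.

```python
# Sample mean, standard deviation, and standard error of the mean,
# computed for a small hypothetical sample of task times (seconds).
import math
import statistics

times = [947, 1020, 880, 1105, 990]     # illustrative values only
n = len(times)
x_bar = statistics.mean(times)          # estimator of mu
s = statistics.stdev(times)             # estimator of sigma (n-1 denominator)
sem = s / math.sqrt(n)                  # standard error of the mean: s^2/n
print(round(x_bar, 1), round(sem, 1))   # 988.4 37.5
```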
Statistical Tests
p-value: the probability of making a Type I, or α, error; that is, wrongly rejecting the hypothesis that the underlying distributions are the same.
Results – Function Selector
Results of two multivariate analyses and related univariate analyses of variance with the version of the Function Selector as independent between-subjects variable.

|            | Measure                      | Mean (Broad) | Mean (Deep) | df (Hyp.) | df (Er.) | F      | p      | η²   |
|------------|------------------------------|--------------|-------------|-----------|----------|--------|--------|------|
| Normal     | Joint measure                | —            | —           | 7         | 66       | 34.47  | <0.001 | 0.80 |
|            | Time in seconds              | 947          | 1394        | 1         | 72       | 29.56  | <0.001 | 0.29 |
|            | Number of keystrokes         | 461          | 686         | 1         | 72       | 37.72  | <0.001 | 0.34 |
|            | Number of messages received  | 67           | 265         | 1         | 72       | 155.34 | <0.001 | 0.68 |
|            | Ease of use mobile phone     | 5.5          | 4.8         | 1         | 72       | 11.86  | 0.001  | 0.14 |
|            | Ease of use menu             | 5.6          | 4.5         | 1         | 72       | 22.33  | <0.001 | 0.24 |
|            | Satisfaction of mobile phone | 4.4          | 3.8         | 1         | 72       | 4.25   | 0.043  | 0.06 |
|            | Satisfaction of menu         | 4.6          | 3.5         | 1         | 72       | 15.96  | <0.001 | 0.18 |
| Correctedᵃ | Joint measure                | —            | —           | 2         | 71       | 60.96  | <0.001 | 0.63 |
|            | Number of keystrokes         | 437          | 602         | 1         | 72       | 20.27  | <0.001 | 0.22 |
|            | Number of messages received  | 52           | 190         | 1         | 72       | 75.36  | <0.001 | 0.51 |

ᵃ Corrected for all a-priori differences between versions of the components.
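The univariate effect sizes in the results tables can be reproduced from F and the degrees of freedom using the standard partial eta-squared identity η² = F·df_hyp / (F·df_hyp + df_error); the input values below come from the Function Selector table (the multivariate joint measures follow a different computation and are not covered by this identity).

```python
# Recompute partial eta squared from F and the degrees of freedom.

def eta_squared(f, df_hyp, df_err):
    """Partial eta squared for a univariate F test."""
    return (f * df_hyp) / (f * df_hyp + df_err)

# Time in seconds: F = 29.56, df = 1, 72
print(round(eta_squared(29.56, 1, 72), 2))    # 0.29, matching the table
# Number of messages received: F = 155.34, df = 1, 72
print(round(eta_squared(155.34, 1, 72), 2))   # 0.68, matching the table
```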
Results – Keypad
Results of multivariate and related univariate analyses of variance with the version of the Keypad as independent between-subjects variable.

|        | Measure                      | Mean (RK) | Mean (MMP) | df (Hyp.) | df (Er.) | F     | p      | η²   |
|--------|------------------------------|-----------|------------|-----------|----------|-------|--------|------|
| Normal | Joint measure                | —         | —          | 7         | 66       | 4.05  | 0.001  | 0.30 |
|        | Time in seconds              | 872       | 1083       | 1         | 72       | 9.44  | 0.003  | 0.12 |
|        | Number of keystrokes         | 438       | 537        | 1         | 72       | 10.34 | 0.002  | 0.13 |
|        | Number of messages received  | 233       | 271        | 1         | 72       | 13.92 | <0.001 | 0.16 |
|        | Ease of use mobile phone     | 5.3       | 5.0        | 1         | 72       | 1.07  | 0.305  | 0.02 |
|        | Ease of use keyboard         | 5.6       | 4.9        | 1         | 72       | 11.13 | 0.001  | 0.13 |
|        | Satisfaction of mobile phone | 4.3       | 3.9        | 1         | 72       | 1.76  | 0.188  | 0.02 |
|        | Satisfaction of keyboard     | 4.6       | 3.8        | 1         | 72       | 8.97  | 0.004  | 0.11 |
Results – Send Text Message
Results of two multivariate analyses and related univariate analyses of variance with the version of the STM component as independent between-subjects variable.

|            | Measure                      | Mean (Simple) | Mean (Complex) | df (Hyp.) | df (Er.) | F     | p      | η²   |
|------------|------------------------------|---------------|----------------|-----------|----------|-------|--------|------|
| Normal     | Joint measure                | —             | —              | 7         | 66       | 18.16 | <0.001 | 0.66 |
|            | Time in seconds              | 523           | 672            | 1         | 72       | 8.15  | 0.006  | 0.10 |
|            | Number of keystrokes         | 269           | 320            | 1         | 72       | 4.56  | 0.036  | 0.06 |
|            | Number of messages received  | 12            | 49             | 1         | 72       | 74.18 | <0.001 | 0.51 |
|            | Ease of use mobile phone     | 5.0           | 5.3            | 1         | 72       | 1.15  | 0.288  | 0.02 |
|            | Ease of use STM function     | 5.1           | 4.9            | 1         | 72       | 0.35  | 0.555  | 0.01 |
|            | Satisfaction of mobile phone | 3.9           | 4.2            | 1         | 72       | 0.93  | 0.339  | 0.01 |
|            | Satisfaction of STM function | 3.9           | 3.8            | 1         | 72       | 0.26  | 0.614  | 0.01 |
| Correctedᵃ | Joint measure                | —             | —              | 2         | 71       | 20.85 | <0.001 | 0.37 |
|            | Number of keystrokes         | 249           | 289            | 1         | 72       | 2.30  | 0.134  | 0.03 |
|            | Number of messages received  | 12            | 34             | 1         | 72       | 26.23 | <0.001 | 0.27 |

ᵃ Corrected for all a-priori differences between versions of the components.
Power of component-specific measures
Statistical power: 1 − β
Type II, or β, error: failing to reject the hypothesis when it is false.
Power of component-specific measures
Component-specific measures are less affected by usability problems users may or may not encounter with other parts of the system. This lowers the variance estimate s², and with it the standard error of the mean, so the same underlying difference is easier to detect: statistical power (1 − β) increases.
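Why less noise means more power can be illustrated with a small Monte Carlo sketch. It assumes the same true difference between two versions, with extra variance added to the overall measure to stand in for usability problems elsewhere in the system; the effect size, noise levels, and the fixed |t| criterion of 2.0 are all illustrative, not the study's analysis.

```python
# Monte Carlo sketch: a measure with less extraneous variance detects the
# same true effect more often, i.e. has higher statistical power.
import random
import statistics

random.seed(1)

def detection_rate(noise_sd, n=20, effect=1.0, trials=2000, crit=2.0):
    """Fraction of simulated experiments where |t| exceeds a fixed criterion."""
    hits = 0
    for _ in range(trials):
        a = [random.gauss(0.0, noise_sd) for _ in range(n)]     # version A
        b = [random.gauss(effect, noise_sd) for _ in range(n)]  # version B
        se = ((statistics.variance(a) + statistics.variance(b)) / n) ** 0.5
        if abs((statistics.mean(b) - statistics.mean(a)) / se) > crit:
            hits += 1
    return hits / trials

component_specific = detection_rate(noise_sd=1.0)  # noise from this component only
overall = detection_rate(noise_sd=2.0)             # extra noise from other components
print(component_specific > overall)                # less noise, more power
```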
Results – Power Analysis
Average probability that a measure finds a significant (α = 0.05) effect for the usability difference between the two versions of the Function Selector, STM, or Keypad components.
[Figure: power (0 to 1) versus number of subjects (0 to 80) for nine measures: 1. messages received; 2. corrected messages received; 3. task duration; 4. keystrokes; 5. corrected keystrokes; 6. component-specific ease-of-use; 7. component-specific satisfaction; 8. overall ease-of-use; 9. overall satisfaction.]
Conclusions
Component-specific measures can be used to test the difference in usability between different versions of an interaction component:
1. Objective performance measure: number of messages received directly, or indirectly via lower-level components
2. Subjective usability measures: ease-of-use and satisfaction questionnaires
Component-specific measures are potentially more powerful than overall usability measures.
Questions / Discussion
Thanks for your attention
Layered Protocol Theory (Taylor, 1988)
Component-Based Interactive Systems
Reflection: Limitations
1. Different lower-level versions involve different amounts of effort per message sent.
2. The usability of a component can affect the interaction users have with other components; might an overall measure then be more powerful?
3. Can instrumentation code be inserted?
Reflection: Other Evaluation Methods
1. Unit testing lacks the context of a real task.
2. Sequential data analysis lacks a direct link with higher layers.
3. Event-based usability evaluation lacks a direct link with the component.
Reflection: Exploitation of the Testing Method
1. Creation process: reduces the need to test a component anew each time it is deployed.
2. Re-use process: a final usability test is still needed.
Testing Method
Aim: to evaluate the difference in usability between two or more versions of an interaction component.