COSC 426 Lect. 7: Evaluating AR Applications

Lecture 7: Evaluating AR Applications
Mark Billinghurst
HIT Lab NZ, University of Canterbury


DESCRIPTION

A lecture on evaluating AR interfaces, from the graduate course on Augmented Reality, taught by Mark Billinghurst from the HIT Lab NZ at the University of Canterbury.

TRANSCRIPT

Page 1: COSC 426 Lect. 7: Evaluating AR Applications

Lecture 7: Evaluating AR Applications

Mark Billinghurst
HIT Lab NZ

University of Canterbury

Page 2: COSC 426 Lect. 7: Evaluating AR Applications

Building Compelling AR Experiences

(layered stack, top to bottom)
experiences: Evaluation
applications: Interaction
tools: Authoring
components: Tracking, Display

Sony CSL © 2004

Page 3: COSC 426 Lect. 7: Evaluating AR Applications

Introduction

Page 4: COSC 426 Lect. 7: Evaluating AR Applications
Page 5: COSC 426 Lect. 7: Evaluating AR Applications

The Interaction Design Process

Page 6: COSC 426 Lect. 7: Evaluating AR Applications

The Interaction Design Process

Page 7: COSC 426 Lect. 7: Evaluating AR Applications

Why Evaluate AR Applications?
- To test and compare interfaces, new technologies, interaction techniques
- Test usability (learnability, efficiency, satisfaction, ...)
- Get user feedback
- Refine interface design
- Better understand your end users ...

Page 8: COSC 426 Lect. 7: Evaluating AR Applications

Survey of AR Papers
Edward Swan (2005) surveyed major conferences/journals (1992-2004):
- Presence, ISMAR, ISWC, IEEE VR
Summary:
- 1104 total papers
- 266 AR papers
- 38 AR HCI papers (interaction)
- 21 AR user studies
Only 21 of 266 AR papers had a formal user study: less than 8% of all AR papers.

Page 9: COSC 426 Lect. 7: Evaluating AR Applications

AR Papers

Page 10: COSC 426 Lect. 7: Evaluating AR Applications

HIT Lab NZ Usability Survey
"A Survey of Evaluation Techniques Used in Augmented Reality Studies"
Andreas Dünser, Raphaël Grasset, Mark Billinghurst
- Reviewed publications from 1993 to 2007
- Extracted 6071 papers that mentioned "Augmented Reality"
- Searched to find 165 AR papers with user studies

Page 11: COSC 426 Lect. 7: Evaluating AR Applications

[Chart: number of publications mentioning "Augmented Reality" per year, 1992-2007, broken down by source: ACM Digital Library, SpringerLink, IEEE Xplore, ScienceDirect, SPIE Digital Library, InformaWorld, MIT Press Journals, Highwire, Blackwell Synergy, Mary Ann Liebert, Wiley Interscience, Sage Journals Online, Emerald Insight, Oxford Journals, Cambridge Journals Online, ASCE Publications, JSTOR, Karger, WorldSciNet, BioMed Central, ASME, Annual Reviews, Nature Online, MathSciNet, National Research Council of Canada Research Press (NRC), AdisOnline, APS Journals (PROLA), Royal Society Publishing.]

Page 12: COSC 426 Lect. 7: Evaluating AR Applications
Page 13: COSC 426 Lect. 7: Evaluating AR Applications

Types of User Studies

Types of AR user studies:
- Perception
- User performance
- Collaboration
- Usability of complete systems

Page 14: COSC 426 Lect. 7: Evaluating AR Applications

Types of AR User Studies

Page 15: COSC 426 Lect. 7: Evaluating AR Applications

Types of Experimental Measures Used
- Objective measures
- Subjective measures
- Qualitative analysis
- Usability evaluation techniques
- Informal evaluations

Page 16: COSC 426 Lect. 7: Evaluating AR Applications

Types of Experimental Measures Used

Page 17: COSC 426 Lect. 7: Evaluating AR Applications

Summary

Over the last 10 years:
- Most user studies focused on user performance
- Fewest user studies on collaboration
- Objective performance measures were most used
- Qualitative and usability measures were least used

Page 18: COSC 426 Lect. 7: Evaluating AR Applications

Types of User Evaluation

Page 19: COSC 426 Lect. 7: Evaluating AR Applications

What is evaluation?

Evaluation is concerned with gathering data about the usability of a design or product by a specified group of users for a particular activity within a specified environment or work context.

Page 20: COSC 426 Lect. 7: Evaluating AR Applications

Evaluation
Goal: measure the goodness of the application design.
Two types:
- Formative evaluation is performed at different stages of development to check that the product meets users' needs.
- Summative evaluation assesses the quality of a finished product.
Here we focus on formative evaluation.

Page 21: COSC 426 Lect. 7: Evaluating AR Applications

When to evaluate?
Once the application has been developed:
- pros: rapid development, small evaluation cost
- cons: rectifying problems late is costly
  (design -> implementation -> evaluation -> redesign & reimplementation)
During design and development:
- pros: find and rectify problems early
- cons: higher evaluation cost, longer development
  (evaluation interleaved with design and implementation)

Page 22: COSC 426 Lect. 7: Evaluating AR Applications

Four evaluation paradigms

- 'quick and dirty'
- usability testing (lab studies)
- field studies
- predictive evaluation

Page 23: COSC 426 Lect. 7: Evaluating AR Applications

Quick and dirty

'Quick & dirty' evaluation: informal feedback from users or consultants to confirm that their ideas are in line with users' needs and are liked.
Quick & dirty evaluations can be done at any time.
Emphasis is on fast input to the design process rather than carefully documented findings.

Page 24: COSC 426 Lect. 7: Evaluating AR Applications

Usability Testing
Recording typical users' performance on typical tasks in controlled settings. Field observations may also be used.
As the users perform these tasks they are watched and recorded on video, and their inputs are logged. This data is used to calculate performance times and errors, and to help explain why the users did what they did.
User satisfaction questionnaires and interviews are used to elicit users' opinions.

Page 25: COSC 426 Lect. 7: Evaluating AR Applications

Laboratory-based Studies
Laboratory-based studies:
- can be used for evaluating the design or the implemented system
- are carried out in an interruption-free usability lab
- can accurately record some work situations
- some studies are only possible in a lab environment
- some tasks can be adequately performed in a lab
- are useful for comparing different designs in a controlled context

Page 26: COSC 426 Lect. 7: Evaluating AR Applications

Laboratory-based Studies

Controlled, instrumented environment

Page 27: COSC 426 Lect. 7: Evaluating AR Applications

Field Studies
Field studies are done in natural settings.
The aim is to understand what users do naturally and how technology impacts them.
In product design, field studies can be used to:
- identify opportunities for new technology
- determine design requirements
- decide how to introduce new technology
- evaluate technology in use.

Page 28: COSC 426 Lect. 7: Evaluating AR Applications

Predictive Evaluation
Experts apply their knowledge of typical users, guided by heuristics, to predict usability problems.
Can involve theoretically based models.
A key feature of predictive evaluation is that real end users need not be present.
Relatively quick and inexpensive.

Page 29: COSC 426 Lect. 7: Evaluating AR Applications

Characteristics of Approaches

            Usability testing   Field studies   Predictive
Users       do task             natural         not involved
Location    controlled          natural         anywhere
When        prototype           early           prototype
Data        quantitative        qualitative     problems
Feedback    measures & errors   descriptions    problems
Type        applied             naturalistic    expert

Page 30: COSC 426 Lect. 7: Evaluating AR Applications

Evaluation Approaches and Methods

Method          Usability testing   Field studies   Predictive
Observing       x                   x
Asking users    x                   x
Asking experts                                      x
Testing         x
Modeling                                            x

Page 31: COSC 426 Lect. 7: Evaluating AR Applications

DECIDE: A framework to guide evaluation
- Determine the goals the evaluation addresses.
- Explore the specific questions to be answered.
- Choose the evaluation paradigm and techniques.
- Identify the practical issues.
- Decide how to deal with the ethical issues.
- Evaluate, interpret and present the data.

Page 32: COSC 426 Lect. 7: Evaluating AR Applications

DECIDE Framework
Determine Goals:
- What are the high-level goals of the evaluation?
- Who wants the evaluation, and why?
Explore the Questions:
- Create well-defined, relevant questions
Choose the Evaluation Paradigm:
- Influences the techniques used and how the data is analyzed
Identify Practical Issues:
- How to select users, stay on budget and schedule
- How to find evaluators, select equipment

Page 33: COSC 426 Lect. 7: Evaluating AR Applications

DECIDE Framework
Decide on Ethical Issues:
- Informed consent form
- Participants have a right to:
  - know the goals of the study and what will happen to the findings
  - privacy of personal information
Evaluate, Interpret and Present Data:
- Reliability: can the study be replicated?
- Validity: is it measuring what you thought?
- Biases: is the process creating biases?
- Scope: can the findings be generalized?
- Ecological validity: is the environment influencing the results?

Page 34: COSC 426 Lect. 7: Evaluating AR Applications

Usability Testing

Page 35: COSC 426 Lect. 7: Evaluating AR Applications

Pilot Studies
A small trial run of the main study.
Can identify the majority of issues with an interface design.
Pilot studies check:
- that the evaluation plan is viable
- that you can conduct the procedure
- that interview scripts, questionnaires, experiments, etc. work appropriately
Iron out problems before doing the main study.

Page 36: COSC 426 Lect. 7: Evaluating AR Applications

Controlled experiments
The designer of a controlled experiment should carefully consider:
- the proposed hypothesis
- the selected subjects
- the measured variables
- the experimental methods
- data collection
- data analysis

Page 37: COSC 426 Lect. 7: Evaluating AR Applications

Variables
Experiments manipulate and measure variables under controlled conditions.
There are two types of variables:
- independent: variables that are manipulated to create different experimental conditions
  - e.g. number of items in menus, colour of the icons
- dependent: variables that are measured to find out the effects of changing the independent variables
  - e.g. speed of menu selection, speed of locating icons

Test Conditions
The levels, values, or settings for an independent variable.
Example:
- test conditions: HMD, Handheld device 1, Handheld device 2

Page 38: COSC 426 Lect. 7: Evaluating AR Applications

"Other" Variables
Control variables:
- e.g. room light, noise ...
- if controlled => less external validity
Random variables (not controlled):
- e.g. fatigue
- more influence of random variables => less internal validity
Confounding variables:
- e.g. practice, previous experience

Page 39: COSC 426 Lect. 7: Evaluating AR Applications

Hypothesis
A hypothesis is a prediction of the outcome:
- what will happen to the dependent variables when the independent variables are changed
To show that the prediction is right, reject the null hypothesis (H0): that the dependent variables do not change when the independent variables are changed.

Page 40: COSC 426 Lect. 7: Evaluating AR Applications

Experimental methods
It is important to select the right experimental method so that the results of the experiment can be generalized.
There are mainly two experimental methods:
- between-groups: each subject is assigned to one experimental condition
- within-groups: each subject performs under all the different conditions

Page 41: COSC 426 Lect. 7: Evaluating AR Applications

Experimental methods
[Diagram: Between-groups vs. within-groups designs. In a between-groups design, subjects are randomly assigned to one condition each (Condition 1, 2, or 3), perform the experimental task in that condition only, and the data from each group feeds into statistical data analysis. In a within-groups design, each subject performs the experimental tasks under all conditions, with the order varied across subjects, and the data likewise feeds into statistical data analysis.]

Page 42: COSC 426 Lect. 7: Evaluating AR Applications

Within vs. Between Subjects
Between-subjects design:
- each participant is tested on only one level/condition
- a separate group of participants is used for each condition
  - e.g. one group uses the HMD, the other group uses the Handheld device
Within-subjects design:
- each participant is tested on each level/condition (repeated measurement)
  - e.g. participants use both the Handheld device and the HMD

Page 43: COSC 426 Lect. 7: Evaluating AR Applications

Between Subjects
Sometimes a factor must be between subjects:
- e.g. gender, age, experience
Between-subjects advantage:
- avoids interference effects (e.g. practice / learning effects)
Between-subjects disadvantage:
- increased variability = need more subjects
Important: randomised assignment to conditions
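Randomised assignment is straightforward to script. Below is a minimal Python sketch; the participant IDs and condition names are invented for illustration:

```python
import random

# Hypothetical participant pool and two between-subjects conditions
participants = [f"P{i:02d}" for i in range(1, 25)]
conditions = ["HMD", "Handheld"]

random.shuffle(participants)  # randomise the order before splitting
# Deal participants round-robin into one group per condition
groups = {c: participants[i::len(conditions)] for i, c in enumerate(conditions)}
for condition, group in groups.items():
    print(condition, group)
```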

Page 44: COSC 426 Lect. 7: Evaluating AR Applications

Within Subjects
Sometimes a factor must be within subjects:
- e.g. measuring learning effects
Within-subjects advantages:
- fewer participants needed (all participants in all conditions)
- differences (variability) between subjects are the same across test conditions
Counterbalance the order of presenting conditions:
A => B => C, B => C => A, C => A => B
The order is best governed by a Latin Square.

Page 45: COSC 426 Lect. 7: Evaluating AR Applications

Latin Square Design
Each condition occurs once in each row and column.
Note: in a balanced Latin Square each condition both precedes and follows each other condition an equal number of times.
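A balanced Latin square can also be generated programmatically. Here is a small Python sketch using the standard zig-zag construction (the condition indices 0..n-1 are placeholders to be mapped onto your own conditions):

```python
def balanced_latin_square(n):
    """Return presentation orders for n conditions (numbered 0..n-1).
    For even n, every condition appears once per row and column and
    precedes/follows every other condition equally often."""
    # First row follows the zig-zag pattern 0, 1, n-1, 2, n-2, ...
    first, lo, hi, take_low = [0], 1, n - 1, True
    while lo <= hi:
        if take_low:
            first.append(lo)
            lo += 1
        else:
            first.append(hi)
            hi -= 1
        take_low = not take_low
    # Each later row shifts every entry by +1 (mod n)
    square = [[(c + i) % n for c in first] for i in range(n)]
    if n % 2 == 1:
        # For odd n, add the mirror-image rows to restore balance
        square += [list(reversed(row)) for row in square]
    return square

for order in balanced_latin_square(4):
    print(order)   # [0, 1, 3, 2], [1, 2, 0, 3], [2, 3, 1, 0], [3, 0, 2, 1]
```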

Page 46: COSC 426 Lect. 7: Evaluating AR Applications

Subjects
The choice of subjects is critical to the validity of the results of an experiment:
- the subject group should be representative of the expected user population
In selecting the subjects it is important to consider things such as their:
- age group, education, skills, culture
How does the sample influence the results?
Report the selection criteria and give relevant demographic information in your publication.

Page 47: COSC 426 Lect. 7: Evaluating AR Applications

Subjects
How many participants?
- How big is the effect you want to measure?
  - large effects can be detected with smaller samples (e.g. a small n is needed to discriminate speed between turtles and rabbits)
- The more participants, the "smoother" the data:
  - Central Limit Theorem: as n increases (n > 30) the sample mean approaches a normal distribution (see the sketch below)
  - extreme data has less influence (e.g. one sleepy participant does not mess up the results that much)
- For quantitative analysis, rule of thumb: a MINIMUM of 15-20 or more per group/cell
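The "smoothing" effect of larger samples can be seen in a quick simulation. This Python sketch draws task-completion times from an invented skewed distribution and shows the study-to-study spread of the sample mean shrinking as n grows (standard library only):

```python
import random
import statistics

def sd_of_sample_means(n_participants, n_studies=1000):
    """Run many simulated studies and report how much the sample mean
    of task-completion time varies from study to study."""
    means = []
    for _ in range(n_studies):
        # Hypothetical skewed population of completion times (seconds)
        sample = [random.lognormvariate(3.0, 0.5) for _ in range(n_participants)]
        means.append(statistics.mean(sample))
    return statistics.stdev(means)

for n in (5, 15, 30):
    print(f"n={n:2d}: sd of the sample mean = {sd_of_sample_means(n):.2f} s")
```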

Page 48: COSC 426 Lect. 7: Evaluating AR Applications

Data Collection and Analysis

The choice of a method depends on the type of data that needs to be collected.
In order to test a hypothesis, the data has to be analysed using a statistical method.
The choice of a statistical method depends on the type of collected data.
All the decisions about an experiment should be made before it is carried out.

Page 49: COSC 426 Lect. 7: Evaluating AR Applications

Observe and Measure
Observations are gathered ...
- manually (human observers)
- automatically (computers, software, cameras, sensors, etc.)
A measurement is a recorded observation:
- objective metrics
- subjective metrics

Page 50: COSC 426 Lect. 7: Evaluating AR Applications

Typical objective metrics
- task completion time
- errors (number, percent, ...)
- percent of task completed
- ratio of successes to failures
- number of repetitions
- number of commands used
- number of failed commands
- physiological data (heart rate, ...)
- ...
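As an illustration of collecting such metrics, a minimal per-trial logger might look like the following Python sketch (all names here are hypothetical, not part of the lecture material):

```python
import csv
import time

class TrialLogger:
    """Minimal logger for objective metrics: completion time and error count."""

    def __init__(self, path):
        self.path, self.rows, self.current = path, [], None

    def start_trial(self, participant, condition):
        self.current = {"participant": participant, "condition": condition,
                        "errors": 0, "_start": time.perf_counter()}

    def log_error(self):
        self.current["errors"] += 1      # one more user error in this trial

    def end_trial(self, completed=True):
        self.current["time_s"] = time.perf_counter() - self.current.pop("_start")
        self.current["completed"] = completed
        self.rows.append(self.current)

    def save(self):
        with open(self.path, "w", newline="") as f:
            writer = csv.DictWriter(f, fieldnames=list(self.rows[0]))
            writer.writeheader()
            writer.writerows(self.rows)

# Usage: one trial per participant/condition pair
log = TrialLogger("results.csv")
log.start_trial("P01", "HMD")
log.log_error()
log.end_trial(completed=True)
log.save()
```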

Page 51: COSC 426 Lect. 7: Evaluating AR Applications

Typical subjective metrics
- user satisfaction
- subjective performance ratings
- ease of use
- intuitiveness
- judgments
- ...

Page 52: COSC 426 Lect. 7: Evaluating AR Applications

Data Types
Subjective:
- Subjective survey
  - Likert scale, condition rankings
    (e.g. "How easy was the task?" 1 2 3 4 5, 1 = not very easy, 5 = very easy)
- Observations
  - Think aloud
- Interview responses
Objective:
- Performance measures
  - Time, accuracy, errors
- Process measures
  - Video/audio analysis

Page 53: COSC 426 Lect. 7: Evaluating AR Applications

Experimental Measures

Timings
- Tells us: performance
- Measured: via a stopwatch, or automatically by the device

Errors
- Tells us: performance; particular sticking points in a task
- Measured: by success in completing the task correctly; through experimenter observation, examining the route walked

Perceived workload
- Tells us: effort invested; user satisfaction
- Measured: through NASA TLX scales and other questionnaires

Distance traveled and route taken
- Tells us: depending on the application, these can be used to pinpoint errors and to indicate performance
- Measured: using a pedometer, GPS or other location-sensing system; by experimenter observation

Percentage preferred walking speed
- Tells us: performance
- Measured: by finding average walking speed, which is compared with normal walking speed

Comfort
- Tells us: user satisfaction; device acceptability
- Measured: Comfort Rating Scale and other questionnaires

User comments and preferences
- Tells us: user satisfaction and preferences; particular sticking points in a task
- Measured: through questionnaires, interviews and think-alouds

Experimenter observations
- Tells us: different aspects, depending on the experimenter and on the observations
- Measured: through observation and note-taking

Page 54: COSC 426 Lect. 7: Evaluating AR Applications

Statistical Analysis
Once data is collected, statistics can be used for analysis.
Typical statistical techniques:
Comparing between two results
- Unpaired t-test (for between subjects; assumes normal distribution, interval scale, homogeneity of variances)
- Paired t-test (for within subjects; assumes normal distribution, etc.)
- Mann-Whitney U-test (between subjects; if assumptions are not met)
Comparing between more than two results
- Analysis of Variance (ANOVA), followed by post-hoc analysis (e.g. Bonferroni adjustment)
- Kruskal-Wallis test (does not assume normal distribution)
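As a sketch of how these tests map onto code, here is how they might be run in Python with SciPy (the completion-time data below is invented):

```python
from scipy import stats

# Hypothetical task-completion times (seconds) per condition
hmd      = [12.1, 14.3, 11.8, 15.0, 13.2, 12.7]
handheld = [10.4, 11.9, 10.1, 12.3, 11.0, 10.8]
tablet   = [13.5, 15.2, 14.1, 16.0, 14.8, 13.9]

# Two conditions, between subjects, assumptions met: unpaired t-test
t, p = stats.ttest_ind(hmd, handheld)

# Two conditions, within subjects: paired t-test
t, p = stats.ttest_rel(hmd, handheld)

# Two conditions, between subjects, assumptions not met: Mann-Whitney U
u, p = stats.mannwhitneyu(hmd, handheld)

# More than two conditions, assuming normality: one-way ANOVA
f, p = stats.f_oneway(hmd, handheld, tablet)

# More than two conditions, no normality assumption: Kruskal-Wallis
h, p = stats.kruskal(hmd, handheld, tablet)
```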

Page 55: COSC 426 Lect. 7: Evaluating AR Applications

Running the study
Offload your brain!
- Write down instructions
- Prepare checklists
- Create templates
- Print and pitch important information
Try and find an assistant.
Print questionnaires and other documents the day before.
Rehearse procedures.
Bring your lunch; don't forget to eat. (4 kg in 2 weeks)

Page 56: COSC 426 Lect. 7: Evaluating AR Applications

Running the study
Treat the participants nicely:
- Prepare candy and drinks and make them feel good.
Take the role of a friendly waiter:
- Always stay in the background but offer assistance if needed.
Take notes, document oddities.
Nothing is as bad as lost data!! AVOID!

Page 57: COSC 426 Lect. 7: Evaluating AR Applications

Running the study
Take many photos of your setup in action.
Prepare consent forms if you want to use pictures for publications.

Page 58: COSC 426 Lect. 7: Evaluating AR Applications

Field Studies

Page 59: COSC 426 Lect. 7: Evaluating AR Applications

Field Studies
Field studies are done in natural settings.
"In the wild" is a term for prototypes being used freely in natural settings.
Aim to understand what users do naturally and how technology impacts them.
Field studies are used in product design to:
- identify opportunities for new technology
- determine design requirements
- decide how best to introduce new technology
- evaluate technology in use.

www.id-book.com

Page 60: COSC 426 Lect. 7: Evaluating AR Applications

Observation
Direct observation in the field:
- Structuring frameworks
- Degree of participation (insider or outsider)
- Ethnography
Direct observation in controlled environments
Indirect observation: tracking users' activities
- Diaries
- Interaction logging

Page 61: COSC 426 Lect. 7: Evaluating AR Applications

Ethnography
• Ethnography is a philosophy with a set of techniques that include participant observation and interviews
• Ethnographers immerse themselves in the culture studied
• Need cooperation of the people being studied
• A researcher's degree of participation can vary along a scale from 'outside' to 'inside'
• Analyzing video and data logs can be time-consuming
• Can use continuous data analysis
• Collections of comments, incidents, and artifacts are made

Page 62: COSC 426 Lect. 7: Evaluating AR Applications

Direct observation in a controlled setting
- Think-aloud technique
Indirect observation
- Diaries
- Interaction logs
- Cultural probes

Page 63: COSC 426 Lect. 7: Evaluating AR Applications

Structuring frameworks to guide observation
- The person. Who?
- The place. Where?
- The thing. What?
The Goetz and LeCompte (1984) framework:
- Who is present?
- What is their role?
- What is happening?
- Where is it happening?
- Why is it happening?
- How is the activity organized?

Page 64: COSC 426 Lect. 7: Evaluating AR Applications

Predictive Evaluation

Page 65: COSC 426 Lect. 7: Evaluating AR Applications

Predictive Models
- Provide a way of evaluating products or designs without directly involving users.
- Less expensive than user testing.
- Usefulness limited to systems with predictable tasks
  - e.g., telephone answering systems, mobiles, etc.
- Based on expert error-free behavior.

Page 66: COSC 426 Lect. 7: Evaluating AR Applications

Fitts’ Law (Fitts, 1954)

Fitts' Law predicts that the time to point at an object using a device is a function of the distance from the target object and the object's size.
The further away and the smaller the object, the longer the time to locate it and point to it.

Page 67: COSC 426 Lect. 7: Evaluating AR Applications

GOMS Model
Goals: the state the user wants to achieve, e.g., find a website.
Operators: the cognitive processes and physical actions needed to attain the goals, e.g., moving the mouse to select an icon.
Methods: the procedures for accomplishing the goals, e.g., drag mouse over icon, click on button.
Selection rules: decide which method to select when there is more than one.

Page 68: COSC 426 Lect. 7: Evaluating AR Applications

GOMS Response Times (Card et al., 1983)

Operator   Description                                           Time (sec)
K          Pressing a single key or button:
             average skilled typist (55 wpm)                     0.22
             average non-skilled typist (40 wpm)                 0.28
             pressing shift or control key                       0.08
             typist unfamiliar with the keyboard                 1.20
P          Pointing with a mouse or other device on a display    0.40
           to select an object (value derived from Fitts' Law)
P1         Clicking the mouse or similar device                  0.20
H          Bringing 'home' hands on the keyboard or other        0.40
           device
M          Mentally prepare/respond                              1.35
R(t)       The response time is counted only if it causes        t
           the user to wait
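These operator times can be summed to predict a task time. For example, a minimal keystroke-level calculation in Python (the icon-selection task is an invented example):

```python
# Keystroke-level operator times from Card et al. (1983), in seconds
K_SKILLED = 0.22   # press a key (average skilled typist, 55 wpm)
P         = 0.40   # point with a mouse (value derived from Fitts' Law)
P1        = 0.20   # click the mouse button
H         = 0.40   # bring hands 'home' to the keyboard or device
M         = 1.35   # mentally prepare/respond

# Example task: prepare, move hand to the mouse, point at an icon, click it
select_icon = M + H + P + P1
print(f"Predicted selection time: {select_icon:.2f} s")   # 2.35 s
```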

Page 69: COSC 426 Lect. 7: Evaluating AR Applications

Expert Inspections
Several kinds:
- Experts use their knowledge of users and technology to review application usability.
- Expert critiques can be formal or informal reports.
- Heuristic evaluation is a review guided by a set of heuristics
  - e.g. "visibility of system status", from Jacob Nielsen's heuristics (1990s)
- Walkthroughs involve stepping through a pre-planned scenario noting potential problems
  - e.g. load AR model, scale it to twice the size, add new model, etc.

Page 70: COSC 426 Lect. 7: Evaluating AR Applications

Nielsen's heuristics
- Visibility of system status.
- Match between system and real world.
- User control and freedom.
- Consistency and standards.
- Error prevention.
- Recognition rather than recall.
- Flexibility and efficiency of use.
- Aesthetic and minimalist design.
- Help users recognize, diagnose, recover from errors.
- Help and documentation.

Page 71: COSC 426 Lect. 7: Evaluating AR Applications

Three Stages for Doing Heuristic Evaluation

1/ Briefing session to tell experts what to do.
2/ Evaluation period of 1-2 hours in which:
- each expert works separately;
- experts take one pass to get a feel for the product;
- and a second pass to focus on specific features.
3/ Debriefing session in which experts work together to prioritize problems.

Page 72: COSC 426 Lect. 7: Evaluating AR Applications

No. of evaluators & problems

Page 73: COSC 426 Lect. 7: Evaluating AR Applications

Advantages and Problems
- Few ethical and practical issues to consider because users are not involved.
- Can be difficult and expensive to find experts.
- The best experts have knowledge of the application domain and users.
Biggest problems:
- important problems may get missed;
- many trivial problems are often identified;
- experts have biases.

Page 74: COSC 426 Lect. 7: Evaluating AR Applications

Case Studies

Page 75: COSC 426 Lect. 7: Evaluating AR Applications

Types of AR Experiments
Perception
- How is virtual content perceived?
- What perceptual cues are most important?
Interaction
- How can users interact with virtual content?
- Which interaction techniques are most efficient?
Collaboration
- How is collaboration in an AR interface different?
- Which collaborative cues can be conveyed best?

Page 76: COSC 426 Lect. 7: Evaluating AR Applications

Perception
A central goal of AR systems is to fool the human perceptual system.
Display modes:
- Direct view
- Stereo video
- Stereo graphics
Multi-modal display:
- Different objects with different display modes
- Potential for depth cue conflict

Page 77: COSC 426 Lect. 7: Evaluating AR Applications

Perceptual User Studies
Depth / distance studies:
- Estimate distance to object
- Judge relative proximity
Object localization:
- Match physical and virtual object positions
Difficulties:
- Precise alignment / calibration of displays
- Lag in head tracking (use static images)

Page 78: COSC 426 Lect. 7: Evaluating AR Applications

Layar – www.layar.com

Page 79: COSC 426 Lect. 7: Evaluating AR Applications

Outdoor AR: Limited Field of View

Page 80: COSC 426 Lect. 7: Evaluating AR Applications

Possible solutions
- Overview + Detail: spatial separation; two views
- Focus + Context: merges both views into one view
- Zooming: temporal separation

Page 81: COSC 426 Lect. 7: Evaluating AR Applications

Zooming Views
TU Graz / HIT Lab NZ collaboration
- Zooming panorama
- Zooming map

Page 82: COSC 426 Lect. 7: Evaluating AR Applications

Zooming AR interfaces
Context Compass, Zooming Panorama, Zooming Map
Interface types:
- Compass (C)
- Compass + Zooming Panorama (CP)
- Compass + Zooming Map (CM)
- Compass + Zooming Panorama + Zooming Map (CPM)

Page 83: COSC 426 Lect. 7: Evaluating AR Applications

Experiment Evaluation

20 subjects (10 M / 10 F)
Café finding task:
- Task 1: Find a particular café named "Alpha"
- Task 2: Find the closest café
Experiment measures:
- Time to complete task
- Angular distance panned around
- Subjective survey feedback

Page 84: COSC 426 Lect. 7: Evaluating AR Applications

Performance Time

Page 85: COSC 426 Lect. 7: Evaluating AR Applications

Distance Panned

Page 86: COSC 426 Lect. 7: Evaluating AR Applications

Results
- Compass good for search, but not comparison
- Zooming (P or M) aids comparison
- Information has a significant effect
- Compass requires more panning
- Users felt the compass alone wasn't useful

Page 87: COSC 426 Lect. 7: Evaluating AR Applications

Interaction Studies
Stages of Interface Development
• Prototype demonstration
• Adoption of interaction techniques from other interface metaphors
• Development of new interface metaphors appropriate to the medium
• Development of formal theoretical models for predicting and modeling user interactions

Page 88: COSC 426 Lect. 7: Evaluating AR Applications

Fitts' Law (1964)
Relates Movement Time to Index of Difficulty:

MT = a + b log2(2A/W)

where log2(2A/W) = ID

Robust under most circumstances:
- object tracking, tapping tasks, movement tasks
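A small Python sketch of the formula; note that the constants a and b below are invented, since in practice they are fitted empirically for each device:

```python
import math

def fitts_mt(a, b, A, W):
    """Movement time from Fitts' Law: MT = a + b * log2(2A / W),
    where A is the distance to the target and W is the target width."""
    ID = math.log2(2 * A / W)   # index of difficulty, in bits
    return a + b * ID

# Hypothetical device constants: a = 0.1 s, b = 0.15 s/bit
print(fitts_mt(0.1, 0.15, A=256, W=32))   # ID = 4 bits -> 0.70 s
```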

Page 89: COSC 426 Lect. 7: Evaluating AR Applications

Interaction Study - Reaching
Mason, A. et al. (2001). Reaching Movements to Augmented and Graphic Objects in Virtual Environments. Proc. CHI 2001.
- Does Fitts' Law hold in an acquisition task?
- Does Fitts' Law hold when reaching for virtual objects?
- Does Fitts' Law hold when you can't see your hand?

Page 90: COSC 426 Lect. 7: Evaluating AR Applications

Experimental Setup
- Enhanced Virtual Hand Lab
- Half-silvered mirror
- Shutter glasses
- OPTOTRAK optical tracker
  - IREDs worn on wrist and object
- Four target cubes
Conditions:
- Cube size, arm visibility, real/virtual objects

Page 91: COSC 426 Lect. 7: Evaluating AR Applications

Kinematic Measures
- Movement time
- Peak velocity of the wrist
- Time to peak velocity of the wrist
- Percent time from peak velocity of the wrist

Page 92: COSC 426 Lect. 7: Evaluating AR Applications

Results – Movement Time

Page 93: COSC 426 Lect. 7: Evaluating AR Applications

Results – Velocity Profiles

Page 94: COSC 426 Lect. 7: Evaluating AR Applications

AR Navigation
Many commercial AR browsers:
- Information in place
- How to navigate to a POI?

Page 95: COSC 426 Lect. 7: Evaluating AR Applications

2D vs. AR Navigation?


Page 96: COSC 426 Lect. 7: Evaluating AR Applications

AR Navigation Study
Users navigate between Points of Interest.
Three conditions:
- AR: using only an AR view
- 2D-map: using only a top-down 2D map view
- AR+2D-map: using both an AR and 2D map view
Experiment measures:
- Quantitative: time taken, distance travelled
- Qualitative: experimenter observations, navigation behavior, interviews, user surveys, workload (NASA TLX)

Page 97: COSC 426 Lect. 7: Evaluating AR Applications

HIT Lab NZ Test Platform – AR View

Page 98: COSC 426 Lect. 7: Evaluating AR Applications

HIT Lab NZ Platform – Map View

Page 99: COSC 426 Lect. 7: Evaluating AR Applications

Distance and Time

No significant differences

Page 100: COSC 426 Lect. 7: Evaluating AR Applications

Paths Travelled

Red: AR
Blue: AR + Map
Yellow: Map

Page 101: COSC 426 Lect. 7: Evaluating AR Applications

Navigation Behaviour
- Depends on interface
- Map doesn't show shortcuts

Page 102: COSC 426 Lect. 7: Evaluating AR Applications

Survey Responses

Page 103: COSC 426 Lect. 7: Evaluating AR Applications

User Comments
AR:
- "you don't know exactly where you are all of the time."
- "using AR I found it difficult to see where I was going"
Map:
- "you were able to get a sense of where you were"
- "you are actually able to see the physical objects around you"
AR+Map:
- "I used the map at the beginning to understand where the buildings were, and the AR between each point"
- "You can choose a direction with AR and find the shortest way using the map."

Page 104: COSC 426 Lect. 7: Evaluating AR Applications

Usability Issues
- Screen readability in sunlight
- GPS inaccuracies
- Compass errors
- Touch screen difficulties
- No routing information

Page 105: COSC 426 Lect. 7: Evaluating AR Applications

Lessons Learned
Users adapt navigation behaviour to the guide type:
- AR interface shows shortcuts
- Map interface good for planning
Include a map view in the AR interface:
- 2D exocentric, and 3D egocentric
Allow people to easily change between views:
- may use Map far away, AR close up
Difficult to accurately show depth.

Page 106: COSC 426 Lect. 7: Evaluating AR Applications

Collaboration Studies
- Remote conferencing
- Face-to-face collaboration

Page 107: COSC 426 Lect. 7: Evaluating AR Applications

Remote AR Conferencing

Moves conferencing from the desktop to the workspace

Page 108: COSC 426 Lect. 7: Evaluating AR Applications

Pilot Study
How does AR conferencing differ?
Task:
- discussing images
- 12 pairs of subjects
Conditions:
- audio only (AC)
- video conferencing (VC)
- mixed reality conferencing (MR)

Page 109: COSC 426 Lect. 7: Evaluating AR Applications

Sample Transcript

Page 110: COSC 426 Lect. 7: Evaluating AR Applications

Transcript Analysis

- Users speak most in the audio-only condition
- MR has the fewest words/min and interruptions/min
- More results needed

Page 111: COSC 426 Lect. 7: Evaluating AR Applications

Presence and Communication
[Charts: presence rating (0-100) and "could tell when partner was concentrating" score (0-14) for the AC, VC, and MR conditions.]

Page 112: COSC 426 Lect. 7: Evaluating AR Applications

Subjective Comments
- Paid more attention to pictures
- Remote video provided peripheral cues
In the AR condition:
- Difficult to see everything
- Remote user distracting
- Communication asymmetries

Page 113: COSC 426 Lect. 7: Evaluating AR Applications

Face to Face Collaboration
Compare two-person collaboration in:
- Face to Face, AR, Projection Display
Task:
- Urban design logic puzzle
- Arrange 9 buildings to satisfy 10 rules in 7 minutes
Subjects:
- Within-subjects study (counterbalanced)
- 12 pairs of college students

Page 114: COSC 426 Lect. 7: Evaluating AR Applications

Face to Face Condition

Moving Model Buildings

Page 115: COSC 426 Lect. 7: Evaluating AR Applications

AR Condition

Cards with AR models
SVGA AR display (800x600)
Video see-through AR

Page 116: COSC 426 Lect. 7: Evaluating AR Applications

Projection Condition

Tracked Input Devices

Page 117: COSC 426 Lect. 7: Evaluating AR Applications

Task Space Separation

Page 118: COSC 426 Lect. 7: Evaluating AR Applications

Interface Conditions

              FtF                     AR                       Projection
User          Independent,            Independent, private,    Common/public,
viewpoint     easy to change          easy to change,          difficult to change
                                      limited FOV
Interaction   Two-handed, natural     Two-handed, Tangible     Mouse-based, one-handed,
              object manipulation;    AR techniques;           time-multiplexed
              space-multiplexed       space-multiplexed        manipulation

Page 119: COSC 426 Lect. 7: Evaluating AR Applications

Hypothesis

Collaboration with AR technology will produce behaviors that are more like natural face-to-face collaboration than those from using a screen-based interface.

Page 120: COSC 426 Lect. 7: Evaluating AR Applications

Metrics
Subjective:
- Evaluative survey after each condition
- Forced-choice survey after all conditions
- Post-experiment interview
Objective:
- Communication measures
  - Video transcription

Page 121: COSC 426 Lect. 7: Evaluating AR Applications

Measured Results
Performance:
- AR collaboration slower than FtF and Projection
Communication:
- Pointing/picking gesture behaviors same in AR as FtF
- Deictic speech patterns same in AR as FtF
  - Both significantly different than in the Projection condition
Subjective:
- FtF easier to work together and understand
- Interaction in AR easier than Projection and same as FtF

Page 122: COSC 426 Lect. 7: Evaluating AR Applications

Deictic Expressions

[Chart: percentage of deictic expressions (0-30%) in the FtF, Proj, and AR conditions.]
Significant difference: ANOVA, F(2,33) = 5.77, p < 0.01
No difference between FtF and AR

Page 123: COSC 426 Lect. 7: Evaluating AR Applications

Ease of Interaction

Significant difference:
- Pick: F(2,69) = 37.8, p < 0.0001
- Move: F(2,69) = 28.4, p < 0.0001

Page 124: COSC 426 Lect. 7: Evaluating AR Applications

Interview Comments
"AR's biggest limit was lack of peripheral vision. The interaction was natural, it was just difficult to see. In the projection condition you could see everything but the interaction was tough."
Face to Face:
- Subjects focused on task space
  - gestures easy to see, gaze difficult
Projection display:
- Interaction difficult (8/14)
  - not mouse-like, invasion of space
AR display: "working solo together"
- Lack of peripheral cues = "tunnel vision" (10/14 people)

Page 125: COSC 426 Lect. 7: Evaluating AR Applications

Face to Face Summary
Collaboration is partly a perceptual task:
- AR reduces perceptual cues -> impacts collaboration
- Tangible AR metaphor enhances ease of interaction
Users felt that AR collaboration was different from FtF, but:
- measured speech and gesture behaviors in the AR condition were more similar to the FtF condition than in the Projection display
Thus we need to design AR interfaces that don't reduce perceptual cues, while keeping ease of interaction.

Page 126: COSC 426 Lect. 7: Evaluating AR Applications

Case Study: A Wearable Information Space

Head stabilized vs. body stabilized
An AR interface provides spatial audio and visual cues.
Does a spatial interface aid performance?
- Task time / accuracy
M. Billinghurst, J. Bowskill, N. Dyer, J. Morphett (1998). An Evaluation of Wearable Information Spaces. Proc. Virtual Reality Annual International Symposium.

Page 127: COSC 426 Lect. 7: Evaluating AR Applications

Task Performance
Task:
- find target icons on 8 pages
- remember information space
Conditions:
- A: head-stabilized pages
- B: cylindrical display with trackball
- C: cylindrical display with head tracking
Subjects:
- within subjects (need fewer subjects)
- 12 subjects used

Page 128: COSC 426 Lect. 7: Evaluating AR Applications

Experimental Measures
Objective:
- spatial ability (pre-test)
- time to perform task
- information recall
- workload (NASA TLX)
Subjective:
- post-experiment survey
  - rank conditions (forced choice)
  - Likert scale questions
    • "How intuitive was the interface to use?"
Many different measures.

Page 129: COSC 426 Lect. 7: Evaluating AR Applications

Post Experiment Survey
For each of these conditions please answer:
1) How easy was it to find the target?
   1 2 3 4 5 6 7 (1 = not very easy, 7 = very easy)
   For the head stabilised condition (A):
   For the cylindrical condition with mouse input (B):
   For the head tracked condition (C):
Rank all the conditions in order on a scale of one to three:
1) Which condition was easiest to find the target? (1 = easiest, 3 = hardest)
   A:  B:  C:

Page 130: COSC 426 Lect. 7: Evaluating AR Applications

Results
Body stabilization improved performance:
- search times significantly faster (one-factor ANOVA)
Head tracking improved information recall:
- no difference between trackball and stack cases
Head tracking involved more physical work.

Page 131: COSC 426 Lect. 7: Evaluating AR Applications

Subjective Impressions

[Chart: mean ratings (0-5) for "Find Target" and "Enjoyable" across conditions A, B, C.]
Subjects felt the spatialized conditions were (ANOVA):
- more enjoyable
- easier to find the target

Page 132: COSC 426 Lect. 7: Evaluating AR Applications

Subjective Impressions

[Chart: mean rankings (0-3) for "Easiest", "Understanding", and "Intuitive" across conditions A, B, C.]
Subject rankings (Kruskal-Wallis):
- spatialized conditions easier to use than head stabilized
- body stabilized gave better understanding
- head tracking most intuitive

Page 133: COSC 426 Lect. 7: Evaluating AR Applications

Conclusions

Page 134: COSC 426 Lect. 7: Evaluating AR Applications

Key Points
• There is a need for more user evaluation of AR experiences
• There are several evaluation approaches that can be used:
  • 'quick and dirty'
  • usability testing (lab studies)
  • field studies
  • predictive evaluation
• Studies should use multiple qualitative and quantitative experimental measures.

Page 135: COSC 426 Lect. 7: Evaluating AR Applications

Resources

Page 136: COSC 426 Lect. 7: Evaluating AR Applications

Online Resources
Meta-site for statistical analysis:
- http://home.ubalt.edu/ntsbarsh/stat-data/Topics.htm
Online statistical analysis:
- http://www.quantitativeskills.com/sisa/
Experiment design:
- http://en.wikipedia.org/wiki/Design_of_experiments
- http://www.curiouscat.net/library/designofexperiments.cfm

Page 137: COSC 426 Lect. 7: Evaluating AR Applications

Books
- J. Nielsen. "Usability Engineering", Academic Press, 1993.
- H. Sharp, Y. Rogers, J. Preece. "Interaction Design: Beyond Human-Computer Interaction", John Wiley & Sons, 2007.
- J. Spool, J. Rubin, D. Chisnell. "Handbook of Usability Testing: How to Plan, Design, and Conduct Effective Tests", John Wiley & Sons, 2008.
- T. Tullis, B. Albert. "Measuring the User Experience: Collecting, Analyzing, and Presenting Usability Metrics", Morgan Kaufmann, 2008.
- A. Field, G. Hole. "How to Design and Report Experiments", Sage Publications Ltd, 2003.