COSC 426 Lect. 7: Evaluating AR Applications

Lecture 7: Evaluating AR Applications
Mark Billinghurst
HIT Lab NZ, University of Canterbury


DESCRIPTION

A lecture on evaluating AR interfaces, from the graduate course on Augmented Reality, taught by Mark Billinghurst from the HIT Lab NZ at the University of Canterbury.

TRANSCRIPT

Page 1: COSC 426 Lect. 7: Evaluating AR Applications

Lecture 7: Evaluating AR Applications

Mark Billinghurst
HIT Lab NZ

University of Canterbury

Page 2: COSC 426 Lect. 7: Evaluating AR Applications

Building Compelling AR Experiences

(layered stack, top to bottom)
experiences: Evaluation
applications: Interaction
tools: Authoring
components: Tracking, Display

Sony CSL © 2004

Page 3: COSC 426 Lect. 7: Evaluating AR Applications

Introduction

Page 4: COSC 426 Lect. 7: Evaluating AR Applications
Page 5: COSC 426 Lect. 7: Evaluating AR Applications

The Interaction Design Process

Page 6: COSC 426 Lect. 7: Evaluating AR Applications

The Interaction Design Process

Page 7: COSC 426 Lect. 7: Evaluating AR Applications

Why Evaluate AR Applications?
- To test and compare interfaces, new technologies, interaction techniques
- Test usability (learnability, efficiency, satisfaction, ...)
- Get user feedback
- Refine interface design
- Better understand your end users ...

Page 8: COSC 426 Lect. 7: Evaluating AR Applications

Survey of AR Papers
Edward Swan (2005) surveyed major conferences/journals (1992-2004):
- Presence, ISMAR, ISWC, IEEE VR
Summary:
- 1104 total papers
- 266 AR papers
- 38 AR HCI papers (interaction)
- 21 AR user studies
Only 21 of 266 AR papers had a formal user study: less than 8% of all AR papers.

Page 9: COSC 426 Lect. 7: Evaluating AR Applications

AR Papers

Page 10: COSC 426 Lect. 7: Evaluating AR Applications

HIT Lab NZ Usability Survey
"A Survey of Evaluation Techniques Used in Augmented Reality Studies"
Andreas Dünser, Raphaël Grasset, Mark Billinghurst
- Reviewed publications from 1993 to 2007
- Extracted 6071 papers that mentioned "Augmented Reality"
- Searched to find 165 AR papers with user studies

Page 11: COSC 426 Lect. 7: Evaluating AR Applications

[Chart: number of publications mentioning "Augmented Reality" per year, 1992-2007, broken down by source: ACM Digital Library, SpringerLink, IEEE Xplore, ScienceDirect, SPIE Digital Library, InformaWorld, MIT Press Journals, Highwire, Blackwell Synergy, Mary Ann Liebert, Wiley Interscience, Sage Journals Online, Emerald Insight, Oxford Journals, Cambridge Journals Online, ASCE Publications, JSTOR, Karger, WorldSciNet, BioMed Central, ASME, Annual Reviews, Nature Online, MathSciNet, National Research Council of Canada Research Press (NRC), AdisOnline, APS Journals (PROLA), Royal Society Publishing.]

Page 12: COSC 426 Lect. 7: Evaluating AR Applications
Page 13: COSC 426 Lect. 7: Evaluating AR Applications

Types of User Studies

Types of AR user studies:
- Perception
- User performance
- Collaboration
- Usability of complete systems

Page 14: COSC 426 Lect. 7: Evaluating AR Applications

Types of AR User Studies

Page 15: COSC 426 Lect. 7: Evaluating AR Applications

Types of Experimental Measures Used
- Objective measures
- Subjective measures
- Qualitative analysis
- Usability evaluation techniques
- Informal evaluations

Page 16: COSC 426 Lect. 7: Evaluating AR Applications

Types of Experimental Measures Used

Page 17: COSC 426 Lect. 7: Evaluating AR Applications

Summary

Over the last 10 years:
- Most user studies focused on user performance
- Fewest user studies on collaboration
- Objective performance measures were most used
- Qualitative and usability measures were least used

Page 18: COSC 426 Lect. 7: Evaluating AR Applications

Types of User Evaluation

Page 19: COSC 426 Lect. 7: Evaluating AR Applications

What is evaluation?

Evaluation is concerned with gathering data about the usability of a design or product by a specified group of users for a particular activity within a specified environment or work context.

Page 20: COSC 426 Lect. 7: Evaluating AR Applications

Evaluation
Goal: measure the goodness of the application design.
Two types:
- Formative evaluation is performed at different stages of development to check that the product meets users' needs.
- Summative evaluation assesses the quality of a finished product.
Here we focus on formative evaluation.

Page 21: COSC 426 Lect. 7: Evaluating AR Applications

When to evaluate?
Once the application has been developed:
- pros: rapid development, small evaluation cost
- cons: rectifying problems late is costly
  (design -> implementation -> evaluation -> redesign & reimplementation)
During design and development:
- pros: find and rectify problems early
- cons: higher evaluation cost, longer development
  (evaluation interleaved with design and implementation)

Page 22: COSC 426 Lect. 7: Evaluating AR Applications

Four evaluation paradigms

- 'quick and dirty'
- usability testing (lab studies)
- field studies
- predictive evaluation

Page 23: COSC 426 Lect. 7: Evaluating AR Applications

Quick and dirty

'Quick & dirty' evaluation: informal feedback from users or consultants to confirm that their ideas are in line with users' needs and are liked.
Quick & dirty evaluations can be done at any time.
Emphasis is on fast input to the design process rather than carefully documented findings.

Page 24: COSC 426 Lect. 7: Evaluating AR Applications

Usability Testing
Recording typical users' performance on typical tasks in controlled settings. Field observations may also be used.
As the users perform these tasks they are watched and recorded on video, and their inputs are logged. This data is used to calculate performance times and errors, and to help explain why the users did what they did.
User satisfaction questionnaires and interviews are used to elicit users' opinions.

Page 25: COSC 426 Lect. 7: Evaluating AR Applications

Laboratory-based Studies
Laboratory-based studies:
- can be used for evaluating the design or the implemented system
- are carried out in an interruption-free usability lab
- can accurately record some work situations
- some studies are only possible in a lab environment
- some tasks can be adequately performed in a lab
- are useful for comparing different designs in a controlled context

Page 26: COSC 426 Lect. 7: Evaluating AR Applications

Laboratory-based Studies

Controlled, instrumented environment

Page 27: COSC 426 Lect. 7: Evaluating AR Applications

Field Studies
Field studies are done in natural settings.
The aim is to understand what users do naturally and how technology impacts them.
In product design, field studies can be used to:
- identify opportunities for new technology
- determine design requirements
- decide how to introduce new technology
- evaluate technology in use.

Page 28: COSC 426 Lect. 7: Evaluating AR Applications

Predictive Evaluation
Experts apply their knowledge of typical users, guided by heuristics, to predict usability problems.
Can involve theoretically based models.
A key feature of predictive evaluation is that real end users need not be present.
Relatively quick and inexpensive.

Page 29: COSC 426 Lect. 7: Evaluating AR Applications

Characteristics of Approaches

            Usability testing   Field studies   Predictive
Users       do task             natural         not involved
Location    controlled          natural         anywhere
When        prototype           early           prototype
Data        quantitative        qualitative     problems
Feedback    measures & errors   descriptions    problems
Type        applied             naturalistic    expert

Page 30: COSC 426 Lect. 7: Evaluating AR Applications

Evaluation Approaches and Methods

Method          Usability testing   Field studies   Predictive
Observing       x                   x
Asking users    x                   x
Asking experts                                      x
Testing         x
Modeling                                            x

Page 31: COSC 426 Lect. 7: Evaluating AR Applications

DECIDE: A framework to guide evaluation
- Determine the goals the evaluation addresses.
- Explore the specific questions to be answered.
- Choose the evaluation paradigm and techniques.
- Identify the practical issues.
- Decide how to deal with the ethical issues.
- Evaluate, interpret and present the data.

Page 32: COSC 426 Lect. 7: Evaluating AR Applications

DECIDE Framework
Determine Goals:
- What are the high-level goals of the evaluation?
- Who wants the evaluation, and why?
Explore the Questions:
- Create well-defined, relevant questions
Choose the Evaluation Paradigm:
- Influences the techniques used and how the data is analyzed
Identify Practical Issues:
- How to select users, stay on budget and schedule
- How to find evaluators, select equipment

Page 33: COSC 426 Lect. 7: Evaluating AR Applications

DECIDE Framework
Decide on Ethical Issues:
- Informed consent form
- Participants have a right to:
  - know the goals of the study and what will happen to the findings
  - privacy of personal information
Evaluate, Interpret and Present Data:
- Reliability: can the study be replicated?
- Validity: is it measuring what you thought?
- Biases: is the process creating biases?
- Scope: can the findings be generalized?
- Ecological validity: is the environment influencing the results?

Page 34: COSC 426 Lect. 7: Evaluating AR Applications

Usability Testing

Page 35: COSC 426 Lect. 7: Evaluating AR Applications

Pilot Studies
A small trial run of the main study.
Can identify the majority of issues with an interface design.
Pilot studies check:
- that the evaluation plan is viable
- that you can conduct the procedure
- that interview scripts, questionnaires, experiments, etc. work appropriately
Iron out problems before doing the main study.

Page 36: COSC 426 Lect. 7: Evaluating AR Applications

Controlled experiments
The designer of a controlled experiment should carefully consider:
- the proposed hypothesis
- the selected subjects
- the measured variables
- the experimental methods
- data collection
- data analysis

Page 37: COSC 426 Lect. 7: Evaluating AR Applications

Variables
Experiments manipulate and measure variables under controlled conditions.
There are two types of variables:
- independent: variables that are manipulated to create different experimental conditions
  - e.g. number of items in menus, colour of the icons
- dependent: variables that are measured to find out the effects of changing the independent variables
  - e.g. speed of menu selection, speed of locating icons

Test Conditions
The levels, values, or settings for an independent variable.
Example:
- test conditions: HMD, Handheld device 1, Handheld device 2

Page 38: COSC 426 Lect. 7: Evaluating AR Applications

"Other" Variables
Control variables:
- e.g. room light, noise ...
- if controlled => less external validity
Random variables (not controlled):
- e.g. fatigue
- more influence of random variables => less internal validity
Confounding variables:
- e.g. practice, previous experience

Page 39: COSC 426 Lect. 7: Evaluating AR Applications

Hypothesis
A hypothesis is a prediction of the outcome:
- what will happen to the dependent variables when the independent variables are changed
To show that the prediction is right, reject the null hypothesis (H0): that the dependent variables do not change when the independent variables are changed.

Page 40: COSC 426 Lect. 7: Evaluating AR Applications

Experimental methods
It is important to select the right experimental method so that the results of the experiment can be generalized.
There are mainly two experimental methods:
- between-groups: each subject is assigned to one experimental condition
- within-groups: each subject performs under all the different conditions

Page 41: COSC 426 Lect. 7: Evaluating AR Applications

Experimental methods
[Diagram: Between-groups vs. within-groups designs. In a between-groups design, subjects are randomly assigned to one condition each (Condition 1, 2, or 3), perform the experimental task in that condition only, and the data from each group feeds into statistical data analysis. In a within-groups design, each subject performs the experimental tasks under all conditions, with the order varied across subjects, and the data likewise feeds into statistical data analysis.]

Page 42: COSC 426 Lect. 7: Evaluating AR Applications

Within vs. Between Subjects
Between-subjects design:
- each participant is tested on only one level/condition
- a separate group of participants is used for each condition
  - e.g. one group uses the HMD, the other group uses the Handheld device
Within-subjects design:
- each participant is tested on each level/condition (repeated measurement)
  - e.g. participants use both the Handheld device and the HMD

Page 43: COSC 426 Lect. 7: Evaluating AR Applications

Between Subjects
Sometimes a factor must be between subjects:
- e.g. gender, age, experience
Between-subjects advantage:
- avoids interference effects (e.g. practice / learning effects)
Between-subjects disadvantage:
- increased variability = need more subjects
Important: randomised assignment to conditions
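Randomised assignment is straightforward to script. Below is a minimal Python sketch; the participant IDs and condition names are invented for illustration:

```python
import random

# Hypothetical participant pool and two between-subjects conditions
participants = [f"P{i:02d}" for i in range(1, 25)]
conditions = ["HMD", "Handheld"]

random.shuffle(participants)  # randomise the order before splitting
# Deal participants round-robin into one group per condition
groups = {c: participants[i::len(conditions)] for i, c in enumerate(conditions)}
for condition, group in groups.items():
    print(condition, group)
```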

Page 44: COSC 426 Lect. 7: Evaluating AR Applications

Within Subjects
Sometimes a factor must be within subjects:
- e.g. measuring learning effects
Within-subjects advantages:
- fewer participants needed (all participants in all conditions)
- differences (variability) between subjects are the same across test conditions
Counterbalance the order of presenting conditions:
A => B => C, B => C => A, C => A => B
The order is best governed by a Latin Square.

Page 45: COSC 426 Lect. 7: Evaluating AR Applications

Latin Square Design
Each condition occurs once in each row and column.
Note: in a balanced Latin Square each condition both precedes and follows each other condition an equal number of times.
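A balanced Latin square can also be generated programmatically. Here is a small Python sketch using the standard zig-zag construction (the condition indices 0..n-1 are placeholders to be mapped onto your own conditions):

```python
def balanced_latin_square(n):
    """Return presentation orders for n conditions (numbered 0..n-1).
    For even n, every condition appears once per row and column and
    precedes/follows every other condition equally often."""
    # First row follows the zig-zag pattern 0, 1, n-1, 2, n-2, ...
    first, lo, hi, take_low = [0], 1, n - 1, True
    while lo <= hi:
        if take_low:
            first.append(lo)
            lo += 1
        else:
            first.append(hi)
            hi -= 1
        take_low = not take_low
    # Each later row shifts every entry by +1 (mod n)
    square = [[(c + i) % n for c in first] for i in range(n)]
    if n % 2 == 1:
        # For odd n, add the mirror-image rows to restore balance
        square += [list(reversed(row)) for row in square]
    return square

for order in balanced_latin_square(4):
    print(order)   # [0, 1, 3, 2], [1, 2, 0, 3], [2, 3, 1, 0], [3, 0, 2, 1]
```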

Page 46: COSC 426 Lect. 7: Evaluating AR Applications

Subjects
The choice of subjects is critical to the validity of the results of an experiment:
- the subject group should be representative of the expected user population
In selecting the subjects it is important to consider things such as their:
- age group, education, skills, culture
How does the sample influence the results?
Report the selection criteria and give relevant demographic information in your publication.

Page 47: COSC 426 Lect. 7: Evaluating AR Applications

Subjects
How many participants?
- How big is the effect you want to measure?
  - large effects can be detected with smaller samples (e.g. a small n is needed to discriminate speed between turtles and rabbits)
- The more participants, the "smoother" the data:
  - Central Limit Theorem: as n increases (n > 30) the sample mean approaches a normal distribution (see the sketch below)
  - extreme data has less influence (e.g. one sleepy participant does not mess up the results that much)
- For quantitative analysis, rule of thumb: a MINIMUM of 15-20 or more per group/cell
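The "smoothing" effect of larger samples can be seen in a quick simulation. This Python sketch draws task-completion times from an invented skewed distribution and shows the study-to-study spread of the sample mean shrinking as n grows (standard library only):

```python
import random
import statistics

def sd_of_sample_means(n_participants, n_studies=1000):
    """Run many simulated studies and report how much the sample mean
    of task-completion time varies from study to study."""
    means = []
    for _ in range(n_studies):
        # Hypothetical skewed population of completion times (seconds)
        sample = [random.lognormvariate(3.0, 0.5) for _ in range(n_participants)]
        means.append(statistics.mean(sample))
    return statistics.stdev(means)

for n in (5, 15, 30):
    print(f"n={n:2d}: sd of the sample mean = {sd_of_sample_means(n):.2f} s")
```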

Page 48: COSC 426 Lect. 7: Evaluating AR Applications

Data Collection and Analysis

The choice of a method depends on the type of data that needs to be collected.
In order to test a hypothesis, the data has to be analysed using a statistical method.
The choice of a statistical method depends on the type of collected data.
All the decisions about an experiment should be made before it is carried out.

Page 49: COSC 426 Lect. 7: Evaluating AR Applications

Observe and Measure
Observations are gathered ...
- manually (human observers)
- automatically (computers, software, cameras, sensors, etc.)
A measurement is a recorded observation:
- objective metrics
- subjective metrics

Page 50: COSC 426 Lect. 7: Evaluating AR Applications

Typical objective metrics
- task completion time
- errors (number, percent, ...)
- percent of task completed
- ratio of successes to failures
- number of repetitions
- number of commands used
- number of failed commands
- physiological data (heart rate, ...)
- ...
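As an illustration of collecting such metrics, a minimal per-trial logger might look like the following Python sketch (all names here are hypothetical, not part of the lecture material):

```python
import csv
import time

class TrialLogger:
    """Minimal logger for objective metrics: completion time and error count."""

    def __init__(self, path):
        self.path, self.rows, self.current = path, [], None

    def start_trial(self, participant, condition):
        self.current = {"participant": participant, "condition": condition,
                        "errors": 0, "_start": time.perf_counter()}

    def log_error(self):
        self.current["errors"] += 1      # one more user error in this trial

    def end_trial(self, completed=True):
        self.current["time_s"] = time.perf_counter() - self.current.pop("_start")
        self.current["completed"] = completed
        self.rows.append(self.current)

    def save(self):
        with open(self.path, "w", newline="") as f:
            writer = csv.DictWriter(f, fieldnames=list(self.rows[0]))
            writer.writeheader()
            writer.writerows(self.rows)

# Usage: one trial per participant/condition pair
log = TrialLogger("results.csv")
log.start_trial("P01", "HMD")
log.log_error()
log.end_trial(completed=True)
log.save()
```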

Page 51: COSC 426 Lect. 7: Evaluating AR Applications

Typical subjective metrics
- user satisfaction
- subjective performance ratings
- ease of use
- intuitiveness
- judgments
- ...

Page 52: COSC 426 Lect. 7: Evaluating AR Applications

Data Types
Subjective:
- Subjective survey
  - Likert scale, condition rankings
    (e.g. "How easy was the task?" 1 2 3 4 5, 1 = not very easy, 5 = very easy)
- Observations
  - Think aloud
- Interview responses
Objective:
- Performance measures
  - Time, accuracy, errors
- Process measures
  - Video/audio analysis

Page 53: COSC 426 Lect. 7: Evaluating AR Applications

Experimental Measures

Timings
- Tells us: performance
- Measured: via a stopwatch, or automatically by the device

Errors
- Tells us: performance; particular sticking points in a task
- Measured: by success in completing the task correctly; through experimenter observation, examining the route walked

Perceived workload
- Tells us: effort invested; user satisfaction
- Measured: through NASA TLX scales and other questionnaires

Distance traveled and route taken
- Tells us: depending on the application, these can be used to pinpoint errors and to indicate performance
- Measured: using a pedometer, GPS or other location-sensing system; by experimenter observation

Percentage preferred walking speed
- Tells us: performance
- Measured: by finding average walking speed, which is compared with normal walking speed

Comfort
- Tells us: user satisfaction; device acceptability
- Measured: Comfort Rating Scale and other questionnaires

User comments and preferences
- Tells us: user satisfaction and preferences; particular sticking points in a task
- Measured: through questionnaires, interviews and think-alouds

Experimenter observations
- Tells us: different aspects, depending on the experimenter and on the observations
- Measured: through observation and note-taking

Page 54: COSC 426 Lect. 7: Evaluating AR Applications

Statistical Analysis
Once data is collected, statistics can be used for analysis.
Typical statistical techniques:
Comparing between two results
- Unpaired t-test (for between subjects; assumes normal distribution, interval scale, homogeneity of variances)
- Paired t-test (for within subjects; assumes normal distribution, etc.)
- Mann-Whitney U-test (between subjects; if assumptions are not met)
Comparing between more than two results
- Analysis of Variance (ANOVA), followed by post-hoc analysis (e.g. Bonferroni adjustment)
- Kruskal-Wallis test (does not assume normal distribution)
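As a sketch of how these tests map onto code, here is how they might be run in Python with SciPy (the completion-time data below is invented):

```python
from scipy import stats

# Hypothetical task-completion times (seconds) per condition
hmd      = [12.1, 14.3, 11.8, 15.0, 13.2, 12.7]
handheld = [10.4, 11.9, 10.1, 12.3, 11.0, 10.8]
tablet   = [13.5, 15.2, 14.1, 16.0, 14.8, 13.9]

# Two conditions, between subjects, assumptions met: unpaired t-test
t, p = stats.ttest_ind(hmd, handheld)

# Two conditions, within subjects: paired t-test
t, p = stats.ttest_rel(hmd, handheld)

# Two conditions, between subjects, assumptions not met: Mann-Whitney U
u, p = stats.mannwhitneyu(hmd, handheld)

# More than two conditions, assuming normality: one-way ANOVA
f, p = stats.f_oneway(hmd, handheld, tablet)

# More than two conditions, no normality assumption: Kruskal-Wallis
h, p = stats.kruskal(hmd, handheld, tablet)
```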

Page 55: COSC 426 Lect. 7: Evaluating AR Applications

Running the study
Offload your brain!
- Write down instructions
- Prepare checklists
- Create templates
- Print and pitch important information
Try and find an assistant.
Print questionnaires and other documents the day before.
Rehearse procedures.
Bring your lunch; don't forget to eat. (4 kg in 2 weeks)

Page 56: COSC 426 Lect. 7: Evaluating AR Applications

Running the study
Treat the participants nicely:
- Prepare candy and drinks and make them feel good.
Take the role of a friendly waiter:
- Always stay in the background but offer assistance if needed.
Take notes, document oddities.
Nothing is as bad as lost data!! AVOID!

Page 57: COSC 426 Lect. 7: Evaluating AR Applications

Running the study
Take many photos of your setup in action.
Prepare consent forms if you want to use pictures for publications.

Page 58: COSC 426 Lect. 7: Evaluating AR Applications

Field Studies

Page 59: COSC 426 Lect. 7: Evaluating AR Applications

Field Studies
Field studies are done in natural settings.
"In the wild" is a term for prototypes being used freely in natural settings.
Aim to understand what users do naturally and how technology impacts them.
Field studies are used in product design to:
- identify opportunities for new technology
- determine design requirements
- decide how best to introduce new technology
- evaluate technology in use.

www.id-book.com

Page 60: COSC 426 Lect. 7: Evaluating AR Applications

Observation
Direct observation in the field:
- Structuring frameworks
- Degree of participation (insider or outsider)
- Ethnography
Direct observation in controlled environments
Indirect observation: tracking users' activities
- Diaries
- Interaction logging

Page 61: COSC 426 Lect. 7: Evaluating AR Applications

Ethnography
• Ethnography is a philosophy with a set of techniques that include participant observation and interviews
• Ethnographers immerse themselves in the culture studied
• Need cooperation of the people being studied
• A researcher's degree of participation can vary along a scale from 'outside' to 'inside'
• Analyzing video and data logs can be time-consuming
• Can use continuous data analysis
• Collections of comments, incidents, and artifacts are made

Page 62: COSC 426 Lect. 7: Evaluating AR Applications

Direct observation in a controlled setting
- Think-aloud technique
Indirect observation
- Diaries
- Interaction logs
- Cultural probes

Page 63: COSC 426 Lect. 7: Evaluating AR Applications

Structuring frameworks to guide observation
- The person. Who?
- The place. Where?
- The thing. What?
The Goetz and LeCompte (1984) framework:
- Who is present?
- What is their role?
- What is happening?
- Where is it happening?
- Why is it happening?
- How is the activity organized?

Page 64: COSC 426 Lect. 7: Evaluating AR Applications

Predictive Evaluation

Page 65: COSC 426 Lect. 7: Evaluating AR Applications

Predictive Models
- Provide a way of evaluating products or designs without directly involving users.
- Less expensive than user testing.
- Usefulness limited to systems with predictable tasks
  - e.g., telephone answering systems, mobiles, etc.
- Based on expert error-free behavior.

Page 66: COSC 426 Lect. 7: Evaluating AR Applications

Fitts’ Law (Fitts, 1954)

Fitts' Law predicts that the time to point at an object using a device is a function of the distance from the target object and the object's size.
The further away and the smaller the object, the longer the time to locate it and point to it.

Page 67: COSC 426 Lect. 7: Evaluating AR Applications

GOMS Model
Goals: the state the user wants to achieve, e.g., find a website.
Operators: the cognitive processes and physical actions needed to attain the goals, e.g., moving the mouse to select an icon.
Methods: the procedures for accomplishing the goals, e.g., drag mouse over icon, click on button.
Selection rules: decide which method to select when there is more than one.

Page 68: COSC 426 Lect. 7: Evaluating AR Applications

GOMS Response Times (Card et al., 1983)

Operator   Description                                           Time (sec)
K          Pressing a single key or button:
             average skilled typist (55 wpm)                     0.22
             average non-skilled typist (40 wpm)                 0.28
             pressing shift or control key                       0.08
             typist unfamiliar with the keyboard                 1.20
P          Pointing with a mouse or other device on a display    0.40
           to select an object (value derived from Fitts' Law)
P1         Clicking the mouse or similar device                  0.20
H          Bringing 'home' hands on the keyboard or other        0.40
           device
M          Mentally prepare/respond                              1.35
R(t)       The response time is counted only if it causes        t
           the user to wait
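These operator times can be summed to predict a task time. For example, a minimal keystroke-level calculation in Python (the icon-selection task is an invented example):

```python
# Keystroke-level operator times from Card et al. (1983), in seconds
K_SKILLED = 0.22   # press a key (average skilled typist, 55 wpm)
P         = 0.40   # point with a mouse (value derived from Fitts' Law)
P1        = 0.20   # click the mouse button
H         = 0.40   # bring hands 'home' to the keyboard or device
M         = 1.35   # mentally prepare/respond

# Example task: prepare, move hand to the mouse, point at an icon, click it
select_icon = M + H + P + P1
print(f"Predicted selection time: {select_icon:.2f} s")   # 2.35 s
```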

Page 69: COSC 426 Lect. 7: Evaluating AR Applications

Expert Inspections
Several kinds:
- Experts use their knowledge of users and technology to review application usability.
- Expert critiques can be formal or informal reports.
- Heuristic evaluation is a review guided by a set of heuristics
  - e.g. "visibility of system status", from Jacob Nielsen's heuristics (1990s)
- Walkthroughs involve stepping through a pre-planned scenario noting potential problems
  - e.g. load AR model, scale it to twice the size, add new model, etc.

Page 70: COSC 426 Lect. 7: Evaluating AR Applications

Nielsen's heuristics
- Visibility of system status.
- Match between system and real world.
- User control and freedom.
- Consistency and standards.
- Error prevention.
- Recognition rather than recall.
- Flexibility and efficiency of use.
- Aesthetic and minimalist design.
- Help users recognize, diagnose, recover from errors.
- Help and documentation.

Page 71: COSC 426 Lect. 7: Evaluating AR Applications

Three Stages for Doing Heuristic Evaluation

1/ Briefing session to tell experts what to do.
2/ Evaluation period of 1-2 hours in which:
- each expert works separately;
- experts take one pass to get a feel for the product;
- and a second pass to focus on specific features.
3/ Debriefing session in which experts work together to prioritize problems.

Page 72: COSC 426 Lect. 7: Evaluating AR Applications

No. of evaluators & problems

Page 73: COSC 426 Lect. 7: Evaluating AR Applications

Advantages and Problems
- Few ethical and practical issues to consider because users are not involved.
- Can be difficult and expensive to find experts.
- The best experts have knowledge of the application domain and users.
Biggest problems:
- important problems may get missed;
- many trivial problems are often identified;
- experts have biases.

Page 74: COSC 426 Lect. 7: Evaluating AR Applications

Case Studies

Page 75: COSC 426 Lect. 7: Evaluating AR Applications

Types of AR Experiments
Perception
- How is virtual content perceived?
- What perceptual cues are most important?
Interaction
- How can users interact with virtual content?
- Which interaction techniques are most efficient?
Collaboration
- How is collaboration in an AR interface different?
- Which collaborative cues can be conveyed best?

Page 76: COSC 426 Lect. 7: Evaluating AR Applications

Perception
A central goal of AR systems is to fool the human perceptual system.
Display modes:
- Direct view
- Stereo video
- Stereo graphics
Multi-modal display:
- Different objects with different display modes
- Potential for depth cue conflict

Page 77: COSC 426 Lect. 7: Evaluating AR Applications

Perceptual User Studies
Depth / distance studies:
- Estimate distance to object
- Judge relative proximity
Object localization:
- Match physical and virtual object positions
Difficulties:
- Precise alignment / calibration of displays
- Lag in head tracking (use static images)

Page 78: COSC 426 Lect. 7: Evaluating AR Applications

Layar – www.layar.com

Page 79: COSC 426 Lect. 7: Evaluating AR Applications

Outdoor AR: Limited Field of View

Page 80: COSC 426 Lect. 7: Evaluating AR Applications

Possible solutions
- Overview + Detail: spatial separation; two views
- Focus + Context: merges both views into one view
- Zooming: temporal separation

Page 81: COSC 426 Lect. 7: Evaluating AR Applications

Zooming Views
TU Graz / HIT Lab NZ collaboration
- Zooming panorama
- Zooming map

Page 82: COSC 426 Lect. 7: Evaluating AR Applications

Zooming AR interfaces
Context Compass, Zooming Panorama, Zooming Map
Interface types:
- Compass (C)
- Compass + Zooming Panorama (CP)
- Compass + Zooming Map (CM)
- Compass + Zooming Panorama + Zooming Map (CPM)

Page 83: COSC 426 Lect. 7: Evaluating AR Applications

Experiment Evaluation

20 subjects (10 M / 10 F)
Café finding task:
- Task 1: Find a particular café named "Alpha"
- Task 2: Find the closest café
Experiment measures:
- Time to complete task
- Angular distance panned around
- Subjective survey feedback

Page 84: COSC 426 Lect. 7: Evaluating AR Applications

Performance Time

Page 85: COSC 426 Lect. 7: Evaluating AR Applications

Distance Panned

Page 86: COSC 426 Lect. 7: Evaluating AR Applications

Results
- Compass good for search, but not comparison
- Zooming (P or M) aids comparison
- Information has a significant effect
- Compass requires more panning
- Users felt the compass alone wasn't useful

Page 87: COSC 426 Lect. 7: Evaluating AR Applications

Interaction Studies
Stages of Interface Development
• Prototype demonstration
• Adoption of interaction techniques from other interface metaphors
• Development of new interface metaphors appropriate to the medium
• Development of formal theoretical models for predicting and modeling user interactions

Page 88: COSC 426 Lect. 7: Evaluating AR Applications

Fitts' Law (1964)
Relates Movement Time to Index of Difficulty:

MT = a + b log2(2A/W)

where log2(2A/W) = ID

Robust under most circumstances:
- object tracking, tapping tasks, movement tasks
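A small Python sketch of the formula; note that the constants a and b below are invented, since in practice they are fitted empirically for each device:

```python
import math

def fitts_mt(a, b, A, W):
    """Movement time from Fitts' Law: MT = a + b * log2(2A / W),
    where A is the distance to the target and W is the target width."""
    ID = math.log2(2 * A / W)   # index of difficulty, in bits
    return a + b * ID

# Hypothetical device constants: a = 0.1 s, b = 0.15 s/bit
print(fitts_mt(0.1, 0.15, A=256, W=32))   # ID = 4 bits -> 0.70 s
```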

Page 89: COSC 426 Lect. 7: Evaluating AR Applications

Interaction Study - Reaching
Mason, A. et al. (2001). Reaching Movements to Augmented and Graphic Objects in Virtual Environments. Proc. CHI 2001.
- Does Fitts' Law hold in an acquisition task?
- Does Fitts' Law hold when reaching for virtual objects?
- Does Fitts' Law hold when you can't see your hand?

Page 90: COSC 426 Lect. 7: Evaluating AR Applications

Experimental Setup
- Enhanced Virtual Hand Lab
- Half-silvered mirror
- Shutter glasses
- OPTOTRAK optical tracker
  - IREDs worn on wrist and object
- Four target cubes
Conditions:
- Cube size, arm visibility, real/virtual objects

Page 91: COSC 426 Lect. 7: Evaluating AR Applications

Kinematic Measures
- Movement time
- Peak velocity of the wrist
- Time to peak velocity of the wrist
- Percent time from peak velocity of the wrist

Page 92: COSC 426 Lect. 7: Evaluating AR Applications

Results – Movement Time

Page 93: COSC 426 Lect. 7: Evaluating AR Applications

Results – Velocity Profiles

Page 94: COSC 426 Lect. 7: Evaluating AR Applications

AR Navigation
Many commercial AR browsers:
- Information in place
- How to navigate to a POI?

Page 95: COSC 426 Lect. 7: Evaluating AR Applications

2D vs. AR Navigation?


Page 96: COSC 426 Lect. 7: Evaluating AR Applications

AR Navigation Study
Users navigate between Points of Interest.
Three conditions:
- AR: using only an AR view
- 2D-map: using only a top-down 2D map view
- AR+2D-map: using both an AR and 2D map view
Experiment measures:
- Quantitative: time taken, distance travelled
- Qualitative: experimenter observations, navigation behavior, interviews, user surveys, workload (NASA TLX)

Page 97: COSC 426 Lect. 7: Evaluating AR Applications

HIT Lab NZ Test Platform – AR View

Page 98: COSC 426 Lect. 7: Evaluating AR Applications

HIT Lab NZ Platform – Map View

Page 99: COSC 426 Lect. 7: Evaluating AR Applications

Distance and Time

No significant differences

Page 100: COSC 426 Lect. 7: Evaluating AR Applications

Paths Travelled

Red: AR
Blue: AR + Map
Yellow: Map

Page 101: COSC 426 Lect. 7: Evaluating AR Applications

Navigation Behaviour
- Depends on interface
- Map doesn't show shortcuts

Page 102: COSC 426 Lect. 7: Evaluating AR Applications

Survey Responses

Page 103: COSC 426 Lect. 7: Evaluating AR Applications

User Comments
AR:
- "you don't know exactly where you are all of the time."
- "using AR I found it difficult to see where I was going"
Map:
- "you were able to get a sense of where you were"
- "you are actually able to see the physical objects around you"
AR+Map:
- "I used the map at the beginning to understand where the buildings were, and the AR between each point"
- "You can choose a direction with AR and find the shortest way using the map."

Page 104: COSC 426 Lect. 7: Evaluating AR Applications

Usability Issues
- Screen readability in sunlight
- GPS inaccuracies
- Compass errors
- Touch screen difficulties
- No routing information

Page 105: COSC 426 Lect. 7: Evaluating AR Applications

Lessons Learned
Users adapt navigation behaviour to the guide type:
- AR interface shows shortcuts
- Map interface good for planning
Include a map view in the AR interface:
- 2D exocentric, and 3D egocentric
Allow people to easily change between views:
- may use Map far away, AR close up
Difficult to accurately show depth.

Page 106: COSC 426 Lect. 7: Evaluating AR Applications

Collaboration Studies
- Remote conferencing
- Face-to-face collaboration

Page 107: COSC 426 Lect. 7: Evaluating AR Applications

Remote AR Conferencing

Moves conferencing from the desktop to the workspace

Page 108: COSC 426 Lect. 7: Evaluating AR Applications

Pilot Study
How does AR conferencing differ?
Task:
- discussing images
- 12 pairs of subjects
Conditions:
- audio only (AC)
- video conferencing (VC)
- mixed reality conferencing (MR)

Page 109: COSC 426 Lect. 7: Evaluating AR Applications

Sample Transcript

Page 110: COSC 426 Lect. 7: Evaluating AR Applications

Transcript Analysis

- Users speak most in the audio-only condition
- MR has the fewest words/min and interruptions/min
- More results needed

Page 111: COSC 426 Lect. 7: Evaluating AR Applications

Presence and Communication
[Charts: presence rating (0-100) and "could tell when partner was concentrating" score (0-14) for the AC, VC, and MR conditions.]

Page 112: COSC 426 Lect. 7: Evaluating AR Applications

Subjective Comments
- Paid more attention to pictures
- Remote video provided peripheral cues
In the AR condition:
- Difficult to see everything
- Remote user distracting
- Communication asymmetries

Page 113: COSC 426 Lect. 7: Evaluating AR Applications

Face to Face Collaboration
Compare two-person collaboration in:
- Face to Face, AR, Projection Display
Task:
- Urban design logic puzzle
- Arrange 9 buildings to satisfy 10 rules in 7 minutes
Subjects:
- Within-subjects study (counterbalanced)
- 12 pairs of college students

Page 114: COSC 426 Lect. 7: Evaluating AR Applications

Face to Face Condition

Moving Model Buildings

Page 115: COSC 426 Lect. 7: Evaluating AR Applications

AR Condition

Cards with AR models
SVGA AR display (800x600)
Video see-through AR

Page 116: COSC 426 Lect. 7: Evaluating AR Applications

Projection Condition

Tracked Input Devices

Page 117: COSC 426 Lect. 7: Evaluating AR Applications

Task Space Separation

Page 118: COSC 426 Lect. 7: Evaluating AR Applications

Interface Conditions

              FtF                     AR                       Projection
User          Independent,            Independent, private,    Common/public,
viewpoint     easy to change          easy to change,          difficult to change
                                      limited FOV
Interaction   Two-handed, natural     Two-handed, Tangible     Mouse-based, one-handed,
              object manipulation;    AR techniques;           time-multiplexed
              space-multiplexed       space-multiplexed        manipulation

Page 119: COSC 426 Lect. 7: Evaluating AR Applications

Hypothesis

Collaboration with AR technology will produce behaviors that are more like natural face-to-face collaboration than those from using a screen-based interface.

Page 120: COSC 426 Lect. 7: Evaluating AR Applications

Metrics
Subjective:
- Evaluative survey after each condition
- Forced-choice survey after all conditions
- Post-experiment interview
Objective:
- Communication measures
  - Video transcription

Page 121: COSC 426 Lect. 7: Evaluating AR Applications

Measured Results
Performance:
- AR collaboration slower than FtF and Projection
Communication:
- Pointing/picking gesture behaviors same in AR as FtF
- Deictic speech patterns same in AR as FtF
  - Both significantly different than in the Projection condition
Subjective:
- FtF easier to work together and understand
- Interaction in AR easier than Projection and same as FtF

Page 122: COSC 426 Lect. 7: Evaluating AR Applications

Deictic Expressions

[Chart: percentage of deictic expressions (0-30%) in the FtF, Proj, and AR conditions.]
Significant difference: ANOVA, F(2,33) = 5.77, p < 0.01
No difference between FtF and AR

Page 123: COSC 426 Lect. 7: Evaluating AR Applications

Ease of Interaction

Significant difference:
- Pick: F(2,69) = 37.8, p < 0.0001
- Move: F(2,69) = 28.4, p < 0.0001

Page 124: COSC 426 Lect. 7: Evaluating AR Applications

Interview Comments
"AR's biggest limit was lack of peripheral vision. The interaction was natural, it was just difficult to see. In the projection condition you could see everything but the interaction was tough."
Face to Face:
- Subjects focused on task space
  - gestures easy to see, gaze difficult
Projection display:
- Interaction difficult (8/14)
  - not mouse-like, invasion of space
AR display: "working solo together"
- Lack of peripheral cues = "tunnel vision" (10/14 people)

Page 125: COSC 426 Lect. 7: Evaluating AR Applications

Face to Face Summary
Collaboration is partly a perceptual task:
- AR reduces perceptual cues -> impacts collaboration
- Tangible AR metaphor enhances ease of interaction
Users felt that AR collaboration was different from FtF, but:
- measured speech and gesture behaviors in the AR condition were more similar to the FtF condition than in the Projection display
Thus we need to design AR interfaces that don't reduce perceptual cues, while keeping ease of interaction.

Page 126: COSC 426 Lect. 7: Evaluating AR Applications

Case Study: A Wearable Information Space

Head stabilized vs. body stabilized
An AR interface provides spatial audio and visual cues.
Does a spatial interface aid performance?
- Task time / accuracy
M. Billinghurst, J. Bowskill, N. Dyer, J. Morphett (1998). An Evaluation of Wearable Information Spaces. Proc. Virtual Reality Annual International Symposium.

Page 127: COSC 426 Lect. 7: Evaluating AR Applications

Task Performance
Task:
- find target icons on 8 pages
- remember information space
Conditions:
- A: head-stabilized pages
- B: cylindrical display with trackball
- C: cylindrical display with head tracking
Subjects:
- within subjects (need fewer subjects)
- 12 subjects used

Page 128: COSC 426 Lect. 7: Evaluating AR Applications

Experimental Measures
Objective:
- spatial ability (pre-test)
- time to perform task
- information recall
- workload (NASA TLX)
Subjective:
- post-experiment survey
  - rank conditions (forced choice)
  - Likert scale questions
    • "How intuitive was the interface to use?"
Many different measures.

Page 129: COSC 426 Lect. 7: Evaluating AR Applications

Post Experiment Survey
For each of these conditions please answer:
1) How easy was it to find the target?
   1 2 3 4 5 6 7 (1 = not very easy, 7 = very easy)
   For the head stabilised condition (A):
   For the cylindrical condition with mouse input (B):
   For the head tracked condition (C):
Rank all the conditions in order on a scale of one to three:
1) Which condition was easiest to find the target? (1 = easiest, 3 = hardest)
   A:  B:  C:

Page 130: COSC 426 Lect. 7: Evaluating AR Applications

Results
Body stabilization improved performance:
- search times significantly faster (one-factor ANOVA)
Head tracking improved information recall:
- no difference between trackball and stack cases
Head tracking involved more physical work.

Page 131: COSC 426 Lect. 7: Evaluating AR Applications

Subjective Impressions

[Chart: mean ratings (0-5) for "Find Target" and "Enjoyable" across conditions A, B, C.]
Subjects felt the spatialized conditions were (ANOVA):
- more enjoyable
- easier to find the target

Page 132: COSC 426 Lect. 7: Evaluating AR Applications

Subjective Impressions

[Chart: mean rankings (0-3) for "Easiest", "Understanding", and "Intuitive" across conditions A, B, C.]
Subject rankings (Kruskal-Wallis):
- spatialized conditions easier to use than head stabilized
- body stabilized gave better understanding
- head tracking most intuitive

Page 133: COSC 426 Lect. 7: Evaluating AR Applications

Conclusions

Page 134: COSC 426 Lect. 7: Evaluating AR Applications

Key Points
• There is a need for more user evaluation of AR experiences
• There are several evaluation approaches that can be used:
  • 'quick and dirty'
  • usability testing (lab studies)
  • field studies
  • predictive evaluation
• Studies should use multiple qualitative and quantitative experimental measures.

Page 135: COSC 426 Lect. 7: Evaluating AR Applications

Resources

Page 136: COSC 426 Lect. 7: Evaluating AR Applications

Online Resources
Meta-site for statistical analysis:
- http://home.ubalt.edu/ntsbarsh/stat-data/Topics.htm
Online statistical analysis:
- http://www.quantitativeskills.com/sisa/
Experiment design:
- http://en.wikipedia.org/wiki/Design_of_experiments
- http://www.curiouscat.net/library/designofexperiments.cfm

Page 137: COSC 426 Lect. 7: Evaluating AR Applications

Books
- J. Nielsen. "Usability Engineering", Academic Press, 1993.
- H. Sharp, Y. Rogers, J. Preece. "Interaction Design: Beyond Human-Computer Interaction", John Wiley & Sons, 2007.
- J. Spool, J. Rubin, D. Chisnell. "Handbook of Usability Testing: How to Plan, Design, and Conduct Effective Tests", John Wiley & Sons, 2008.
- T. Tullis, B. Albert. "Measuring the User Experience: Collecting, Analyzing, and Presenting Usability Metrics", Morgan Kaufmann, 2008.
- A. Field, G. Hole. "How to Design and Report Experiments", Sage Publications Ltd, 2003.