
Human Performance Measures for Interactive Haptic-Audio-Visual Interfaces

Dawei Jia, Member, IEEE, Asim Bhatti, Member, IEEE,

Saeid Nahavandi, Senior Member, IEEE, and Ben Horan, Member, IEEE

Abstract—Virtual reality and simulation are becoming increasingly important in modern society and it is essential to improve our understanding of system usability and efficacy from the users' perspective. This paper introduces a novel evaluation method designed to assess human user capability when undertaking technical and procedural training using virtual training systems. The evaluation method falls under the user-centered design and evaluation paradigm and draws on theories of cognitive, skill-based, and affective learning outcomes. The method focuses on user interaction with haptic-audio-visual interfaces and the complexities related to variability in users' performance, and the adoption and acceptance of the technologies. A large-scale user study focusing on object assembly training tasks involving selecting, rotating, releasing, inserting, and manipulating three-dimensional objects was performed. The study demonstrated the advantages of the method in obtaining valuable multimodal information for accurate and comprehensive evaluation of virtual training system efficacy. The study investigated how well users learn, perform, adapt to, and perceive the virtual training. The results of the study revealed valuable aspects of the design and evaluation of virtual training systems, contributing to an improved understanding of more usable virtual training systems.

Index Terms—Human haptics, user-centered design, virtual training evaluation, ergonomics, user interaction


1 INTRODUCTION

Virtual training (VT) systems are advanced virtual reality systems capable of training the human user through interaction with their haptic, auditory, and visual modalities. Such systems offer many potential advantages over conventional training practices, including reduction in cost, group training, and the ability to train offline without the need for physical apparatus and equipment. VT systems have been adopted across many domains including medicine, biotechnology, automotive, and aerospace [1], [2]. In the manufacturing industry, VT systems offer significant benefits and a number of systems have been developed and adopted for technical and procedural training [3], [4].

Despite ongoing advancements and increasing uptake of VT systems, there are several limitations which need to be overcome. Realistic replication of real-world user-environment interaction, accurate simulation of multisensory feedback, and the design of effective and usable VT systems remain significant research challenges. Given the limitations in interactivity and immersion inherent to any VT system [5], [6], it is widely acknowledged that skills development, and procedural and cognitive learning, are limited. While system designers and developers remain focused on high-tech multimodal, multisensory interfaces, there is an increasing awareness that to produce usable and useful interfaces, it is essential to consider a system design from the users' perspective. In recent years, user-centered system design and evaluation methods have become an increasingly important tool for achieving improved usability for virtual environments (VEs).

In this paper, a VE is distinguished from a VT system: the former represents a computer-generated three-dimensional environment and is a subset or component of the VT system, which utilizes virtual reality technologies to provide haptic, audio, and visual feedback simultaneously. To better understand the benefits of VT systems in facilitating procedural skill learning, this paper introduces a novel method for measuring human performance when using a haptic-audio-visual (HAV) interface. The method considers factors related to human perception, performance, and memory as well as interaction and learning. It combines the efficacy factors, criteria, metrics, and data mentioned in various models and studies of VE efficacy [7], [8], [9], [10], and defines them and their interrelations in a consistent way.

Haptic cues can enable users to gather information to navigate and control objects in the synthetic environment [11]. Hale et al. [12] found that when paired with head tracking, haptic/tactile cues enhance performance, and suggested that the multimodal capacity of VE training systems may advance the knowledge, skills, and attitudes of trainees. Similarly, Bicchi et al. [13] found that when using haptic interfaces, cognitive and physical interaction heavily depend on each other [14]. The authors explained that "physical interaction can help in setting rules for cognitive evaluation of the environment, while cognitive aspects may improve the physical interaction by setting suitable control interaction parameters."


. D. Jia is with the Centre for Intelligent Systems Research, Deakin University, Geelong Waurn Ponds Campus, Locked Bag 20000, Geelong, VIC 3220. E-mail: [email protected].

. A. Bhatti and S. Nahavandi are with the Centre for Intelligent Systems Research (CISR), Deakin University, Geelong Waurn Ponds Campus, Locked Bag 20000, Geelong, VIC 3220. E-mail: {Asim.bhatti, Saeid.nahavandi}@deakin.edu.au.

. B. Horan is with the School of Engineering, Deakin University, Pigdons Road, Geelong, Victoria 3217, Australia. E-mail: [email protected].

Manuscript received 9 Sept. 2011; revised 28 June 2012; accepted 30 June 2012; published online 20 July 2012. Recommended for acceptance by V. Hayward. For information on obtaining reprints of this article, please send e-mail to: [email protected], and reference IEEECS Log Number TH-2011-09-0069. Digital Object Identifier no. 10.1109/ToH.2012.41.



Human cognition, viewed from an embodied and embedded cognition (EEC) perspective [15], is embodied in bodily activities and in physical interaction between humans and artifacts, which can assist human cognition and learning. In our work, haptic interaction was an important part of the VT experience, and we aimed to capture the efficacy of the VT system in terms of how well haptic feedback facilitated learning. Also, since most existing VT systems are limited to visual and/or audio feedback, there are few methods suitable for assessing the efficacy of VT systems that include haptic, audio, and visual feedback. The method we propose fills this gap and was designed to assess VT systems in a holistic way. The method addresses how effective a VT system is in engaging users' multiple sensory channels as well as in supporting cognitive, skill-based, and affective learning.

In the area of haptic user interfaces (UIs), effectiveness and efficiency are often assessed according to either the performance of the haptic interface, or the haptic feedback perceived by the user [14]. Measuring the performance of the haptic interface can be achieved through algorithmic validation and comparison based on rendering realism, whereas measuring the haptic feedback perceived by the human user requires psychophysical evaluation. Many human factors studies have assessed the performance of haptic UIs and user-perceived haptic feedback in sensory-motor control tasks such as peg-in-hole, tapping, targeting, and haptic training [14]. The approach presented in this paper assesses the performance of a haptically enabled VE in sensory-motor skill training by investigating how the system functions as a whole, rather than examining the performance of a specific system device/interface.

1.1 User Experience and Learning in VT Systems

Research demonstrates that humans have a natural ability to effectively process visual representations [16] and that when the visual channel is occupied by information and knowledge visualizations, the human's input capacity is maximized [17]. The human visual perceptual channel shapes the viewer's perception and understanding of visual representations (e.g., images) as well as guiding human attention to visual representations and objects [18]. Haptic interaction can be integrated into VT systems to assist users in performing complex procedural tasks and in acquiring technical skills [19], [20], [21]. In [22], Yoo and Bruns show that force-reflective VT systems can improve user performance in tasks which require specific motor skills. Similar studies [23], [24] suggest that VT systems can improve task performance and relieve mental workload in mechanical assembly tasks. Despite these works, there are very few empirical studies focusing on the usability and effectiveness of VT systems.

1.2 Measuring Human Aspects of VT Systems

The efficacy of haptically enabled VT systems can be assessed according to either the technical performance of the haptic interface, or the visual, audio, and haptic feedback perceived by the users [25], [26]. Measuring the technical performance of the haptic interface is often achieved through algorithmic validation and comparison based on rendering realism, while measuring the feedback perceived by the user requires psychophysical evaluation [27]. Many human factors studies have been applied to assess both the performance of haptic UIs and the user-perceived system feedback in sensory-motor control tasks [28], [29], [30].

A variety of approaches have been proposed for evaluating user performance and feedback during VT [31], [32]. One such approach is the sequential evaluation method developed by Gabbard et al. [33]. The method combines expert-based and user-based techniques through the iterative adaptation and enhancement of existing two-dimensional and graphical user interface (GUI) usability evaluation methods. It modifies and extends existing methods to consider complex interaction techniques, nonstandard and dynamic UI components, and the multimodal tasks inherent to three-dimensional UIs. One significant limitation of such an approach is that it is application specific and not suitable for broader applications or for the evaluation of generic VT systems. While sequential evaluation may be considered more systematic and may yield more comprehensive evaluation results (when applied over several cycles of evaluation), it is highly time-consuming, which prohibits its use as an evaluation procedure for building usable applications. Furthermore, the use of expert-based evaluation methods makes it difficult, if not impossible, to produce reliable evaluation results. As Bowman et al. [34] have emphasized, "evaluations based solely on heuristics performed by usability experts are very difficult in three-dimensional UIs because of a lack of published, verified guidelines for three-dimensional UI design."

To address these limitations, we have developed a multidimensional user-centered systematic training evaluation (MUSTe) method for VT systems that utilize haptic, audio, and visual displays. The MUSTe method is distinct from other methods in terms of its theoretical foundation, methodology, constructs, and scale/applicability.

The MUSTe method:

. draws on solid theoretical foundations of cognitive, skill-based, and affective learning theories as a framework,

. utilizes a user-centered evaluation methodology rather than expert-based approaches,

. identifies efficacy dimensions and factors at the high level, and specifies associated measures at the low level, and

. is applicable to a wide range of VT systems and is not application specific.

In VT applications, measuring user preference in practice is an important aspect of system evaluation. As Bowman et al. [34] asserted, it is the lack of attention to user preference which has kept VT systems from being used and adopted in the real world. They argue that two user preference metrics, presence and comfort, are critical for three-dimensional UI evaluation, but are usually neglected by traditional UI evaluation methods. Both were able to be examined using the MUSTe method through its multiple human performance measures. Additionally, while expert-based evaluation methods depend on expert knowledge and predefined heuristics (or several evaluation trials), the MUSTe method achieves a reliable VT efficacy evaluation using explicit information obtained from users through a single user-test session.

2 MUSTE METHOD

This section presents the MUSTe method and its theoretical background, methodology, individual components, and applicability.

2.1 Theoretical Model, Measurement Dimensions, and Factors

Seffah et al. [35] argue that usability, as a recognized and important quality factor, needs to be assessed through a well-structured consolidated model, rather than relying on individual methods that are not well integrated into a single conceptual framework. The MUSTe method was developed based on the cognitive, affective, and skill-based theories of learning outcomes [36], and a model for understanding how VEs aid in conceptual learning [37]. The conceptual model of the MUSTe method is shown in Fig. 1.

More specifically, the MUSTe method adopts a holistic approach to quantifying the efficacy of VT systems by assessing learning outcomes in the cognitive, skill-based, and affective dimensions. Cognitive learning outcomes were examined through a memory test that requires recognition and recall via a self-report method. Skill-based learning outcomes were assessed using a performance test that reflects composition and proceduralisation [36], thus allowing speed and fluidity of performance, and error rate, to be determined. Affective learning outcomes were examined using two self-report methods (self-efficacy (SE) beliefs and perceived VE efficacy (PVE) questionnaires) to gather users' perceptions of and attitudes toward the VT system and their experience. The affective learning dimension included SE and user PVE as key factors. The PVE was further divided into nine subfactors (including usability, learnability, immersion, and presence, as illustrated in Fig. 1), based on the relevant literature in human-computer interaction (HCI) (at a general level) and human-VE interaction (HVEI) (more specifically).

2.2 User-Centered Methodology

The MUSTe method draws on the user-centered design and evaluation paradigm as the basis for assessing how well a VT system facilitates user performance, conceptual understanding, and affect. It is evident from the HCI and HVEI literature that user-centered design and evaluation shows great promise as an emerging paradigm for achieving effective system design [33], [38]. The user-centered design and evaluation paradigm not only allows evaluators to gain first-hand observable performance data through user observation, but also makes it possible for evaluators to gather unobservable yet critical information, such as user perceptions of the system's efficacy, specifically emotion and affect [39], [40], [41]. In the context of this study, "expert" refers to a usability expert rather than a domain expert. Section 4.5 explains why the user-centered evaluation methodology was favored over an expert-based approach.

Importantly, the human aspect of emerging technologies (e.g., VT systems) can only be fully addressed by focusing on designing VT systems around users, respecting user diversity, and involving users in system evaluation. To illustrate the efficacy of an existing design, it is important to investigate user performance, interaction, and experience in VT. Evaluators using the MUSTe method are able to observe user performance first hand as well as being able to obtain information not directly observable, such as users' perceptions of interaction and experience, and of the system's efficacy.

2.3 Measures

The MUSTe method is multidimensional in that evaluation involves the systematic collection of data from the cognitive, skill-based, and affective learning dimensions. The information obtained using the MUSTe method consists of both subjective and objective data. Subjective data were gathered through two user perception measures, i.e., an SE questionnaire and a PVE questionnaire. Objective data were gathered via task performance and performance memory tests. The experimental tasks are detailed in Section 3.3.

The MUSTe method's SE questionnaire (shown in Appendix 2, which can be found on the Computer Society Digital Library at http://doi.ieeecomputersociety.org/10.1109/ToH.2012.41) was designed for subjective assessment of a user's own capability to perform object assembly tasks using the VT system. Sample questions from the questionnaire are as follows:

. Please estimate the accuracy with which you will complete the training test; and

. Please indicate the test score that you expect to receive based on accuracy, efficiency, and effectiveness.

The PVE questionnaire (parts 1 and 2, as shown in Appendices 3 and 4, respectively, available in the online supplemental material) measures the individual's perceptions of the effectiveness of the VT system in assisting them to learn the object assembly tasks. A 7-point Likert scale was used to gather each participant's rating for each question. The scale ranged from 1 (strongly disagree) to 7 (strongly agree), where a higher rating indicates a better perception of the VT system's efficacy. Sample statements are as follows:

. I was able to focus my attention on learning assembly procedures rather than the input control tools (e.g., haptic devices);

. The input control tools (e.g., haptic devices, data glove, and three-dimensional mouse) were comfortable to operate together in unison; and

. I have a strong sense of "being there" (sufficiently immersed) in the VT environment.

The term "being there" has been widely adopted in the VE interface design literature to describe "presence."


Fig. 1. Conceptual model of the MUSTe method.


Presence is a key factor, which reflects a user's psychological perception of "being there" when immersed in a VE [42]. Unlike immersion, presence appears to be more cognitively driven, and thus may not be assessed easily, but it is treated as a by-product of the VE's immersive features [5]. The inclusion of subjective measurement techniques in the MUSTe method is supported by Kraiger et al. [36], who claim that self-report measures, such as self-ratings of SE or of perceived system performance capability, are the most appropriate methods for the evaluation of the affective dimension. This is also supported by others [30], [43], [44], [45].

The MUSTe objective measures include a task performance test and a performance memory test. The task performance test was achieved in an evaluation test, where task completion rate, time on task, and error rate were recorded. The performance memory test was achieved using a questionnaire (as in Appendix 8, available in the online supplemental material) that tested users' accuracy of recall and recognition of the VT and the concepts and/or skills being taught.

The memory test was designed to objectively measure cognitive learning. Unlike questions that measure an experience, such as "presence" (treated as a perceptual attribute and measured using specifically designed perception measurement scales), questions in the memory test addressed each dimension of the participant's VT memory structure (e.g., tool used, shape, and size) and training procedure (e.g., task sequence, tool support). Memory tests were used in [43] to aid in assessing the engagement and immersion of the VE user experience. It was claimed that, by focusing on questions relating to VE structure and characteristics, users may reveal their spatial awareness, sense of presence, and attention toward the VE. The MUSTe method's design and measurement construction was reviewed by three VE design experts, and their feedback was incorporated into the refined method.

3 EVALUATION AND EXPERIMENTAL DESIGN

VT systems offer significant potential in knowledge creation, human learning, and interactive and immersive information visualization. In this paper, the MUSTe method is employed for evaluating users' HAV interaction for computer-generated virtual machine assembly.

3.1 Participants

Seventy-six volunteers participated in the evaluation in a controlled laboratory environment. Participants were trained on a series of object assembly tasks. The participants had diverse backgrounds and fell into four age groups: 18-24 (N = 32), 25-34 (N = 33), 35-45 (N = 8), and over 45 (N = 3). Fifty-six of the participants were male and 20 female, and most participants reported having limited prior experience with manipulating three-dimensional objects in a computer environment. All participants were recruited from the School of Engineering, Deakin University, and the study was approved by the University's Ethics Committee.

The age groups were divided based on commonly accepted and used age ranges for younger, middle-aged, and older adults in the HCI literature. In the literature, it is common to simply define specific age groups based on the focus of the study, which can lead to uneven age ranges and groups. For example, a study [46], which reported age-related performance differences in training ATM menu navigation tasks, defined two age groups as younger (18 to 30 years) and older adults (64 to 80 years). In the evaluation of a VR driving simulator, Liu et al. [47] targeted age groups of 13 to 35, 36 to 55, and 56+ for the investigation of age impact on performance. Another more recent study [48] used three females and nine males aged between 20 and 27 to validate the performance of haptic motor skill training.

Our earlier studies [49], [50] showed that previous experience with object assembly tasks does not impact user performance in a VE, but prior experience with the manipulation of three-dimensional objects in a computer environment (LOE3D) does. In the work presented herein, most participants reported limited prior experience with manipulating three-dimensional objects in a computer environment (LOE3D). Specifically, 46 percent (N = 35) of participants indicated that they had no or only minimal LOE3D, 37.3 percent (N = 28) reported moderate LOE3D, and 16 percent (N = 12) high LOE3D. In terms of prior experience with VEs (LOEveExp), most participants, 88 percent (N = 66), indicated that they had little to no prior experience, 10.7 percent (N = 8) reported moderate LOEveExp, and only 1.3 percent (N = 1) reported high LOEveExp.

3.2 Apparatus

A HAV training system designed to support the learning process for general assembly tasks was developed at the Centre for Intelligent Systems Research (CISR), Deakin University. Fig. 2 depicts the VT system architecture. The HAV training system was developed to provide an intuitive training platform enabling assembly operators to be trained in assembly tasks and sequences. In contrast with existing VT systems capable of providing assembly sequence training only, the system provides a high level of physical interactivity aimed at cognitive learning and procedural skills development. The haptic component of the system plays an important role in facilitating this through interaction with the users' haptic modality. Through replication (within the capabilities of the system) of real-world training scenarios using the haptic, audio, and visual modalities, the system evokes a realistic training environment that facilitates the user's perceptual, cognitive, and assembly skills development.

Fig. 2. System architecture of the HAV training system.

Fig. 3 illustrates a participant using the VT system. As can be seen, the LCD monitors were placed in front of the participants at a comfortable viewing distance. The three-dimensional mouse was placed next to the participant's nondominant hand. The chosen hardware components were used to provide users with force feedback and three-dimensional object perception, and to enable three-dimensional environmental manipulation. The system's software components comprised a UI consisting of a series of user menus, and three-dimensional visual and haptic models of the assembly objects.

The participants were also provided with visual and audio feedback to keep them engaged with the training progress, as well as to inform them about different events occurring during training. Events that triggered audio-visual cues included task completion, requested assistance, violation of recommended practices, and incorrect operational sequences. The visual component of the audio-visual feedback involved assembly objects changing color on task completion, as well as graphical cues to assist the participant to perform the task, highlight correct assembly sequences, and show the assembly path and correct procedures. The audio feedback involved three distinct audio cues linked to correct, incorrect, and forbidden actions. The feedback was similar to the positive and negative reinforcement learning mechanisms used in computer gaming. Audio cues linked to correct actions and the successful completion of tasks were designed to invoke an enhanced feeling of engagement and motivation, while audio cues linked to incorrect or forbidden actions aimed to improve understanding of correct assembly procedures.
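As a concrete illustration of this event-to-cue mapping, the sketch below shows one plausible way such feedback could be dispatched. The event names, cue identifiers, and dispatch structure are assumptions for illustration only and are not taken from the actual CISR implementation.

from enum import Enum, auto

class TrainingEvent(Enum):
    TASK_COMPLETED = auto()
    ASSISTANCE_REQUESTED = auto()
    PRACTICE_VIOLATION = auto()
    WRONG_SEQUENCE = auto()

# Each event triggers an audio cue (correct/incorrect/forbidden)
# and a visual cue (color change, highlighted sequence, assembly path).
FEEDBACK = {
    TrainingEvent.TASK_COMPLETED:       ("audio_correct",   "recolor_object"),
    TrainingEvent.ASSISTANCE_REQUESTED: ("audio_correct",   "show_assembly_path"),
    TrainingEvent.PRACTICE_VIOLATION:   ("audio_forbidden", "highlight_correct_sequence"),
    TrainingEvent.WRONG_SEQUENCE:       ("audio_incorrect", "highlight_correct_sequence"),
}

def dispatch(event: TrainingEvent) -> None:
    """Look up and apply the audio and visual cues linked to an event."""
    audio_cue, visual_cue = FEEDBACK[event]
    print(f"play {audio_cue}; apply {visual_cue}")

dispatch(TrainingEvent.WRONG_SEQUENCE)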

3.3 Experimental Tasks and Training Design

A total of seven object assembly tasks were included in the VT system. The tasks had varying levels of difficulty (LOD), classified as low, moderate, and high. Descriptions of the task design as well as pilot study and research findings were reported in our previous works [51], [52]. The assembly tasks involved selecting, rotating, releasing, inserting, and manipulating three-dimensional objects within the VE. A practice task is shown in Fig. 4. The tasks were designed based on first-hand observation of an automotive assembly production line. In designing the tasks, both static objects (physically constrained to move within prescribed limits) and dynamic objects (with no constraints on their spatial behavior) [53] were included. For example, the car cockpit was a static object that users could not displace, whereas other objects such as the radio box, screwdriver, stereo, and power connector were dynamic objects and could be moved and manipulated.

The training process consisted of user-selectable difficulty levels and training modes, as follows:

. Mode I. Three-dimensional animated simulation explaining the task procedures and sequence of operations necessary to achieve successful assembly.

. Mode II. Experiential learning with a simple object assembly task, enabling first-person user experience with the VT system.

. Mode III. Interactive simulation of assembly operations allowing the user to perform the training tasks in a predefined sequence. Each task needed to be performed in a specific order.

. Mode IV. Self-exploration of assembly tasks where the participant could practice the required skills with no restriction on task sequence.

The above-described training modes require different levels of interaction from the participant. In a general sense, the lower the level of difficulty of a particular training mode, the less interaction or input required from the participant and the more audio-visual feedback provided. Similarly, as the level of difficulty increased, so did the level of interactivity, while the assistive feedback decreased accordingly.

3.4 Procedures

Fig. 5 illustrates the procedure of the experimental study. Upon arrival at the training venue, each participant was briefed on the purpose of the experiments, the VT system, the experimental procedure, benefits, possible risks, and their rights as a participant. A pretraining questionnaire (pretesting-Q) was then completed by the participant. The questionnaire (as shown in Appendix 5, available in the online supplemental material) was designed to collect the participant's demographic information, e.g., age, gender, prior experience with computers, and so on. The questionnaire also included questions about prior experience with VT systems and current knowledge of and experience with object assembly in both virtual and real worlds.


Fig. 3. A participant using the VT system.

Fig. 4. Practice task.

Fig. 5. Flowchart of the experiment process.


The participant then completed four training modes, which took approximately 30 minutes. To minimize the potential impact of VE exposure, 5-minute breaks were included between each training mode. The participant then completed an SE questionnaire designed to obtain the participant's perceptions of their own capability in performing the training tasks.

Afterwards, the participant was required to undertake a training test involving seven object assembly tasks within 15 minutes. During this test, the system automatically logged participants' performance for each task (e.g., the time taken to complete the task and the error rate), the number of times the Help function was accessed, and the number of restarts for any particular task.

Upon completion, participants were required to complete a posttest questionnaire designed to collect their perceptions of the system efficacy. This was then followed by a short interview about the participant's feelings, emotions, and perceptions of the training and learning experience. The purpose of the interview was to obtain a snapshot of the participant's feelings immediately after the VT experience.

The entire experiment lasted approximately 1.5 hours. Two weeks after participating in the experiments, a memory test questionnaire was distributed to each participant to assess retention. The questionnaire was conducted online and distributed via e-mail (with a link to the memory test website), and participants were required to recognize VT I/O devices, assembly parts, tools, and assembly sequences. Participants' responses to the questionnaire were submitted online and returned via e-mail.

An important aspect of the experimental design is consideration of the effects of cyber sickness and fatigue. Since a standard for acceptable exposure time to VEs does not yet exist, nor is it known which VE elements cause cyber sickness, a worst-case assumption was used [6]. An experiment of more than 30 minutes duration was considered lengthy VE exposure. To obtain statistically significant results, the duration of the experiment could not be reduced. Therefore, to reduce the likelihood of cyber sickness or fatigue occurring, planned rest breaks were integrated into the experiment design.

3.5 Data Collection and Analysis

To automatically track participant performance, task status, and outcome data, an Extensible Markup Language (XML) logging tool was developed. The tool extracted the necessary information, including time on task, completion rate, and time spent navigating the VE and the user menu. These data were then sent to a common gateway interface (CGI) script, where they were stored within a time-stamped text file, to be later read by a data collection tool. All collected data were able to be imported, manipulated, and exported to the SPSS software package for statistical analysis.
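To make the pipeline concrete, the sketch below parses a log of this general shape and aggregates it into per-participant measures ready for a statistics package. The element and attribute names are assumptions for illustration; the actual XML schema of the logging tool is not reproduced here.

import csv
import xml.etree.ElementTree as ET

SAMPLE_LOG = """
<session participant="P01">
  <task id="1" completed="true"  time_on_task="36.2"  errors="0"/>
  <task id="7" completed="false" time_on_task="152.0" errors="3"/>
</session>
"""

def summarise(xml_text: str) -> dict:
    # Aggregate per-task records into completion rate, mean time on task, and errors.
    root = ET.fromstring(xml_text)
    tasks = root.findall("task")
    completed = [t for t in tasks if t.get("completed") == "true"]
    return {
        "participant": root.get("participant"),
        "completion_rate": len(completed) / len(tasks),
        "mean_time_on_task": sum(float(t.get("time_on_task")) for t in tasks) / len(tasks),
        "total_errors": sum(int(t.get("errors")) for t in tasks),
    }

row = summarise(SAMPLE_LOG)
with open("performance.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=row.keys())
    writer.writeheader()
    writer.writerow(row)  # the CSV can then be imported into SPSS or similar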

The data collection methods developed in this study include closed-ended (with numerical responses) and open-ended questions on the same questionnaire. Data analysis included factor analysis of the Likert-scaled questions, and the use of the constant comparative method to analyze narrative responses to open-ended questions [54], [55].

Any qualitative data collected through questionnaires, interviews, observations, and video recordings can be converted to numerical form using the "quantitizing" technique [55]. The technique involves assigning numerical values to qualitative data and integrating quantitative and qualitative data through data transformation [56]. This transformation enables qualitative data to be represented in numerical format, such as scores, scales, or clusters, and supports a more comprehensive interpretation of a phenomenon. Due to limited space, this paper omits the analysis of the video recordings and interviews. Details will be reported elsewhere and will discuss the more subjective insights into system efficacy.
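A minimal sketch of the quantitizing step is given below, assuming a simple illustrative coding scheme; the actual coding categories and score assignments used in the study are not reproduced here.

import statistics

# Open-ended comments are first coded into categories by the analyst ...
coded_comments = ["very_positive", "positive", "neutral", "positive", "negative"]

# ... and each category is then assigned a numerical value so the qualitative
# responses can be combined with the Likert-scale data.
CODE_TO_SCORE = {"very_negative": 1, "negative": 2, "neutral": 3,
                 "positive": 4, "very_positive": 5}

scores = [CODE_TO_SCORE[c] for c in coded_comments]
print(statistics.mean(scores))  # e.g., 3.6 on a 1-5 scale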

4 RESULTS AND DISCUSSIONS

The usability of a system can be evaluated through objective measures of participants' behavior, such as time taken and task performance. Real-time task performance, often referred to as "performance usability," was assessed in this work.

4.1 User Performance

The results of the experiments indicated that a total of 75 participants successfully completed one or more assembly tasks (1 to 7) of various LOD (low to high) within 15 minutes. One participant did not assemble any of the required objects with the VT system and withdrew from the testing.

As shown in Table 1, assembly task 1 (lowest difficulty level) had the highest completion rate. Assembly task 7 (highest difficulty level), however, was only completed by 27 participants. Assembly tasks 2 to 6 (moderate difficulty level) had completion rates ranging from N = 53 to N = 71. A full score of 100 was achieved by 36 percent of the participants (N = 27), while a performance score of 80 was achieved by 35 percent (N = 26) of the participants. Fifteen percent of the participants (N = 11) were able to complete the test with a performance score of 60. A small number of participants received poor performance outcomes, with scores of 40 (N = 4, 5 percent), 30 (N = 3, 4 percent), and 20 (N = 4, 5 percent), respectively.

As can be observed from Table 1, the 75 participants took an average of 36.21 seconds to complete the first task (lowest difficulty level), and 152.04 seconds to complete the last task (highest difficulty level). Completion rates were higher for the simpler tasks and lower for the more difficult tasks, as demonstrated in Table 1. For the tasks of moderate difficulty, the completion rates varied from 53 to 71.


TABLE 1. Task Performance Results


The memory test included a list of questions and images of the assembly objects and tools used in the VT system. Participants were assessed based on their accuracy of recall and the amount of knowledge they learned from the VT experience. To successfully answer the questions, participants needed to recognize and recall training procedures and interaction experiences, such as the assembly training procedures and interaction with three-dimensional virtual objects (including parts and tools) and with the VT system's UI. Nineteen participants responded to the memory test, and the majority performed well, with a mean score of 72 (SD = 20, N = 19).

Table 2 presents the questions and scores for the memory test. S and M denote single- and multiple-choice questions, respectively, and N and percent are the number and percentage of the 19 respondents who answered the particular question correctly. All memory test questions required a unique answer except Q4, which allowed multiple choices; each correct choice was given a score weight of 12.5.

Of the 19 participants, 4 (21 percent) achieved a full score of 100, 4 (21 percent) performed poorly with a score below 50, and 7 (36.8 percent) achieved moderately high performance, scoring between 70 and 75. Two participants (10.5 percent) received relatively high test scores of 87.5 and 85. The other two participants received test scores of 60 and 57.5. All participants correctly answered question 3, which required them to recall the tool used and its shape. Eighteen participants (94.7 percent) also correctly answered question 5, which related to the task sequence of the assembly training operations.

The memory tests were conducted online using a PHP webpage. The webpage collected responses, displayed friendly feedback, thanked participants for taking part, and e-mailed text files containing the responses to the evaluator's e-mail account. Given that the memory test was distributed and conducted online, outside of the experiment session, only a low response rate was achieved. As a result, we are now investigating improved ways of distributing and conducting the memory test.

4.2 System Performance

The following results were derived from responses to the SE and PVE questionnaires. The questionnaires provided both qualitative and quantitative data. Factor analysis and Cronbach's alpha are sophisticated and widely used methods for instrument refinement and validation [57], [58] and were applied to the two questionnaires to test their validity and reliability, respectively.

4.2.1 SE Beliefs

The SE questionnaire provides a self-estimate, prior to commencement, of a participant's capability to perform the object assembly tasks. Table 3 shows the loadings of variables on factors, eigenvalues, and percentages of variance. To aid interpretation, variables are ordered and grouped by loading value. The importance of each factor is determined by the percentage of variance it represents. Factor I accounts for 68.44 percent, and Factor II for 16.86 percent, of the variance in the set of variables. Together the two factors account for over 85 percent of the total variance, corresponding to outstanding performance.

Overall, the participants had moderate SE beliefs with a mean score of 59 (SD = 15, N = 75). This can be interpreted as participants believing that they are reasonably capable of performing effective object assembly with the VT system. Similar mean scores were found for performing the task accurately (M = 67, SD = 17, N = 75), and for estimated training task scores (M = 69, SD = 15, N = 75). Interestingly, these self-estimated performance scores were lower than the task performance actually achieved (M = 77, SD = 24, N = 75). From a participant's perspective, this indicates better than expected skill-based learning.

4.2.2 Perceived System Efficacy: Subjective Usability

Factor analysis was performed to explore the dimensionality of the PVE factors. All items had a loading greater than 0.3 and as such, no item was omitted. The mean score for each factor was calculated by taking the factor weights into account together with the raw response data. A combined score was then produced for each factor, as shown in Table 4b. Through an extensive review of the related literature, nine factors were initially identified as the measurement focus of the PVE. In this paper, the appropriateness of these factors was empirically assessed.
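One plausible reading of this weighted scoring step is sketched below: each item's raw Likert response contributes to the factor score in proportion to its loading. The loadings and responses shown are illustrative values only, not those of Table 4, and the exact weighting scheme used in the study may differ.

import numpy as np

# Raw 7-point Likert responses of one participant to four items that
# loaded on the same PVE factor (hypothetical values).
responses = np.array([6, 5, 7, 4], dtype=float)

# Loadings of those items on the factor, taken from a factor analysis.
loadings = np.array([0.81, 0.74, 0.66, 0.48])

# Weighted mean: each item contributes in proportion to its loading,
# so the combined score stays on the original 1-7 scale.
factor_score = np.dot(loadings, responses) / loadings.sum()
print(round(factor_score, 2))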


TABLE 2. Memory Test Outcomes

TABLE 3. SE Items and Factor Analysis Results


Results demonstrate that the factor construct of the PVE remains unchanged. Some items, however, were grouped under a different factor than the one to which they were originally assigned. As expected, this may contribute to intercorrelations among the factors.

Based on the factor analysis and the results from this study, the contents of the items and the factors they loaded onto were carefully reviewed to produce new labels based on the empirically established factors (Table 4a). As a result, objective awareness, engagement, control, and interactive usability were determined to be factors significantly influencing the PVE and can be used to ensure a usable and user-centric design.

Herein, PVE is defined as the extent to which a learning activity, using a specific VT system, is perceived to be effective, efficient, and enjoyable. This definition accommodates the quality of the VT system's knowledge transfer as perceived by the user and can be grouped into three specific measurement focuses, namely: perceived cognitive and learning quality (PCLq), perceived interaction and learning quality (PILq), and perceived system and UI quality (PSUIq). From a theoretical perspective, PCLq is concerned with how users perceive the quality of the knowledge transfer achieved by the VT system, PILq with how users perceive the level of interaction with the VT system enabling their effective learning, and PSUIq with how users perceive the effectiveness of the VE system and the UI in enabling their learning.

Pearson's correlation test was performed and the results are shown in Table 5. Pearson's correlation was used to identify relationships between factors and measurement outcomes. The results demonstrate that all of the extracted factors share significant positive relationships, confirming internal consistency among the PVE factors. While Spearman's correlation is typically more appropriate for measurements taken from ordinal scales and Pearson's correlation for measurements taken from interval scales, there is no widely accepted rule as to which method to employ. Ideally, both methods could be applied and the differences observed. The literature suggests, however, that in many cases the outcome will be the same.

Pearson coefficients are based on true values and depict linear relationships, while Spearman's method is based on ranks and depicts monotonic relationships. In this paper, we assumed a linear relationship between factors and measures, and for this reason Pearson's method was chosen. In addition, Pearson's method is easy to compute, sensitive to outliers, and assumes normality in variables.
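For readers who wish to apply both coefficients and compare them, as suggested above, the following short example does so on simulated data (the variable names and values are illustrative only):

import numpy as np
from scipy.stats import pearsonr, spearmanr

rng = np.random.default_rng(0)
x = rng.normal(size=75)            # e.g., SE scores of 75 participants (simulated)
y = 0.4 * x + rng.normal(size=75)  # a correlated perception measure (simulated)

r_p, p_p = pearsonr(x, y)   # linear association on the raw values
r_s, p_s = spearmanr(x, y)  # monotonic association on the ranks
print(f"Pearson r = {r_p:.3f} (p = {p_p:.3f}); Spearman rho = {r_s:.3f} (p = {p_s:.3f})")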

Cronbach's alpha reliability test was performed (α = 0.920), and the results show that the SE scale is highly reliable when compared with the recommended reliability level (0.70) suggested in [59], [60]. The reliability test results for the three subscales measuring PVE show that PILq (α = 0.87) and PSUIq (α = 0.95) are highly reliable (compared with the recommended reliability level). PCLq also provided a satisfactory result (α = 0.70). All factors within each subscale had high internal consistency, with alpha coefficients ranging from 0.945 to 0.948 for PSUIq, 0.848 to 0.868 for PILq, and 0.671 to 0.750 for PCLq.
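Cronbach's alpha can be reproduced in a few lines using the standard formula alpha = k/(k-1) * (1 - sum of item variances / variance of the total score). The sketch below uses made-up responses, not the study's data:

import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    # items: respondents x items matrix of scores on one subscale.
    k = items.shape[1]
    item_var = items.var(axis=0, ddof=1).sum()
    total_var = items.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1 - item_var / total_var)

# Five respondents answering three 7-point items of a single subscale (made up).
demo = np.array([[6, 5, 6],
                 [7, 7, 6],
                 [4, 4, 5],
                 [5, 6, 5],
                 [6, 6, 7]], dtype=float)
print(round(cronbach_alpha(demo), 3))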

The user's perception of a system is a key indicator of system effectiveness. Despite this, there is a lack of empirically validated research results suggesting which factors are important for evaluating VT system efficacy and how they relate to design quality. The user-centered evaluation presented herein contributes to filling this gap through the development of a set of empirically validated factors for VT system quantification (i.e., the attributes of the factor analysis). Unlike performance usability, which provides an objective measurement of VE efficacy, user perceptions of ease of use, ease of learning, and satisfaction can also be used to assess system performance. The results shown in Fig. 6 and Tables 5 and 6 are derived from the 75 participants' responses using the PVE scale.


TABLE 4. Dimensionality of PVE: (a) Factor Weight and Factor Score; (b) Factor Result

TABLE 6. User Satisfaction

TABLE 5. Correlations within PVE Factors


Fig. 6 shows users' perceptions of the VT system's efficacy. The results demonstrate that, overall, users agreed that the VT system's components allowed them to easily manipulate the virtual objects. The three-dimensional mouse scored the highest for "ease of use," while the head-mounted display (HMD) scored the lowest.

These results suggest that in general users were able to adapt to the three-dimensional mouse quickly and were subject to a shallow learning curve. In contrast, users found the HMD more difficult to use and generally took longer to effectively utilize it to perform tasks with the VT system. In fact, some users were unable to utilize the HMD at all due to simulator sickness. In terms of "ease of learning," the results demonstrate that users found operating the haptic device, data glove, and three-dimensional mouse straightforward, with the data glove requiring the least effort. Given the specialist nature of the haptic device, it is interesting to observe that most users found it straightforward to learn to operate. Due to the physical and cognitive load associated with using the haptic device, however, "ease of use" received a lower rating.

When compared with the other user perception measures, user-perceived "ease of learning" received the highest ratings, with users agreeing (M = 5) that learning to use the haptic device, data glove, and three-dimensional mouse was easy. Note: the HMD was excluded from the "ease of learning" measure because it requires no "learning" from the user. This is the case because, so long as users have normal vision and can distinguish colors and three-dimensional depth (as indicated by all participants through self-reports during induction), they are considered capable of perceiving visual information using the HMD.

Considering "system feedback," the HMD received the highest rating, suggesting that the majority of users felt that the visual feedback (e.g., color change of the object) was appropriate and useful in assisting them to manipulate virtual objects. Similarly, the results suggest that the users felt that the haptic feedback (i.e., force sensation) was appropriate and did not distract them from performing the task.

In terms of comfort, the results suggest that users felt that the data glove and three-dimensional mouse were comfortable to operate, with ratings between 5 and 6. As for the comfort of the haptic device and HMD, ratings only slightly above 4 (neutral) suggest that users were not entirely satisfied. Interestingly, the HMD provided the least user comfort of all VT system components.

Table 6 presents the user satisfaction results. With all satisfaction ratings close to or above 5 (out of 7), it appears that users were satisfied with their learning experience, the system design, and training with the VT system. "User experience" received the highest satisfaction rating. "Learnability" of the VT system also received a high rating. When asked whether the haptic device, data glove, three-dimensional mouse, and HMD were comfortable to operate in unison, i.e., "system comfort," the majority of users were satisfied (mean rating close to 5).

In summary, users achieved a high level of PVE (M = 74, SD = 16, N = 76). Furthermore, consistent results were achieved regarding system efficacy, including subjective perceptions (measured via the SE and PVE scales) and objective performance (measured via the performance test, TTS, and the memory test, MMT). The empirical user study results indicate that the VT system is able to support training of object assembly operations and is effective in training users with various levels of prior experience and demographic backgrounds to achieve moderate to high levels of cognitive, skill-based, and affective learning outcomes.

4.3 Synthesis of Human Performance Measures

The PVE scores (M = 74, SD = 16, N = 76) were higher than the SE scores (M = 59, SD = 15, N = 75), while both were lower than participants' actual training test scores (M = 77, SD = 24, N = 75). This suggests that although users were not confident with their own ability to perform learned object assembly skills before commencing training, they had positive perceptions of the VT system's efficacy.

The actual training test scores, which were higher than both the PVE and SE scores, suggest that the VT system was well designed. Despite being spread between 40 and 100, the users' SE scores also showed a pattern supporting high achievement. Users held positive SE beliefs and perceived the VT to be effective at various task performance levels (with task scores between 20 and 100). This suggests that regardless of users' skill-based learning outcomes (measured using the training test), they held high SE beliefs (before the test) and high perceptions of the VT system's efficacy (after the test).

Pearson's correlation test was performed and the results are shown in Table 7. The results suggest that SE and PVE share a moderate but significant positive relationship (r = 0.409, p < 0.001, N = 75). This indicates that users who have higher SE beliefs also perceive the VT system to be more effective, and vice versa. SE, however, shares a moderate negative relationship with TTS (r = -0.025, p > 0.05, N = 75), suggesting that users who had low SE still did well in the training test.


Fig. 6. System usability.

TABLE 7. Correlations of Performance and Perception Measures


PVE shared a positive moderate relationship with TTS (r = 0.052, p > 0.05, N = 75), suggesting that users who achieved a high performance outcome (task score) also perceived the VT system to be more effective. Although the relationships between SE and TTS, and between PVE and TTS, are only moderate, they still identify the relationships between important human performance and perception measurement dimensions.

Overall, the results presented in this paper indicate that qualitative methods are valuable for obtaining in-depth details about user behavior, preference, and perceptions. We suggest that it is worthwhile combining users' subjective perceptions with performance data, to allow designers to accurately locate problematic areas within an existing design, and to create new designs that enhance VT system efficacy.

Existing work has shown bias in perceived inertial and viscous loads when vibrotactile stimuli are applied to finger pads [61]. The results agree with claims made by Salzman et al. [37] and Coelho et al. [42], who suggest that users' physiological and psychological reactions to VEs vary. Further human factors research is required to develop a better understanding of the capabilities of the human user in manipulating or lifting virtual objects, as well as in working with the instabilities common to haptic devices [62], [63]. As such, it is beneficial to conduct human factors studies as part of any research which validates system design [64].

4.4 Utilization of MUSTe

MUSTe is a user-centered evaluation method (UEM) which assesses how well a system facilitates users' performance, conceptual understanding, system utility, and affect. A flowchart, shown in Appendix 6, available in the online supplemental material, illustrates how MUSTe can be utilized to determine a VT system's efficacy and to help identify possibilities for system redesign to improve efficacy.

Prior to commencing an evaluation using the MUSTe method, six users need to be identified. An evaluator, ideally a usability practitioner or an expert possessing knowledge and expertise in usability studies and user testing, will perform the evaluation in a controlled experimental setting. The practitioner/expert will gather multimodal user performance, perception, and feedback data as well as memory data. In practice, with little training, a VT system designer or developer can use the MUSTe method to evaluate VT efficacy. A set of screenshots of the MUSTe engineering tool is included in Appendix 7, available in the online supplemental material, along with a detailed description of how the method can be applied (Appendix 6, available in the online supplemental material).

4.5 Formative User-Centered Evaluation

Over recent decades, various methods and tools have been developed to evaluate computer UIs. Some classical tools and methods are still widely adopted and recognized by evaluation experts. Examples include user task analysis, scenarios, taxonomies (or comprehensive guidelines), and prototyping, all of which are used to evaluate low-level or early designs and metaphors. On the other hand, cognitive walkthrough, heuristic evaluation, formative evaluation (often in usability labs), and summative evaluation (e.g., for iterative design) are commonly used methods for desktop computing user evaluations.

These methods differ with respect to formality, rigidity, and the degree of user participation. Choosing the most suitable method for an evaluation depends on the system or product being developed, the availability of representative users, funding, and time restrictions. Depending on the degree of user participation, an evaluation can be expert based or user based. Expert inspection methods such as cognitive walkthrough and heuristic evaluation tend to identify a lack of conformity to standards and interface design guidelines, and yield experience-based expert comments. User-based testing (either formative or summative) provides information related to the task at hand and asks the question: "Does this design support the user in their work?" In the study presented herein, user behavior observation methods such as direct observation, questionnaires, and performance measurement are used in the testing session to verify the effectiveness of the design.

It is apparent that the requirement for expert knowledge, multiple evaluations, and follow-up user testing to achieve a reliable evaluation limits an evaluation technique to a specific group of evaluators. There is also a lack of predefined heuristics for VT systems [34]. Consequently, many acknowledge that current expert-based evaluation practices provide little value if they do not involve user testing as part of the evaluation process [65], [66].

User task analysis is a valuable technique in HCI. It has been used extensively to drive design based on what users need to be able to do with an application. It has been suggested that user task analysis enables rigorous and structured characterization of user activity [34], [67]. The technique often requires extensive input from representative users so as to enable a design and development team to obtain details of task descriptions, sequences, relationships, user work, and information flow [34]. Performing user task analysis enables the evaluator within a design and development team to better understand what users actually want: What are their tasks? What is the nature of those tasks? Such an understanding can facilitate a UI design that is specified at an extremely low level and mapped effectively to users' high-level tasks. However, because the evaluation presented herein is centered on a high-level functional prototype rather than low-level designs, user task analysis is not ideal.

Formative evaluation is favored by the work presented herein. It is an observational, empirical evaluation method which uses iterative user testing to identify usability problems and to assess a design's ability to support user exploration, learning, and task performance [34]. Formative evaluation also obtains critical user task performance and user satisfaction data, and has been demonstrated to be efficient and effective in VE usability evaluation [68]. Most usability evaluations of three-dimensional UIs and VEs fall into the formative evaluation category.

5 CONCLUSIONS

This paper discusses the development of evaluation methods for evaluating VT system efficacy. The work contributes to a better understanding of the key factors accounting for VT system efficacy and the impact of these factors on VT outcomes. In combination with the user-centered evaluation processes and outcomes, the work is applicable



to designers, users, and decision makers of simulation and advanced computer technologies supporting interaction and learning.

Cognitive, affective, and skill-based learning were identified as important measurement dimensions for an evaluation model, and the MUSTe evaluation method for quantifying VT system efficacy was presented. The MUSTe method and the underlying measurement dimensions were tested in a large-scale evaluation study, and the results demonstrated its effectiveness in quantifying the efficacy of a VT system based on user performance, perception, and feedback. The results also show that both performance-based and perception-based measures provide consistent performance outcomes.

We suggest that the MUSTe method is particularly useful in fields such as medicine, manufacturing, and aerospace, which require employees to deal with complex visual and haptic scenarios and to extract the relevant information quickly and reliably.

ACKNOWLEDGMENTS

This work was supported by the Centre for Intelligent Systems Research, Institute of Technology Research Innovation, Deakin University.

REFERENCES

[1] B. Horan et al., "Towards Haptic Microrobotic Intracellular Injection," Proc. IEEE/ASME Int'l Conf. Mechatronic and Embedded Systems and Applications, 2009.

[2] T.R. Coles, D. Meglan, and N.W. John, "The Role of Haptics in Medical Training Simulators: A Survey of the State of the Art," IEEE Trans. Haptics, vol. 4, no. 1, pp. 51-66, Jan./Feb. 2011.

[3] F. Crison et al., "Virtual Technical Trainer: Learning How to Use Milling Machines with Multi-Sensory Feedback in Virtual Reality," Proc. IEEE Conf. Virtual Reality, pp. 139-145, 2005.

[4] O. Goksel, K. Sapchuk, and S.E. Salcudean, "Haptic Simulator for Prostate Brachytherapy with Simulated Needle and Probe Interaction," IEEE Trans. Haptics, vol. 4, no. 3, pp. 188-198, May/June 2011.

[5] K.M. Stanney et al., "Usability Engineering of Virtual Environments (VEs): Identifying Multiple Criteria that Drive Effective VE System Design," J. Human-Computer Studies, vol. 58, pp. 447-481, 2003.

[6] R. Kennedy, K. Stanney, and W. Dunlap, "Duration and Exposure to Virtual Environments: Sickness Curves during and across Sessions," Presence: Teleoperators and Virtual Environments, vol. 9, no. 5, pp. 463-472, 2000.

[7] M.C. Salzman et al., "The Design and Evaluation of Virtual Reality-Based Learning Environments," Presence: Teleoperators and Virtual Environments, http://www.bsuc.cn:8013/bk/a18/xssj/webquest/fanliangchen-huangjing/xxzy/sjkf/sj/9.pdf, 1993.

[8] D. Stone et al., User Interface Design and Evaluation. Morgan Kaufmann Publishers, 2005.

[9] Y.-L. Theng et al., "Mixed Reality Systems for Learning: A Pilot Study Understanding User Perceptions and Acceptance," Proc. 12th Int'l Conf. Human-Computer Interaction, 2007.

[10] T. Whalen, S. Noel, and J. Stewart, "Measuring the Human Side of Virtual Reality," Proc. IEEE Int'l Symp. Virtual Environments, Human-Computer Interfaces and Measurement Systems (VECIMS '03), 2003.

[11] M. Steffin, "Visual-Haptic Interfaces, Modification of Motor and Cognitive Performance," http://emedicine.medscape.com/article/1136674-overview, 2009.

[12] K.S. Hale, K.M. Stanney, and L. Malone, "Enhancing Virtual Environment Spatial Awareness Training and Transfer through Tactile and Vestibular Cues," Ergonomics, vol. 52, no. 2, pp. 187-203, 2009.

[13] A. Bicchi et al., "An Atlas of Physical Human-Robot Interaction," J. Mechanism and Machine Theory, vol. 43, pp. 253-270, 2008.

[14] S. Ricciardi et al., "Dependability Issues in Visual-Haptic Interfaces," J. Visual Languages and Computing, vol. 21, pp. 33-40, 2010.

[15] W. Winn, "Learning in Artificial Environments: Embodiment, Embeddedness and Dynamic Adaption," Technology, Instruction, Cognition & Learning, vol. 1, no. 1, pp. 87-114, 2002.

[16] R.A. Burkhard, "Towards a Framework and a Model for Knowledge Visualization: Synergies between Information and Knowledge Visualization," Knowledge and Information Visualization, S.O. Tergan and T. Keller, eds., pp. 238-255, Springer-Verlag, 2005.

[17] G.A. Miller, "The Magical Number Seven, Plus or Minus Two: Some Limits on Our Capacity for Processing Information," Psychological Rev., vol. 63, pp. 81-97, 1956.

[18] J.F. Jensen, "Virtual Inhabited 3D Worlds: Interactivity and Interaction between Avatars, Autonomous Agents and Users," Virtual Interaction: Interaction in Virtual Inhabited 3D Worlds, L. Qvortrup, ed., pp. 23-47, Springer-Verlag, 2001.

[19] M. De Marsico and G. Vitiello, "Introduction to the Special Issue on Multimodal Interaction through Haptic Feedback," J. Visual Languages and Computing, vol. 20, no. 5, pp. 285-286, 2009.

[20] M. Tavakoli, R.V. Patel, and M. Moallem, "A Haptic Interface for Computer-Integrated Endoscopic Surgery and Training," Virtual Reality, vol. 9, pp. 160-176, 2006.

[21] S. Okamoto et al., "Virtual Active Touch: Perception of Virtual Gratings Wavelength through Pointing-Stick Interface," IEEE Trans. Haptics, vol. 5, no. 1, pp. 85-93, Jan.-Mar. 2012.

[22] Y.H. Yoo and W. Bruns, "Motor Skill Learning with Force-Feedback in Mixed Reality," Proc. 16th Int'l Federation of Automatic Control (IFAC) World Congress, 2005.

[23] A. Seth, H.-J. Su, and J.M. Vance, "Development of a Dual-Handed Haptic Assembly System: SHARP," J. Computing and Information Science in Eng., vol. 8, no. 4, p. 044502, 2008.

[24] A. Tang et al., "Comparative Effectiveness of Augmented Reality in Object Assembly," Computer Human Interaction, vol. 5, no. 1, pp. 73-80, 2003.

[25] A. Bhatti et al., "Haptically Enabled Interactive Virtual Assembly Training System Development and Evaluation," Proc. Ann. Simulation Technology and Training Conf. (SimTecT '09), 2009.

[26] P. Boulanger et al., "Hapto-Audio-Visual Environments for Collaborative Training of Ophthalmic Surgery over Optical Network," Proc. IEEE Int'l Workshop Haptic Audio Visual Environments and Their Applications (HAVE), 2006.

[27] H. Bleuler et al., "Generic and Systematic Evaluation of Haptic Interfaces Based on Testbeds," Proc. IEEE/RSJ Int'l Conf. Intelligent Robots and Systems, 2007.

[28] S. Ricciardi et al., "Dependability Issues in Visual-Haptic Interfaces," J. Visual Languages and Computing, vol. 21, no. 1, pp. 33-40, 2010.

[29] A. Sutcliffe, B. Gault, and J.-E. Shin, "Presence, Memory and Interaction in Virtual Environments," Int'l J. Human-Computer Studies, vol. 62, no. 3, pp. 307-327, 2005.

[30] D.M. Popovici and A.-M. Marhan, "Virtual Reality-Based Environments for Learning and Training," Product Engineering: Tools and Methods Based on Virtual Reality, D. Talaba and A. Amditis, eds., pp. 123-142, Springer, 2008.

[31] J.-W.J. Lin, "User Experience Modeling and Enhancement for Virtual Environments that Employ Wide-Field Displays," Digital Human Modeling, V.G. Duffy, ed., pp. 423-433, Springer, 2007.

[32] Y.L. Theng et al., "Mixed Reality System for Learning: A Pilot Study Understanding User Perceptions and Acceptance," Proc. Second Int'l Conf. Virtual Reality, 2007.

[33] J.L. Gabbard, D. Hix, and J.E. Swan, "User-Centered Design and Evaluation of Virtual Environments," IEEE Computer Graphics and Applications, vol. 19, no. 6, pp. 51-59, Nov./Dec. 1999.

[34] D.A. Bowman et al., 3D User Interfaces: Theory and Practice. Addison-Wesley/Pearson Education, 2004.

[35] A. Seffah et al., "Usability Measurement and Metrics: A Consolidated Model," Software Quality J., vol. 14, no. 2, pp. 159-178, 2006.

[36] K. Kraiger, J.K. Ford, and E. Salas, "Application of Cognitive, Skill-Based, and Affective Theories of Learning Outcomes to New Methods of Training Evaluation," J. Applied Psychology, vol. 78, no. 2, pp. 311-328, 1993.

[37] M.C. Salzman et al., "A Model for Understanding How Virtual Reality Aids Complex Conceptual Learning," Presence: Teleoperators and Virtual Environments, vol. 8, no. 3, pp. 293-316, 1999.



[38] M. Good et al., "User-Derived Impact Analysis as a Tool for Usability Engineering," Proc. SIGCHI Conf. Human Factors in Computing Systems, 1986.

[39] G. Lindgaard, Usability Testing and System Evaluation. Chapman & Hall, 1994.

[40] B. Riecke et al., "Cognitive Factors Can Influence Self-Motion Perception (Vection) in Virtual Reality," ACM Trans. Applied Perception, vol. 3, no. 3, pp. 194-216, 2006.

[41] E. Hudlicka, "To Feel or Not to Feel: The Role of Affect in Human-Computer Interaction," Int'l J. Human-Computer Studies, vol. 59, pp. 1-32, 2003.

[42] C. Coelho et al., "Media Presence and Inner Presence: The Sense of Presence in Virtual Reality Technologies," From Communication to Presence: Cognition, Emotions and Culture Towards the Ultimate Communicative Experience, G. Riva et al., eds., IOS Press, 2006.

[43] J.-W.J. Lin, Enhancement of User-Experiences in Immersive Virtual Environments that Employ Wide-Field Displays, p. 207, Univ. of Washington, 2004.

[44] D.A. Bowman, J.L. Gabbard, and D. Hix, "A Survey of Usability Evaluation in Virtual Environments: Classification and Comparison of Methods," Presence: Teleoperators and Virtual Environments, vol. 11, no. 4, pp. 404-424, 2002.

[45] M.M. North et al., "Virtual Reality and Transfer of Learning," Usability Evaluation and Interface Design: Cognitive Engineering, Intelligent Agents and Virtual Reality, J.S. Michael et al., eds., pp. 634-638, Lawrence Erlbaum Assoc., 2001.

[46] S.E. Mead and A.D. Fisk, "Measuring Skill Acquisition and Retention with an ATM Simulator: The Need for Age-Specific Training," Human Factors, vol. 40, pp. 516-523, 1998.

[47] L. Liu, B. Watson, and M. Miyazaki, "VR for the Elderly: Quantitative and Qualitative Differences in Performance with a Driving Simulator," CyberPsychology & Behavior, vol. 2, no. 6, pp. 567-576, 1999.

[48] X.D. Yang, W.F. Bischof, and P. Boulanger, "Validating the Performance of Haptic Motor Skill Training," Proc. Symp. Haptic Interfaces for Virtual Environments and Teleoperator Systems, pp. 129-135, 2008.

[49] D. Jia, A. Bhatti, and S. Nahavandi, "The Impact of Self-Efficacy and Perceived System Efficacy on Effectiveness of Virtual Training Systems," Behaviour and Information Technology, http://www.tandfonline.com/doi/full/10.1080/0144929X.2012.681067, 2012.

[50] D. Jia et al., "User-Centred Evaluation of a Virtual Environment Training System: Utility of User Perception Measures," Proc. Third Int'l Conf. Virtual and Mixed Reality, R. Shumaker, ed., pp. 196-205, 2009.

[51] D. Jia, A. Bhatti, and S. Nahavandi, "MUSTe Method for Quantifying Virtual Environment Training System Efficacy," Proc. Ann. Conf. Australian Computer-Human Interaction Special Interest Group (CHISIG) of the Human Factors and Ergonomics Soc. of Australia (HFESA) (OZCHI '09), 2009.

[52] D. Jia, A. Bhatti, and S. Nahavandi, "Design and Evaluation of a Haptically Enabled Virtual Environment for Object Assembly Training," Proc. IEEE Int'l Workshop Haptic Audio-Visual Environments and Games (HAVE '09), 2009.

[53] J. Vince, Virtual Reality Systems. Pearson Education, 1995.

[54] V.W. Caracelli and J.C. Greene, "Data Analysis Strategies for Mixed-Method Evaluation Designs," Educational Evaluation and Policy Analysis, vol. 15, no. 2, pp. 195-207, 1993.

[55] A.M. Huberman and M.B. Miles, "Data Management and Analysis Methods," Handbook of Qualitative Research, N.K. Denzin and Y.S. Lincoln, eds., pp. 428-444, Sage, 1994.

[56] M. Sandelowski, C.L. Voils, and G. Knafl, "On Quantitizing," J. Mixed Methods Research, vol. 3, no. 3, pp. 208-222, 2009.

[57] K. Kelkar et al., "The Added Usefulness of Process Measures over Performance Measures in Interface Design," Int'l J. Human-Computer Interaction, vol. 18, no. 1, pp. 1-18, 2005.

[58] R.M. Furr and V.R. Bacharach, Psychometrics: An Introduction. Sage Publications, 2008.

[59] J. Pallant, "Development and Validation of a Scale to Measure Perceived Control of Internal States," J. Personality Assessment, vol. 75, no. 2, pp. 308-337, 2000.

[60] R.F. DeVellis, Scale Development: Theory and Applications, second ed. Sage Publications, 2003.

[61] S. Okamoto, M. Konyo, and S. Tadokoro, "Vibrotactile Stimuli Applied to Finger Pads as Biases for Perceived Inertial and Viscous Loads," IEEE Trans. Haptics, vol. 4, no. 4, pp. 307-315, 2011.

[62] G. Borghesan, A. Macchelli, and C. Melchiorri, "Interconnection and Simulation Issues in Haptics," IEEE Trans. Haptics, vol. 3, no. 4, pp. 266-279, Oct.-Dec. 2010.

[63] E.L.M. Su et al., "Effect of Grip Force and Training in Unstable Dynamics on Micromanipulation Accuracy," IEEE Trans. Haptics, vol. 4, no. 3, pp. 167-174, May/June 2011.

[64] G.D. Lecakes et al., "Virtual Reality Environments for Integrated Systems Health Management of Rocket Engine Tests," IEEE Trans. Instrumentation and Measurement, vol. 58, no. 9, pp. 3050-3057, Sept. 2009.

[65] A Guide to Usability: Human Factors in Computing, J. Preece, ed. Addison Wesley, 1993.

[66] C.P. Nemeth, Human Factors Methods for Design: Making Systems Human-Centered. CRC Press, Boca Raton, 2004.

[67] A. Crystal and B. Ellington, "Task Analysis and Human-Computer Interaction: Approaches, Techniques, and Levels of Analysis," Proc. 10th Am. Conf. Information Systems, 2004.

[68] D. Hix et al., "User-Centered Design and Evaluation of a Real-Time Battlefield Visualization Virtual Environment," Proc. IEEE Virtual Reality '99, 1999.

Dawei Jia received the BS degree in computing, honors in software engineering, and the PhD degree in information systems from Deakin University, Australia, in 2005, 2006, and 2011, respectively. She was with the Centre for Intelligent Systems Research as a human-computer interaction and user experience researcher. Her research interests are in the areas of human-computer interaction, haptic user interfaces, and usability engineering. She is a member of the IEEE.

Asim Bhatti received the BS and MS degrees from Eastern Mediterranean University, Turkey, and the PhD degree in computer science from Deakin University, Australia, in 1999, 2001, and 2005, respectively. He currently acts as a research group leader for the fields of haptics, HMI, and micro/nanomanipulation at the Centre for Intelligent Systems Research, Deakin University. He is a member of the IEEE.

Saeid Nahavandi received the BSc (honors), MSc, and PhD degrees in automation and control from Durham University, Durham, United Kingdom. He is the Alfred Deakin Professor, Chair of Engineering, and the director of the Centre for Intelligent Systems Research, Deakin University, Geelong, VIC, Australia. He is a senior member of the IEEE.

Ben Horan received the BEng degree with first-class honors and the PhD degree from Deakin University in 2005 and 2009, respectively. Since 2008, he has been a lecturer in electronics and mechatronics at the School of Engineering, Deakin University, Australia. His current research interests include haptic human-robotic interaction, haptic device design, and mobile robotics. He is a member of the IEEE.

