
Intel® Perceptual Computing SDK Human Interface Guidelines

Revision 3.0

February 25, 2013

Legal Disclaimer

INFORMATION IN THIS DOCUMENT IS PROVIDED IN CONNECTION WITH INTEL PRODUCTS. NO LICENSE, EXPRESS

OR IMPLIED, BY ESTOPPEL OR OTHERWISE, TO ANY INTELLECTUAL PROPERTY RIGHTS IS GRANTED BY THIS

DOCUMENT. EXCEPT AS PROVIDED IN INTEL'S TERMS AND CONDITIONS OF SALE FOR SUCH PRODUCTS, INTEL

ASSUMES NO LIABILITY WHATSOEVER AND INTEL DISCLAIMS ANY EXPRESS OR IMPLIED WARRANTY, RELATING TO

SALE AND/OR USE OF INTEL PRODUCTS INCLUDING LIABILITY OR WARRANTIES RELATING TO FITNESS FOR A

PARTICULAR PURPOSE, MERCHANTABILITY, OR INFRINGEMENT OF ANY PATENT, COPYRIGHT OR OTHER

INTELLECTUAL PROPERTY RIGHT.

UNLESS OTHERWISE AGREED IN WRITING BY INTEL, THE INTEL PRODUCTS ARE NOT DESIGNED NOR INTENDED FOR

ANY APPLICATION IN WHICH THE FAILURE OF THE INTEL PRODUCT COULD CREATE A SITUATION WHERE PERSONAL

INJURY OR DEATH MAY OCCUR.

Intel may make changes to specifications and product descriptions at any time, without notice. Designers must not

rely on the absence or characteristics of any features or instructions marked "reserved" or "undefined." Intel

reserves these for future definition and shall have no responsibility whatsoever for conflicts or incompatibilities

arising from future changes to them. The information here is subject to change without notice. Do not finalize a

design with this information.

The products described in this document may contain design defects or errors known as errata which may cause

the product to deviate from published specifications. Current characterized errata are available on request.

Contact your local Intel sales office or your distributor to obtain the latest specifications and before placing your

product order.

Copies of documents which have an order number and are referenced in this document, or other Intel literature,

may be obtained by calling 1-800-548-4725, or go to: http://www.intel.com/design/literature.htm

Software and workloads used in performance tests may have been optimized for performance only on Intel

microprocessors. Performance tests, such as SYSmark* and MobileMark*, are measured using specific computer

systems, components, software, operations, and functions. Any change to any of those factors may cause the

results to vary. You should consult other information and performance tests to assist you in fully evaluating your

contemplated purchases, including the performance of that product when combined with other products.

Any software source code reprinted in this document is furnished under a software license and may only be used

or copied in accordance with the terms of that license.

Intel, the Intel logo, and Ultrabook, are trademarks of Intel Corporation in the US and/or other countries.

Copyright © 2012-2013 Intel Corporation. All rights reserved.

*Other names and brands may be claimed as the property of others.

Optimization Notice

Intel’s compilers may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors. Certain optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to the applicable product User and Reference Guides for more information regarding the specific instruction sets covered by this notice.

Notice revision #20110804

Table of Contents

Introduction
    Welcome
    About the Camera
High-Level Design Principles
    Input Modalities
    Design Philosophy
    Multimodality
Gesture Design Guidelines
    Capture Volumes
    Occlusion
    High-Level Mid-Air Gesture Recommendations
    Recognized Poses
    Universal Gesture Primitives
    Other Considerations
    Samples and API
Voice Design Guidelines
    High-Level Voice Design Recommendations
    Voice Recognition
    Speech Synthesis
    Samples and API
Face Tracking Design Guidelines
    High-Level Recommendations
    Samples and API
Visual Feedback Guidelines
    High-Level Recommendations
    Representing the User
    Representing Objects
    2D vs. 3D
    Traditional UI Elements
    Integrating the Keyboard, Mouse, and Touch
Questions or Suggestions?


Introduction

Welcome

Participate in the revolution of Perceptual Computing! Imagine new ways of navigating your world with

more senses and sensors integrated into the computing platform of the future. Give your users a new,

natural, engaging way to experience applications, and have fun while doing it. At Intel we are excited to

provide the tools as the foundation for this journey with the Intel® Perceptual Computing SDK—and look

forward to seeing what you come up with. Over the next few months, you will be able to incorporate

new capabilities into your applications including close-range hand gestures, finger articulation, speech

recognition, face tracking, and augmented reality experiences to fundamentally change how people

interact with their PCs.

Perceptual Computing is about bringing exciting user experiences

through new human-computing interactions where devices sense and

perceive the user’s actions in a natural, immersive, and intuitive way.

This document is intended to help you create innovative, enjoyable, functional, consistent, and powerful

user interfaces for the Perceptual Computing applications of the future. In particular, it will help you:

Develop compelling user experiences appropriate for the platform.

Design intuitive and approachable interactions.

Make proper use of different input modalities.

Remember, Perceptual Computing is a new field, and the technology gets better literally every week.

Don’t just design for today; as a designer and developer you will need to be creatively agile in designing

for extensibility, modularity, and scalability for tomorrow’s capabilities. We’ll share new updates with

you as they become available!


About the Camera

Intel has announced the release of a peripheral device for use in Perceptual Computing applications: the

CREATIVE* Interactive Gesture Camera. This is the first, but not necessarily only, technology platform

from Intel that will be able to sense gesture, voice, and other input modalities. The guidelines in this

document apply to this device, but also apply, in a broader sense, to other potential technology

platforms.

The following are some of the critical specifications of the CREATIVE Interactive Gesture Camera:

Size: 4.27 in × 2.03 in × 2.11 in (10.8cm × 5.2cm × 5.4cm)

Weight: 9.56 oz (271 grams)

Power: Single USB 2.0 (<2.5W)

RGB Camera

Native Resolution: 720p (1280×720 pixels)

Frame Rate: 30fps

FOV: 73 degrees diagonal

Range: 0-23 feet (0m-7.01m)

RGB + Depth frame sync

IR Depth Sensor (3D depth mapping)

Native Resolution: QVGA (320×240 pixels)

Frame Rate: 30fps

FOV: 73 degrees diagonal

Range: 6 inches to 3.25 feet (15 cm to 100 cm)

Ranging Technology: Time-of-flight

Audio

Dual-array microphones

Recommended System Configuration

PC with 2nd or 3rd generation Intel® Core™ processor

Windows* 7 with Service Pack 1 or higher / Windows 8 Desktop UI

4GB system memory

USB 2.0 port

Camera hardware (figure): dual-array microphones, HD 720p image sensor, power LED indicator, multi-attach base, and 3D depth sensor.


Physical Device Configuration

You’ll want your app to work on a variety of platforms. Users might be running your application on a

notebook, Ultrabook™ device, All-in-one, convertible, tablet, or traditional PC and monitor. These

different platforms present different ergonomic limitations. Keep in mind the following variables:

Screen size

Smaller laptops and Ultrabook systems commonly have 13-inch screens and, occasionally, have even

smaller screens. Desktops may have 24-inch screens or larger. This presents a design challenge for

generating application UI and related artwork and for designing interactions. You must be flexible in

supporting different display sizes.

Screen distance

Users are normally closer to laptop screens than desktop ones because laptop screens and keyboards

are attached. Likewise, a laptop screen is often lower than a desktop one, relative to the user’s face and

hands.

When using a laptop, a user’s hands tend to be very close to the screen. The screen is usually

lower, relative to the user’s head.

When using a desktop, a user’s hands are farther away from the screen. The screen is also higher,

relative to the user’s head.

Camera configuration

The Perceptual Computing camera is designed to be mounted on top of the monitor. Design your

application assuming that this is the location of the camera. The camera is typically pointed at the user

such that the user’s head and the upper portion of the user’s torso are in view. This supports common

use cases such as video-conferencing. The camera will be placed at different heights on different

platforms. For a large desk mounted display, the camera height could be even with the top of the user’s

head, oriented to look down at the user. For an Ultrabook device on the user’s lap, the camera could be

much lower, angled up at the user. Your application should support these different camera

configurations.


Proper camera mounting on a stand-alone monitor. Proper camera mounting on a laptop.

You should be flexible in supporting different screen sizes and camera configurations since this will

impact the user’s interaction space.


High-Level Design Principles

To design a successful app for the Perceptual Computing platform, you must understand its strengths.

The killer apps for Perceptual Computing will not be the ones that we have seen on traditional

platforms, or even more recent platforms such as phones and tablets.

Input Modalities

What sets the Perceptual Computing platform apart from traditional platforms are the new and

different input modalities. You’ll want to understand the strengths of these modalities, and incorporate

them into your app appropriately. It can be especially powerful to combine multiple modalities. For

example, users can often coordinate simultaneous physical and voice input operations, making

interaction richer and less taxing.

Mid-air hand gestures. Allows for very rich and engaging interaction with 2D or

3D objects. Allows easier, more literal direct manipulation. However, mid-air

gesture can be tiring over long periods, and precision is limited.

Touch. Also very concrete and easy to understand, with the additional benefit

of having tactile feedback to touch events. However, touch is limited to 2D

interaction. It is not as flexible as mid-air gesture.

Voice. Human language is a powerful and compelling means of expression.

Voice is also useful when a user is not within range of a computer’s other

sensors. Environmental noise and social appropriateness should be

considered.

Mouse. The best modality for the accurate indication of a 2D point. Large-scale

screen movements can be made with small mouse movements.

Keyboard. Currently the best and most common modality for consistent and accurate text input. Useful for easy and reliable shortcuts.


Design Philosophy

Designing and implementing applications for the Perceptual Computing platform requires a very

different mindset than designing for traditional platforms, such as Windows* or Mac* OS X, or even

newer platforms like iOS* or Android*. When designing your app, you’ll want it to be:

Reality-inspired, but not a clone of reality. You should draw inspiration from the real-world.

Perceptual Computing builds off of our natural skills used in every-day life. Every day we use our

hands to pick up and manipulate objects and our voices to communicate. Leverage these natural

human capabilities. However, do not slavishly imitate reality. In a virtual environment, we can

relax the rules of the physical world to make interaction easier. For example, it is very difficult

for a user to precisely wrap their virtual fingers around a virtual object in order to pick it up.

With the Intel® Perceptual Computing SDK, it may be easier for a user to perform a grasp action

within a short proximity of a virtual object in order to pick it up.

Literal, not abstract. Visual cues and interaction styles built from real-world equivalents are

easier to understand than abstract symbolic alternatives. Also, symbolism can vary by geography

and culture, and doesn’t necessarily translate. Literal design metaphors, such as switches and

knobs, are culturally universal.

Intuitive. Your application should be approachable and immediately usable. Visual cues should

be built in to guide the user. Voice input commands should be based around natural language

usage, and your app should be flexible and tolerant in interpreting input.

Consistent. Similar operations in different parts of your application should be performed in

similar ways. Where guidelines for interaction exist, as described in this document, you should

follow them. Consistency across applications in the Perceptual Computing ecosystem builds

understanding and trust in the user.

Extensible. Keep future SDK enhancements in mind. Unlike mouse interfaces, the power,

robustness, and flexibility of Perceptual Computing platforms will improve over time. How will

your app function in the future when sensing of hand poses improves dramatically? How about

when understanding natural language improves? Design your app such that it can be improved

as technology improves and new senses are integrated together.

Reliable. It only takes a small number of false positives to discourage a user from your

application. Focus on simplicity where possible to minimize errors.

Intelligently manage persistence. For example, if a user’s hand goes out of the field of view of

the camera, make sure that your application doesn’t crash or do something completely

unexpected. Intelligently handle such types of situations and provide feedback.

Designed to strengths. Mid-air gesture input is very different from mouse input or touch input.

Each modality has its strengths and weaknesses—use each when appropriate.

Contextually appropriate. Are you designing a game? A medical application? A corporate

content-sharing application? Make sure that the interactions you provide match the context. For

example, you expect to have more fun interactions in a game, but may want more


straightforward interactions in a more serious context. Pay attention to modalities (e.g., don’t

rely on voice in a noisy environment).

Take user-centered design seriously. Even the best designs need to be tested by the intended users.

Don’t do this right before you plan to launch your application or product. Unexpected issues will come

up and require you to redesign your application. Make sure you know who your audience is before

choosing the users you work with.

Multimodality

As we add more to our SDK, you will have additional sensors and inputs to play with. Make sure to

design smartly—don’t use all types of input just for the sake of it, but also make sure to take advantage

of combining different input modalities both synchronously and asynchronously. This will make it a

more exciting and natural experience for the user, and can minimize fatigue of the hands, fingers, or

voice. Having a few different modalities working in unison can also inspire confidence in the user that

they are conveying the proper information. For example, use your hand to swipe through images, and

use your voice to email the ones you like to a friend. Design in such a way that extending to different

modalities and combinations of modalities is easy. Make sure that it is comfortable for the user to

switch between modalities both mentally and physically. Also keep in mind that some of your users may

prefer certain modalities over others, or have differing abilities.


Gesture Design Guidelines

In this section we describe best practices for designing and implementing mid-air hand input (gesture)

interactions.

Capture Volumes

It is important to be aware of the sensing capabilities of your platform when designing and

implementing your application. A camera has a certain field-of-view, or capture volume, beyond which it

can’t see anything. Furthermore, most depth sensing cameras have minimum and maximum sensing

distances. The camera cannot sense objects closer than the minimum distance or farther than the

maximum distance.

The capture volume of the

camera is visualized as a frustum defined by near and far planes

and a field-of-view.

The user is performing a hand gesture that is captured in the

camera’s capture volume.

The user is performing a hand gesture outside of the capture

volume. The camera will not see this gesture.

Capture volume constraints limit the practical range of motion of the user and the general interaction

space. Especially in games, enthusiastic users can inadvertently move outside of the capture volume.

Feedback and interaction must take these situations into account.

When performing gestures, it is expected that the user leans back in the chair in a relaxed position. The

user’s hands move around a virtual plane roughly 12 inches away from the camera. This virtual plane

serves multiple purposes: (a) it activates hand tracking when the user’s hand is within 12 inches from

the camera; (b) the swipe gestures use the plane to distinguish between a left swipe and a right swipe.

It is also recommended that the user’s head always be at least eight inches away from the user’s hands. The

hand-tracking software cannot reliably distinguish a hand from a head if they are too close to each

other.
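To make these constraints concrete, here is a minimal sketch that checks whether a tracked hand position falls inside a simplified capture frustum and whether it has crossed the roughly 12-inch activation plane described above. The HandSample struct, the coordinate convention, and the constants are illustrative assumptions for this sketch, not SDK types or values.

```cpp
// Sketch only: an illustrative frustum check, not an SDK API.
#include <cmath>
#include <cstdio>

struct HandSample { float x, y, z; };   // camera-space position in meters (hypothetical)

// Approximate camera limits taken from the specifications above.
const float kNearPlaneM     = 0.15f;    // 6 in minimum sensing distance
const float kFarPlaneM      = 1.00f;    // 3.25 ft maximum sensing distance
const float kActivationM    = 0.30f;    // ~12 in virtual interaction plane
const float kHalfFovRadians = 73.0f * 0.5f * 3.14159265f / 180.0f;

bool insideCaptureVolume(const HandSample& h) {
    if (h.z < kNearPlaneM || h.z > kFarPlaneM) return false;
    // Radial distance from the optical axis must stay inside the FOV cone.
    float radial = std::sqrt(h.x * h.x + h.y * h.y);
    return radial <= h.z * std::tan(kHalfFovRadians);
}

bool insideActivationPlane(const HandSample& h) {
    return h.z <= kActivationM;          // hand tracking engages within ~12 inches
}

int main() {
    HandSample h = {0.05f, 0.02f, 0.28f};
    std::printf("in volume: %d, activated: %d\n",
                insideCaptureVolume(h), insideActivationPlane(h));
}
```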

Occlusion

For applications involving mid-air gestures, keep in mind the problem of a user’s hands occluding the user’s view of the screen. It is awkward if users raise their hand to grab an object on the screen but can’t see the object because their hand blocks their view of it. When mapping the hand to screen coordinates, do so in such a way that the hand does not sit in the line of sight of the on-screen object being manipulated.
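One simple way to follow this advice is to offset the on-screen manipulation point from the raw hand cursor so a grabbed object is drawn above, rather than underneath, the user’s hand. The sketch below is an application-side illustration; the normalized coordinate range and the 10% offset are assumptions, not SDK behavior.

```cpp
// Sketch: map a normalized hand position to screen pixels and offset the
// manipulation point upward so the user's hand does not occlude the target.
struct Point { float x, y; };

Point handToScreen(Point normalizedHand, int screenW, int screenH) {
    // Assumes normalizedHand is in [0,1] x [0,1], origin at the top-left corner.
    Point p = { normalizedHand.x * screenW, normalizedHand.y * screenH };
    // Draw the manipulated object ~10% of the screen height above the hand cursor,
    // clamped so it stays on screen.
    p.y -= 0.10f * screenH;
    if (p.y < 0.0f) p.y = 0.0f;
    return p;
}
```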


High-Level Mid-Air Gesture Recommendations

For many Perceptual Computing applications, mid-air gestures will be the primary input modality.

Consider the following points when designing your interaction and when considering gesture choices:

Where possible make use of our universal gesture primitives. Introduce your own gesture

primitives only when there is a compelling reason to do so. A small set of general-purpose

natural gestures is preferable to a larger set of specialized gestures. As more apps come out,

users will come to expect certain primitives, which will improve the perceived intuitiveness.

Stay away from abstract gestures that require users to memorize a sequence or a pose.

Abstract gestures are gestures that do not have a real-life equivalent and don’t fit any existing

mental models. An example of a confusing pose is “thumbs down” to delete something. An

example of a better delete gesture is to place or throw an item in a trash can.

Poses vs. gestures. Be aware of the different types of gestures. Poses are sustained postures,

ones like clenching a fist to select an item, and dynamic gestures are those like swiping to turn a

page. Figure out which make more sense for different interactions, and be clear in

communicating which is needed at any given point.

Innate vs. learned gestures. Some gestures will be natural to the user (e.g., grabbing an object

on the screen), while some will have to be learned (e.g., waving to escape a mode). Make sure

you keep the number of gestures small for a low cognitive load on the user.

Be aware of which gestures should be actionable. What will you do if the user fixes her hair,

drinks some coffee, or turns to talk to a friend? Make your gestures specific enough to be safe in these situations so that incidental movements don’t disrupt the experience.

Relative vs. absolute motion. Relative motion allows users to reset their current hand

representation on the screen to a location more comfortable for their hand (e.g., as one would

lift a mouse and reposition it so that it is still on a mouse pad). Absolute motion preserves

spatial relationships. Applications should use the motion model that makes the most sense for

the particular context.

Design your gestures to be ergonomically comfortable. If the user gets tired or uncomfortable,

they will likely stop using your application.

Gesturing left-and-right is easier than up-and-down. Whenever presented with a choice, design

for movement in the left-right directions for ease and ergonomic considerations.

Two hands when appropriate. Some tasks, like zooming, are best performed with two hands.

Support bi-manual interaction where appropriate.

Handedness. Be aware of supporting both right- and left-handed gestures.

Flexible thresholds. Make sure your code can accommodate hands of varying sizes and amounts of motor control. Some people may have issues with the standard settings, and the application will need to work with them. For example, to accommodate an older person with hand jitter, the jitter threshold should be customizable (see the sketch after this list). Another example is accommodating a young child or an excitable person who makes much larger gestures than you might expect.


Teach the gestures to the users. Provide users with a tutorial for your application, or show

obvious feedback that guides them when first using the application. You could have an option to

turn this training off after a certain amount of time or number of uses.

Give an escape plan. Make it easy for the user to back out of a gesture or a mode, or reset.

Consider providing the equivalent of a traditional “home button.”

Be aware of your gesture engagement models. You may choose to design a gesture such that

the system only looks for it once the user has done something to engage the system first (e.g.,

spoken a command, made a thumbs up pose).

Design for the right space. Be aware of designing for a larger world space (e.g. with larger

gestures, more arm movement) versus a smaller more constrained space (e.g. manipulating a

single object). Distinguish between environmental and object interaction.
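As noted in the “Flexible thresholds” recommendation above, jitter tolerance should be adjustable per user. The following is a minimal sketch of a configurable dead-zone filter; the structure and default threshold are illustrative, not SDK parameters.

```cpp
// Sketch: ignore hand movements smaller than a per-user jitter threshold.
#include <cmath>

struct CursorFilter {
    float thresholdPx = 6.0f;   // configurable dead zone; raise it for users with hand jitter
    float lastX = 0.0f, lastY = 0.0f;

    // Produces the filtered position: small movements are absorbed, larger ones pass through.
    void update(float x, float y, float& outX, float& outY) {
        float dx = x - lastX, dy = y - lastY;
        if (std::sqrt(dx * dx + dy * dy) < thresholdPx) {
            outX = lastX; outY = lastY;        // treat as jitter, hold position
        } else {
            lastX = x; lastY = y;
            outX = x;  outY = y;
        }
    }
};
```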


Recognized Poses

A pose and a gesture are two distinct things. A pose is a sustained posture, while a gesture is a

movement between poses. Here are the poses that we currently recognize as part of the SDK.

Openness

Using our SDK, you can distinguish between an open hand and a closed hand by looking at the LABEL_OPEN and LABEL_CLOSE attributes, respectively.

Thumbs Up and Thumbs Down

“Thumbs up” and “thumbs down” poses can be recognized by looking at the LABEL_POSE_THUMB_UP

and LABEL_POSE_THUMB_DOWN attributes, respectively. These could be used, for example, to confirm

or cancel a verbal command.

Peace

The peace sign pose can be recognized by looking at the LABEL_POSE_PEACE attribute. This could be

used as a trigger command, for example.

Big5

The Big 5 pose can be recognized by looking at the LABEL_POSE_BIG5 attribute. Depending on the

context of the application, this pose could be used to stop some sort of action (or to turn off voice

commands, for example), or to initiate a gesture.
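As an illustration of how an application might branch on these poses (for example, using thumbs up or thumbs down to confirm or cancel a verbal command), consider the sketch below. The PoseLabel enum simply mirrors the attribute names listed above for readability, and queryPrimaryHandPose() is a hypothetical stand-in for the SDK’s actual pose query.

```cpp
// Sketch: branching on the pose labels listed above.
#include <cstdio>

enum PoseLabel {
    LABEL_POSE_NONE,
    LABEL_POSE_THUMB_UP,
    LABEL_POSE_THUMB_DOWN,
    LABEL_POSE_PEACE,
    LABEL_POSE_BIG5
};

PoseLabel queryPrimaryHandPose() {
    // Stub: a real application would read this from the SDK's gesture module.
    return LABEL_POSE_THUMB_UP;
}

void handlePendingCommand(bool& commandConfirmed, bool& commandCancelled) {
    switch (queryPrimaryHandPose()) {
        case LABEL_POSE_THUMB_UP:   commandConfirmed = true; break;   // confirm verbal command
        case LABEL_POSE_THUMB_DOWN: commandCancelled = true; break;   // cancel verbal command
        case LABEL_POSE_BIG5:       /* e.g. pause voice listening */  break;
        default: break;                                               // no actionable pose
    }
}

int main() {
    bool confirmed = false, cancelled = false;
    handlePendingCommand(confirmed, cancelled);
    std::printf("confirmed=%d cancelled=%d\n", confirmed, cancelled);
}
```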


Universal Gesture Primitives

We have defined some gestures that are reserved for pre-defined actions. In general, these gestures

should be used only for these actions. Conversely, when these actions exist in your application, they

should generally be performed using the given gestures. Providing feedback for these gestures is critical,

and is discussed in the Visual Feedback Guidelines section. We don’t require that you conform to these

guidelines, but if you depart from these guidelines you should have a compelling user experience reason

to do so. This set of universal gestures will become learned by users as standard and will become more

expansive over time.

Partial support for these gestures exists in the SDK. Some gestures are supported in their entirety, some

are supported in a limited number of poses, and some are not yet supported. We plan to provide more

complete support as the SDK matures.

Grab and Release

The gesture for grabbing an on-screen object is shown below. The user should start with a unique pose

in order to start the sequence. The user should then have her fingers and thumb apart, and then bring

them together into the grab pose. The reverse action, moving the fingers and thumb apart, releases the

object. Limited grab and release functionality can be achieved through the “openness” parameter (the

value from 0 to 100 indicating the level of palm openness) and fingertips (e.g., LABEL_FINGER_THUMB,

LABEL_FINGER_INDEX) exposed by the SDK. For more reliable detection, you can also detect the top,

middle, and bottom of the hand (e.g. LABEL_HAND_MIDDLE).

A user grabs an object.
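A minimal sketch of deriving grab and release events from the 0-100 openness value is shown below. The specific thresholds and the hysteresis band are illustrative choices, not values defined by the SDK.

```cpp
// Sketch: deriving grab/release state from the 0-100 openness value described above.
struct GrabDetector {
    bool grabbing = false;

    // Hysteresis: close below 30 to grab, open above 60 to release, so small
    // fluctuations around a single threshold do not toggle the state.
    bool update(int openness) {            // returns the current grab state
        if (!grabbing && openness < 30) grabbing = true;
        else if (grabbing && openness > 60) grabbing = false;
        return grabbing;
    }
};
```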


Move

After grabbing an object, the user moves her hand to move the object. Some of the general guidelines

for the design of basic grabbable objects are:

It should be obvious to the user which objects can be moved and which cannot be moved.

If the interface relies heavily on grabbing and moving, it should be obvious to the user where a

grabbed object can be dropped. It may be useful to provide snappable behavior.

Objects should be large enough to account for slight hand jitter.

Objects should be far enough apart so users won’t inadvertently grab the wrong object.

If the hand becomes untracked while the user is moving an object, the moved object should reset to its

origin, and the tracking failure should be communicated to the user. This functionality can be realized

through hand tracking with an openness value indicating a closed hand.

A user moves an object.

Pan

If the application supports panning, this should be done using a flat hand. Panning engages once the hand is made mostly flat. Translation of the flat hand pans the view. Once the hand relaxes into a natural, slightly curled pose, which can be determined by the hand openness parameter, panning ends. Note that if a single panning motion does not cover enough distance, the hand will have to move back and pan again.

A user pans the view.


Zoom

If the application supports zooming, this should be done using two flat hands. Zooming engages once

both hands become mostly flat. Zooming is then coupled to the distance between the two hands (similar

to pinch-zooming for touch). Zoom functionality requires an action to disengage the zooming; otherwise,

the user cannot escape without changing the zoom.

Resizing an object is very similar. Instead of keeping the two hands open, one hand will grab one side of an

object, while the second hand grabs the other side of the object. Then the user moves the hands relative

to one another, either closer together to shrink the object, or farther apart to grow the object. Once the

user releases one hand, the resize operation ends.

A user zooms the view.
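A minimal sketch of coupling the zoom factor to the distance between two flat hands, with an explicit engage/disengage state, might look like the following. The Hand struct, the flatness flag, and the baseline handling are assumptions for illustration.

```cpp
// Sketch: zoom coupled to the distance between two flat hands, as described above.
#include <cmath>

struct Hand { float x, y; bool flat; };

struct ZoomController {
    bool  engaged = false;
    float baselineDist = 0.0f;   // hand separation when zooming engaged
    float zoomAtEngage = 1.0f;
    float zoom = 1.0f;           // current zoom factor applied to the view

    void update(const Hand& left, const Hand& right) {
        bool bothFlat = left.flat && right.flat;
        float dist = std::hypot(right.x - left.x, right.y - left.y);
        if (bothFlat && !engaged) {                   // engage: remember the starting state
            engaged = true; baselineDist = dist; zoomAtEngage = zoom;
        } else if (bothFlat && baselineDist > 0.0f) { // zoom follows hand separation
            zoom = zoomAtEngage * (dist / baselineDist);
        } else if (!bothFlat) {
            engaged = false;                          // relaxing the hands ends the zoom
        }
    }
};
```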

Wave

The gesture for resetting, escaping a mode, or moving up a hierarchy is shown below. The user quickly

waves her hand back and forth. This is a general purpose “get-me-out-of-here” gesture. You can find this

in the SDK under LABEL_HAND_WAVE.

A user waves to reset a mode.



Circle

The circle gesture, LABEL_HAND_CIRCLE, is recognized when the user extends all fingers and moves the hand in a circle. This could be used for selection or resetting, for example.

A user circles with her hand to move to the next level of a game.

Swipe

Swipes are basic navigation gestures. However, it is technically challenging to recognize swipes accurately. In many cases, a left swipe looks exactly like a right swipe from the camera’s viewpoint when the user performs multiple swipes in a row; the same applies to up and down swipes. You can find swipe in the SDK under LABEL_NAV_SWIPE_LEFT, LABEL_NAV_SWIPE_RIGHT, LABEL_NAV_SWIPE_UP, and LABEL_NAV_SWIPE_DOWN.

To avoid confusion, the user should perform the swipe gestures as follows:

Imagine there is a virtual plane about 12 inches away from the camera. The swipes must first go into the plane,

travel inside the plane from left to right or right to left, and then go out of the plane.
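A small state machine built around that virtual-plane rule might look like the sketch below. The depth of the plane and the minimum lateral travel are illustrative values, not SDK constants.

```cpp
// Sketch: a minimal left/right swipe detector based on the virtual-plane rule above.
enum SwipeResult { SWIPE_NONE, SWIPE_LEFT, SWIPE_RIGHT };

struct SwipeDetector {
    bool  inPlane = false;
    float entryX  = 0.0f;
    const float planeDepthM = 0.30f;    // ~12 in virtual plane
    const float minTravelM  = 0.10f;    // lateral travel required inside the plane

    // x = lateral hand position (m), z = distance from camera (m)
    SwipeResult update(float x, float z) {
        if (!inPlane && z <= planeDepthM) {          // hand pushes into the plane
            inPlane = true; entryX = x;
            return SWIPE_NONE;
        }
        if (inPlane && z > planeDepthM) {            // hand pulls back out of the plane
            inPlane = false;
            float travel = x - entryX;
            if (travel >  minTravelM) return SWIPE_RIGHT;
            if (travel < -minTravelM) return SWIPE_LEFT;
        }
        return SWIPE_NONE;
    }
};
```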

Other Considerations

Hand Agnosticism

All one-handed gestures can be performed with either the right or left hand. For two-handed gestures where the sequence of operations matters (e.g., grabbing an object with both hands for the resize gesture), the hand choice for starting the operation does not matter.



Finger Count Independence

For many gestures, the number of fingers extended does not matter. For example, the pan operation can be performed with all fingers extended, or only a few. Restrictions in finger count only exist where necessary to avoid conflict. For example, having the index finger extended could be reserved for pointing at a 2D location, in which case it can’t also be used for panning.

Flexibility in Interpretation of Pose

Hands can be in poses similar to, but slightly different from, the poses described. For example, accurate panning can be accomplished with the fingers pressed together or fanned apart.

Rate Controlled or Absolute Controlled Rotation and Translation

You can use an absolute-controlled model or a rate-controlled model to control gesture-adjusted parameters such as rotation, translation (of object or view), and zoom level. In an absolute model, the magnitude to which the hand is rotated or translated in the gesture is translated directly into the parameter being adjusted, i.e., rotation or translation. For example, a 90-degree rotation by the input hand results in a 90-degree rotation in the virtual object. In a rate-controlled model, the magnitude of rotation/translation is translated into the rate of change of the parameter, i.e., rotational velocity or linear velocity. For example, a 90-degree rotation could be translated into a rate of change of 10 degrees/second (or some other constant rate). With a rate-controlled model, users release the object or return their hands to the starting state to stop the change.
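The difference between the two models can be summarized in a couple of lines of code. The sketch below applies them to rotation; the 10 degrees/second gain for the rate-controlled model comes from the example above, and everything else is illustrative.

```cpp
// Sketch: absolute vs. rate-controlled mapping of hand rotation to object rotation.
// Angles are in degrees.

// Absolute model: hand rotation maps directly onto object rotation.
float absoluteRotation(float objectAngle, float handDelta) {
    return objectAngle + handDelta;              // 90 deg of hand motion -> 90 deg of object motion
}

// Rate-controlled model: hand rotation away from rest sets a rotational velocity.
float rateControlledRotation(float objectAngle, float handAngleFromRest,
                             float dtSeconds, float degPerSecPer90Deg = 10.0f) {
    float velocity = (handAngleFromRest / 90.0f) * degPerSecPer90Deg;
    return objectAngle + velocity * dtSeconds;   // returning the hand to rest stops the change
}
```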


How to Minimize Fatigue

Gestural input is naturally fatiguing as it relies on several large muscles to sustain the whole arm in the

air. It is a serious problem and should not be disregarded; otherwise, users may quickly abandon the

application. By carefully balancing the following guidelines, you can alleviate the issue of fatigue as

much as possible:

Allow users to interact with elbows rested on a surface. Perhaps the best way to alleviate arm

fatigue is by resting elbows on a chair’s arm rest. Support this kind of input when possible. This,

however, reduces the usable range of motion of the hand to an arc in the left and right

direction. Evaluate whether interaction can be designed around this type of motion.

Make gestures short-lasting. Long-lasting gestures, especially ones where the arms must be

held in a static pose, quickly induce fatigue in the user’s arm and shoulder (e.g., holding the arm

up for several seconds to make a selection).

Design for breaks. Users naturally, and often subconsciously, take quick breaks (e.g., professors

writing on the blackboard). Short, frequent breaks are better than long, infrequent ones.

Do not require precise input. Users naturally tense up their muscles when trying to perform

very precise actions (much like trying to reduce camera shake when taking a picture in the dark).

This, in turn, accelerates fatigue. Allow for gross gestures and make your interactive objects

large.

Do not require many repeating gestures. If you require users to constantly move their hands in

a certain way for a long period of time (e.g., while moving through a very long list of items by

panning right), they will become tired and frustrated very quickly.

Samples and API

In the /doc folder of the SDK, you can find a file called sdksamples.pdf. This gives you examples that

show finger tracking, pose/gesture recognition, and event notification (gesture_viewer,

gesture_viewer_simple) in both C++ and C#. You can run the applications and view the source code in

the Intel/PCSDK/sample folder.


Also, in sdkmanual-gesture.pdf, you can find the most current version of the gesture module, which consumes RGB, depth, or IR streams as input and returns blob information, geometric node tracking results, pose/gesture notification, and alert notification. For an example of 2D pan, zoom, and rotate, see:

http://github.com/IntelPerceptual/PerceptualP5/tree/master/PanZoomRotate

For an online tutorial on close-range hand/finger tracking, see:

http://software.intel.com/en-us/sites/default/files/article/328725/perc-gesturerecognition-tutorial-final.pdf


Voice Design Guidelines

In this section we describe best practices for designing and implementing voice

command and control, dictation, and text to speech for your applications. As of

now, English is the only supported language.

High-Level Voice Design Recommendations

Test your application against noisy backgrounds and in different environments to ensure robust sound input.

Watch out for false positives. For example, don’t let a specific sound delete a file without

verification, as this sound could unexpectedly crop up as background noise.

Always show listening status of the system. Is your application listening? Not listening?

Processing sound?

People do not speak the way they write. Be aware of pauses and interjections such as “um”

and “uh”.

Teach the user how to use your system as they use it. Give more help initially, then fade it

away as the user gets more comfortable (or have it as a customizable option).

Voice Recognition

Command Mode vs. Dictation Mode

Be aware of the different listening modes your application will be in. Once listening, your application can be listening in command mode or dictation mode. Command mode is for issuing commands (e.g., “Start computer”, “Email photo”, “Volume up”). In command mode, the SDK module recognizes phrases only from a predefined list of context phrases that you have set. The developer can use multiple command lists, which we will call grammars. Good “command” application design would create multiple grammars and activate the one that is relevant to the current application state (this limits what the user can do at any given point in time based on the command grammar used). To invoke command mode, provide a grammar.

Dictation mode is for general language from the user (e.g., entering in the text for a Facebook status update). Dictation mode has a predefined vocabulary. It is a large, generic vocabulary containing 50k+ common words (with some common named entities). Highly domain specific terms (e.g. medical terminology) may not be widely represented. Absence of a grammar will invoke the SDK in dictation mode. Dictation is limited to 30 seconds. Currently, grammar mode and dictation mode cannot be run at the same time.


Constructing Grammars

Keep the following points in mind when constructing your grammars:

Don’t assume that your command phrasing is natural! The language you use is very important.

Ask other people (friends, family, people on forums, study participants) how they would want

to interact with your system or initiate certain events.

Provide many different options in your grammar to reduce the effort required of the user, and try to

make interaction more natural. For example, instead of constraining the user to say “Program

start”, you could also accept “Start program”, “Start my program”, “Begin program”, etc.

Complicated words/names are not easily recognized. Make your grammar include commonly

used words. However, very short words can be difficult to recognize because of sound ambiguity

with other words.

Be aware of the length of the phrases in your grammar. Longer phrases are easier to

distinguish between, but you also don’t want users to have to say long phrases too often.

Beware of easily confusable commands. For example, “Create playlist” and “Create a list” will

likely sound the same to your application. One would be used in a media player setting, and the

other could be in a word processor setting, but if they are all in one grammar the application

could have undesired responses.

Experiment with different lengths of “end of sentence detection.” Responsiveness is important

and the end of sentence parameter (endofSentence in PXCVoiceRecognition::ProfileInfo) can

help adjust the responsiveness of the application.
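One way to organize a grammar so that several natural phrasings map onto the same application action is sketched below. The map, the Command enum, and onRecognized() are application-side illustrations; registering the phrase list with the SDK’s voice module is a separate step not shown here.

```cpp
// Sketch: several natural phrasings mapping to the same application command.
#include <map>
#include <string>
#include <cstdio>

enum Command { CMD_START_PROGRAM, CMD_EMAIL_PHOTO };

std::map<std::string, Command> buildGrammar() {
    std::map<std::string, Command> g;
    // Accept several phrasings for the same action, as recommended above.
    for (const char* p : {"program start", "start program", "start my program", "begin program"})
        g[p] = CMD_START_PROGRAM;
    for (const char* p : {"email photo", "email this photo", "send photo"})
        g[p] = CMD_EMAIL_PHOTO;
    return g;
}

void onRecognized(const std::string& phrase, const std::map<std::string, Command>& grammar) {
    auto it = grammar.find(phrase);
    if (it == grammar.end()) { std::printf("Unrecognized command\n"); return; }
    // Relay the command back to the user so they know it was understood.
    std::printf("Command %d accepted: \"%s\"\n", it->second, phrase.c_str());
}

int main() {
    auto grammar = buildGrammar();
    onRecognized("start my program", grammar);
}
```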

User Feedback

Let the user know what commands are possible. It is not obvious to the user what your

application’s current grammar is.

Let the user know how to initiate listening mode. Make your application’s listening status clear.

Let the user know that their commands have been understood. The user needs to know this to

trust the system, and know which part is broken if something doesn’t go the way they planned.

One easy way to do this is to relay back a command. For example, the user could say “Program

start”, and the system could respond by saying “Starting program, please wait”.

Give users the ability to make the system stop listening.

Give users the ability to edit/redo/change their dictation quickly. At some point it might be easier for the user to edit their dictation with the mouse, keyboard, or touchscreen.

If you give verbal feedback, make sure it is necessary, important, and concise! Don’t overuse

verbal feedback as it could get annoying to the user.

If sound is muted, provide visual components and feedback.


How To Minimize Fatigue

Remember, the user should not have to be speaking constantly. Speech is best used for times when

dictation is being used, or triggers are necessary to accomplish actions. Speech can be socially awkward

in public, and background noise can easily get in the way of successful voice recognition. Be aware of

speech’s best uses. Speech is best used as a shortcut for a multi-menu action (something that requires

more than a first-level menu and a single mouse click). To scroll down a menu, it would make more

sense to use a gesture rather than repeatedly have the user say “Down”, “Down”, “Down”.

Speech Synthesis

You can also generate speech using the built-in Nuance speech synthesis that comes with our SDK.

Currently a female voice is used for TTS.

Make sure to use speech synthesis where it makes sense. Have an alternative for people who

cannot hear well, or if speakers are muted.

Listening to long synthesized speech will be tiresome. Synthesize and speak only short

sentences.

Samples and API

You can run and view the source code for the voice_recognition and voice_synthesis projects in

Intel/PCSDK/sample.

In sdksamples.pdf, you can check out the audio_recorder sample. You can also find more information in sdkmanual-core.pdf on audio abstraction with the PXCAudio, PXCAccelerator, and PXCCapture::AudioStream interfaces. An article on how to use the SDK for Voice Recognition can be found here: http://software.intel.com/en-us/sites/default/files/article/328725/voicerecognitionhowto.pdf


Face Tracking Design Guidelines

In a future release we will provide more guidelines

for designing interactions based on face tracking, face

detection, and face recognition. Stay tuned!

High-Level Recommendations

More expressions will be added to the SDK in the future; smiles and winks can currently be detected.

Natural expressions in front of a computer will be difficult to detect, so users should be prompted

to show exaggerated expressions.

Give feedback to the user to make sure they are at a typical working distance from the computer, for optimal feature detection.

Give feedback to the user about any orientation or lighting issues; provide error messages or

recommendations.

For optimal tracking, have ambient light or light facing the user’s face (avoid shadows).

Try to make the interface background as close to white as possible (the screen can serve as a

second light to ensure good reading).

Notify the user if they are moving too fast to properly track facial features.

Samples and API

Check out the face_detection and landmark_detection samples (in Intel/PCSDK/sample and

discussed in sdksamples.pdf) to run the application and see the source code.

You can also find more information in sdkmanual-face.pdf. An article on how to use the Face Detection module can be found here: http://software.intel.com/en-us/articles/intel-perceptual-computing-sdk-how-to-use-the-face-detection-module


Visual Feedback Guidelines

You’ll want your Perceptual Computing application to appear and behave very differently from a

traditional desktop PC style application. Familiar concepts, such as cursors, clicking, icons, menus, and

folders, don’t necessarily apply to an environment in which gesture and voice are the primary

interaction modalities. In this section we provide design guidelines for developing your application to

visually conform to the Perceptual Computing interaction model.

High-Level Recommendations

Don’t have a delay between the user’s input (whether it’s gesture, voice, or anything else) and

the visual feedback on the display.

Smooth movements. Apply a filter to the user’s movements if necessary to prevent jarring visual jumps (see the smoothing sketch after this list).

Combine different kinds of feedback. This can convince the user that the interactions are more

realistic. Stay tuned to the next version of this manual for more advice on how to deal with

audio feedback.

Show what is actionable. You don’t want the user trying to interact with something that they

can’t interact with.

Show the current state of the system. Is the current object selected? If so, what can you do to

show this visually? Ideas include using different colors, tilting the object, orienting the object

differently, and changing object size.

Show recognition of commands or interactions. This will let the user know they are on the right

or wrong track.

Show progress. For example, you could show an animation or a timer for short timespans.

Consider physics. Think about the physics that you want to use to convey a more realistic and

satisfying experience to the user. You could simulate magnetic snapping to an object to make

selection easier, for example. While the user is panning through a list, you could accelerate the

list movement and slow it down after the user has finished panning.
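For the “Smooth movements” recommendation above, a simple exponential smoothing filter is often enough. The sketch below is illustrative; the smoothing factor should be tuned so the added latency stays unnoticeable.

```cpp
// Sketch: exponential smoothing of cursor or object movement.
struct Smoother {
    float alpha = 0.5f;       // 0..1: higher = more responsive, lower = smoother
    bool  primed = false;
    float sx = 0.0f, sy = 0.0f;

    void filter(float rawX, float rawY, float& outX, float& outY) {
        if (!primed) { sx = rawX; sy = rawY; primed = true; }   // seed with the first sample
        sx += alpha * (rawX - sx);
        sy += alpha * (rawY - sy);
        outX = sx; outY = sy;
    }
};
```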

Representing the User

A user must be represented in the virtual world. The user embodiment allows the user to interact with

elements in the scene. In traditional environments this embodiment is a mouse cursor. In a Perceptual

Computing environment, the representation of the user should reflect the modalities used to interact

and the nature of the application in question. Typically, where hand gestures are used, a representation

of the hands should be shown on the screen. The hand representation depends on the application. In a

magic game, the user may be represented as a glowing wand held by a hand. In a 3D modeling

application, the user may be represented by an articulated hand model. You could have the cursor be a

static object, or also have the cursor change orientation, size, or color depending on the movement or

depth of the user’s hands.


The hand representation should be neither very realistic, nor very simplistic. A very realistic hand risks

the “uncanny valley” effect1, which would disturb users. A too simplistic hand will be inadequate to

communicate the complex state of the hand and risks being too close to a cursor.

If head location is relevant to interaction (e.g. you are using face-tracking) a representation of the head

may need to be incorporated. Similar rules hold for other modalities.

Hand representation consisting of an articulated

hand model. This would be appropriate for applications involving direct object manipulation.

Hand representation consisting of a magic wand. This would be appropriate for a magic game.

Hand representation when tracking has failed. The user is told that tracking has failed, so they know to

act to fix tracking.

1 The uncanny valley is a hypothesis in the field of robotics and 3D computer animation which holds that when

human replicas look and act almost, but not perfectly, like actual human beings, it causes a response of revulsion among human observers. The valley refers to the dip in the graph of the comfort level of humans as a function of a robot’s human likeness [Wikipedia 2013].


Sensor limitations can result in cases where the user is not being tracked. For example, a user may be

too far from the camera, or may have moved to the side and is out of the view of the camera. Users

often don’t understand when this has happened. Your application should tell users when tracking has

failed, why tracking has failed, and what they can do to correct the situation. This feedback can be

incorporated into the design of the user representation (e.g., showing the relation between the user and

the interaction bounding box/camera field of view visually). Other measures can be taken when tracking

fails. In a game, where lost tracking can result in the user losing the game, the action can be dramatically

slowed down until tracking is re-established.

In general, you should recognize the limitations of the sensors and ensure that the experiences you are

trying to create are intelligent in working with the technology that you currently have. For example, it

would be a poor design to have a user interaction that, in real life, would require sensitive tracking that

is super-fast when your tracking only enables something slow. You may want to modify the interaction

and visual representations to work within the current abilities of the technology.

Representing Objects

The ideal representation of objects in the scene is influenced greatly by the method in which we interact

with them. In a Perceptual environment, we are able to interact much more richly with objects. We can

push, grab, twist, or stretch them. This is much more than can be done with a mouse. On the other

hand, a hand has much less precision than a mouse. The representation of objects should reflect these

realities. Objects in your application should:

Take advantage of the rich manipulation abilities of the human hand

Convey visibly the interactive possibilities, so users can understand what can be done

Be of a size that can be manipulated easily

Not demand a degree of precise manipulation that results in a large number of errors or a large

amount of fatigue

Gestural Actions on Objects

Some action states to consider while interacting with objects in your application may include:

Targeting

Hovering

Selecting

Dragging

Releasing

Resizing

Rotating


2D vs. 3D

A Perceptual Computing graphical application can be shown either within a 2D or 3D interactional

environment. 2D environments are easier to understand and navigate on a 2D display, so should be

used when there isn’t a compelling need for a 3D environment. When using gesture to interact with a

2D environment, however, consider using subtle 3D cues to enhance interaction. For example, a

grabbed object can be made slightly larger with a drop shadow, to indicate it has been lifted off the 2D

surface. Full 3D environments play to many of the strengths of a Perceptual environment, and should be

used when the use case demands it. Some applications, especially games, benefit from operating in 3D.

Traditional UI Elements

The interactive elements in a primarily gesture-driven interface are different from those in a primarily

mouse-driven environment. This section suggests some of the more traditional UI elements for use with

mid-air gesture. These can be useful for clarity and efficiency, and many users are familiar with these

models. Of course, it isn’t good to just rely on what people are accustomed to if there are better

solutions, but don’t discount some of the UI elements people are already using.

Horizontal Lists

Horizontal lists can be good because they rely on the more natural left-right motion with the right hand.

A welcome improvement to linear lists is presenting choices on a slight arc, which allows the user to

make a choice while resting their elbow on a hard surface. Note, however, that this approach is

handedness-dependent. A left-handed user might not find it comfortable. Consider accommodating left-handed users by optionally mirroring the interface.

Example of a horizontal list sweep.


Radial Lists

Radial lists (also known as pie menus) are useful, especially for gestural input, as they are less error-prone since the distance the user has to traverse in order to reach any option is short, and a user

doesn’t have to aim precisely to select an option. Also they can take up less space than linear lists. When

constructing radial lists, maximize the selectable area for each option by making the whole “slice” of the

list selectable.

Example of a radial list with “paste” currently selected.
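Selecting a slice of a radial list can be reduced to an angle test around the menu center, so the whole wedge is selectable and no precise aiming is required. The sketch below is an illustrative calculation; the dead-zone radius is an assumption.

```cpp
// Sketch: selecting a radial-list slice from the hand position relative to the menu center.
#include <cmath>

// Returns the selected slice index in [0, sliceCount), or -1 inside the dead zone.
int radialSelection(float handX, float handY, float centerX, float centerY,
                    int sliceCount, float deadZoneRadius) {
    float dx = handX - centerX, dy = handY - centerY;
    if (std::sqrt(dx * dx + dy * dy) < deadZoneRadius) return -1;   // too close to the center
    float angle = std::atan2(dy, dx);                 // -pi..pi
    if (angle < 0.0f) angle += 2.0f * 3.14159265f;    // 0..2*pi
    float sliceSpan = 2.0f * 3.14159265f / sliceCount;
    return static_cast<int>(angle / sliceSpan) % sliceCount;
}
```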

Sliders

Typically, sliders are used for adjusting values within a given range. You may want to use a slider for

absolute panning instead of using relative panning depending on your application. Follow these

guidelines:

Create discrete sliders as opposed to continuous ones. Gestural input lacks the fidelity required

to make fine selections without inducing fatigue.

Try to keep sufficient distance between “steps” to avoid demanding too much precision on the

part of the user.

The top slider has fewer steps, allowing the user to easily select the one they want using mid-air gesture.

The numerous steps on the lower slider make it much harder to select the desired value.
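Following the discrete-slider guideline above, the hand position can simply be snapped to the nearest step. The sketch below assumes normalized screen coordinates and an illustrative step count.

```cpp
// Sketch: snapping a continuous hand position to a discrete slider step.
int sliderStepFromHand(float handX, float sliderLeftX, float sliderWidth, int stepCount) {
    float t = (handX - sliderLeftX) / sliderWidth;              // 0..1 along the slider
    if (t < 0.0f) t = 0.0f;
    if (t > 1.0f) t = 1.0f;
    int step = static_cast<int>(t * (stepCount - 1) + 0.5f);    // round to the nearest step
    return step;                                                // fewer, well-spaced steps reduce fatigue
}
```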

Integrating the Keyboard, Mouse, and Touch

Don’t ignore the mouse, keyboard, and touchpad or touchscreen. People are used to these form factors,

and each has their own specialized purpose. Often, it makes much more sense to type in information

using the keyboard, rather than using an onscreen keyboard (although in some situations, like when the

user only has to input a few letters, using gesture makes sense). Keys can still be used as failsafe

shortcuts or escapes. To find a very precise 2D location, the mouse and touchscreen can still be very

useful and efficient.



Questions or Suggestions?

This document provides guidelines that are rooted in many years of research in human-computer

interaction, user interface design, and multi-modal input. However, if you feel that certain guidelines do

not fit your use case or you have proposals for modifications or additions, please post to the forum

thread “Human Interaction Guidelines-Questions and Suggestions,” and we will be happy to discuss the

issues with you.

Other helpful information:

Our website:

http://intel.com/software/perceptual

For information and updates on the SDK, follow us on Twitter at:

@PerceptualSDK

All manuals mentioned in this document that were downloaded with the SDK are also available

online: http://software.intel.com/en-us/articles/intel-perceptual-computing-sdk-manual-page

Check out our tutorials!

http://software.intel.com/en-us/articles/intel-perceptual-computing-sdk-tutorials

Check out our github repository:

http://github.com/IntelPerceptual

We also have a social hub where you can find links to our videos and connect with us on

Facebook, Twitter, and Google+ :

http://about.me/IntelPerceptual

Frequently Asked Questions

http://software.intel.com/articles/perc-faq

And last but not least, participate in our Intel® Developer Zone Intel® Perceptual Computing SDK

forum to share information with fellow developers and ask questions.

http://software.intel.com/en-us/forums/intel-perceptual-computing-sdk