Human Interaction Library
DESCRIPTION
This report presents the design and the architecture of a Java library that allows the final user to interact with applications through non-conventional devices, such as the Nintendo Wii® Controller (WiiMote), the microphone and the webcam. It has been developed as a Java API that can be easily integrated into any kind of software suitable to be controlled by devices other than a keyboard or a mouse. In particular, this report shows how it has been used inside the WorldWind SDK, the open source NASA world exploration tool.
UNITN & Graphitech
Human Interaction Library Final Report
Magliocchetti Daniele < 125162 >
Version 1.0
Human Interaction Library – Final Report Version: <1.0> Author: Magliocchetti Daniele Matr. 125162 Date: 11/09/2008
©UNITN & Graphitech, 2008 Page 2
Revisions
Date Version Description Author
11/09/2008 0.1 Software Requirements Definition Magliocchetti Daniele
06/07/2008 0.2 First Draft Magliocchetti Daniele
08/07/2008 0.3 Manuals Magliocchetti Daniele
07/08/2008 0.4 Review Magliocchetti Daniele
03/09/2008 0.5 Review Magliocchetti Daniele
10/09/2008 1.0 Final Review Magliocchetti Daniele
Index
1. Introduction
   1.1 Acronyms and Abbreviations
   1.2 References
   1.3 Report Overview
2. Global Capability Description
   2.1 Finger Tracking Capability
       2.1.1 Navigation Mode (up to 2 fingers)
       2.1.2 Interaction / Edit Mode (3 or more fingers)
   2.2 Head Tracking Capability (Optional)
   2.3 Speech Recognition (Optional)
3. Requirements
   3.1 Functional Requirements
   3.2 User Use Case
   3.3 Additional Requirements
4. Requirements Classification
5. System Architecture
   5.1 General Description
   5.2 Finger Tracking Engine
   5.3 Head Tracking Engine
   5.4 Speech Recognition Engine
   5.5 WorldWind Integration
6. Developer Manual
   6.1 System Requirements
   6.2 System Configuration
   6.3 WorldWind
7. User Manual
   7.1 Gestures
   7.2 Voice Commands
   7.3 Head Tracking
8. Appendix
   8.1 Known Bugs and Limitations
   8.2 Directory Structure
   8.3 Adopted Tools
Software Requirements Specification

1. Introduction
This report presents the design and the architecture of a Java library that allows the final user to interact with applications through non-conventional devices, such as the Nintendo Wii® Controller (WiiMote), the microphone and the webcam. It has been developed as a Java API that can be easily integrated into any kind of software suitable to be controlled by devices other than a keyboard or a mouse. In particular, this report shows how it has been used inside the WorldWind SDK, the open source NASA world exploration tool.
1.1 Acronyms and Abbreviations
WiiMote: The Nintendo Wii® Controller
WWJ: WorldWind Java SDK
OpenCV: The Intel® Open Source Computer Vision Library
MoteJ: The WiiMote Java API
BlueCove: The Java Bluetooth stack
Sphinx4j: The speech recognition library
BT: Bluetooth
IR: Infrared
JNI: The Java Native Interface
BNF: Backus-Naur Form
1.2 References
- http://motej.sourceforge.net : The MoteJ (WiiMote Java API) official site
- http://bluecove.sourceforge.net/ : The Java Bluetooth stack official site
- http://www.wiili.org/index.php/Wiimote : A WiiMote hardware description
- http://www.youtube.com/watch?v=0awjPUkBXOU : "Tracking Your Fingers with the Wiimote" by Johnny Chung Lee
- http://sourceforge.net/projects/opencv/ : The OpenCV official site
- http://www.youtube.com/watch?v=Jd3-eiid-Uw : "Head tracking with the Wiimote" by Johnny Chung Lee
- http://cmusphinx.sourceforge.net/sphinx4/ : The Sphinx official site: "A speech recognizer written entirely in the Java™ programming language"
1.3 Report Overview
The following chapters go through all the aspects of the implemented library. The remainder of the report is organized as follows: Section 2 describes in detail the problem and the functionalities that the library must implement; Section 3 captures the functional and non-functional requirements of the project, while Section 4 classifies their development priority. Finally, Section 5 describes the architecture of the system, while Sections 6 and 7 provide usage manuals for developers and users, respectively.
2. Global Capability Description
This chapter describes in detail what the final user should be able to do with the library once it has been integrated inside a software product such as WorldWind.
2.1 Finger Tracking Capability
The system should be able to recognize the finger movements of the user through the WiiMote controller and a set of reflective markers located on both forefingers and thumbs (the WiiMote can track up to 4 points at a resolution of 1024x768 with a frequency of 100 Hz). The system should distinguish between 2 modes, with the following sets of related operations:
2.1.1 Navigation Mode (up to 2 fingers)
The navigation mode includes all the common navigation operations and uses at most 2 fingers:
- Pan (1 finger): this is the default operation; if the application detects just one finger, a pan (drag) action over the map is assumed.
- Zoom (2 fingers): when the system detects two points moving in opposite directions, a zoom action is assumed. In particular, an increasing distance is equivalent to a zoom-in and a decreasing distance to a zoom-out. Zoom-in and zoom-out increase and decrease the speed of the camera, respectively.
- Rotation (2 fingers): when the system detects two points rotating one around the other, a steer action is assumed.
- Look up and look down (2 fingers): when the system detects two points at a constant distance moving in the same direction, a camera move action is assumed. In particular, moving two fingers up is interpreted as a look up and moving two fingers down as a look down.
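To make the two-finger rules above concrete, the following sketch classifies one frame-to-frame movement of two tracked points as a zoom or a look gesture. All names, thresholds and the Point2D representation are illustrative assumptions, not the library's actual code:

```java
import java.awt.geom.Point2D;

// Illustrative sketch (not the library's code): classify one frame-to-frame
// movement of two tracked IR points as a navigation gesture.
public class TwoFingerGestureSketch {

    public enum Gesture { ZOOM_IN, ZOOM_OUT, LOOK, NONE }

    // distTolerance: how much the finger distance may vary before the
    // movement counts as a zoom rather than a parallel "look" move.
    public static Gesture classify(Point2D aPrev, Point2D bPrev,
                                   Point2D aCur, Point2D bCur,
                                   double distTolerance) {
        double delta = aCur.distance(bCur) - aPrev.distance(bPrev);
        if (Math.abs(delta) > distTolerance) {
            // Growing distance = zoom-in, shrinking = zoom-out (section 2.1.1).
            return delta > 0 ? Gesture.ZOOM_IN : Gesture.ZOOM_OUT;
        }
        // Constant distance, both points moving the same way: look up/down.
        double ay = aCur.getY() - aPrev.getY();
        double by = bCur.getY() - bPrev.getY();
        if (ay != 0 && Math.signum(ay) == Math.signum(by)) {
            return Gesture.LOOK;
        }
        return Gesture.NONE;
    }
}
```

In the real engine the decision additionally consults the coordinate history and the tolerance values from the configuration file; this sketch only shows the geometric core of the rules.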
2.1.2 Interaction / Edit Mode (3 or more fingers)
The interaction mode includes all the common edit operations, like selection, drag, double click, etc.
- Move cursor (3 or 1 fingers): to use one of the fingers as a simple cursor in the application and to disable the default pan action of the navigation mode, the user can vocally switch to the edit mode, or simply keep the forefinger and the thumb of one hand fixed while the forefinger of the other hand acts as a cursor. This mode is more immediate for occasional selections.
- Click (1 finger): if the finger appears, disappears and reappears in the same location, the system should recognize it as a single click. To avoid a pan, refer to the move cursor operation.
- Double click (1 finger): if the finger appears, disappears and reappears twice, the system should recognize it as a double click. To avoid a pan, refer to the move cursor operation.
- Right click (1 finger): if one finger moves forward and backward, returning to the same position, the system should recognize it as a right click. To avoid a pan, refer to the move cursor operation.
2.2 Head Tracking Capability (Optional)
The system should be able to recognize the position of the user's head through a WiiMote controller and a couple of infrared emitters placed on a pair of glasses or on a hat, and change the perspective according to its movements. Alternatively, the system can use a face recognition algorithm to detect the user's face and its movements.
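As a rough illustration of how a head position could drive the perspective, the sketch below maps the head's horizontal offset in the camera image to a yaw angle. The linear mapping and all names are assumptions for illustration only, not the library's actual computation:

```java
// Hypothetical sketch: turn a detected head position (in camera image
// pixels) into a camera yaw offset. The linear mapping and the names are
// assumptions; the real system filters and scales the coordinates itself.
public class HeadPerspectiveSketch {

    // Map the head's horizontal offset from the image centre to a yaw angle
    // in degrees, scaled so that the image edge corresponds to maxAngle.
    public static double yawDegrees(double headX, double imageWidth, double maxAngle) {
        double offset = (headX - imageWidth / 2.0) / (imageWidth / 2.0); // -1..1
        return offset * maxAngle;
    }
}
```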
2.3 Speech Recognition (Optional)
The system should be able to recognize the user's vocal commands. These commands could be:
- Switch between navigation and edit mode;
- Navigation commands;
- Edit commands.
3. Requirements
Based on the description provided in the previous chapter, we can identify the following functionalities and additional requirements.
3.1 Functional Requirements
WiiMote Detection and Connection
The system should be able to discover Nintendo WiiMote® devices and connect to them.
Gestures Recognition
The system should be able to process the coordinate stream provided by the WiiMote when the user turns on the infrared emitters, detect the gestures defined in section 2 and notify them to the third party application (e.g. WorldWind).
WebCam Detection and Connection
The system should be able to detect and connect to a webcam in order to run the face recognition algorithm.
Head Tracking
The system should be able to process the images provided by the webcam, detect the user face position and notify its movements to the third party application.
Microphone Detection and Connection
The system should be able to detect and use the microphone of the machine in order to run the speech recognition algorithm.
Speech Recognition
The system should be able to process the sound stream of the microphone and detect the user vocal commands chosen from a small command set defined by a grammar.
Simultaneous Execution
The system should be able to run gesture recognition, speech recognition and head tracking simultaneously allowing each capability to interact with the others.
Capability Selection
The user should be able to enable or disable each capability separately.
WorldWind Integration
The system should be integrated inside the WorldWind SDK, to extend the user interaction.
3.2 User Use Case
The functionalities provided to the final user can be summarized by the following use case diagram (figure omitted in this text version): the User actor participates in the Gestures Interaction, Voice Interaction, Head Movements Interaction and Capability Selection use cases, while the Navigation Mode and Edit Mode use cases are connected to the interaction use cases through «extends» relations.
3.3 Additional Requirements
Configuration file
All the configuration parameters should be defined inside a setting file.
Java
The adoption of Java as the main programming language is preferable.
Speed
Since the WiiMote sends data at a frequency of 100 frames per second, the system should be optimized for fast gesture detection.
Simple Integration
The integration of the library inside a third party application has to be as simple as possible for the developer. In particular, the use of the Singleton and Event patterns should be preferred.
4. Requirements Classification
The following table shows the previous requirements divided into functional and non-functional. In accordance with the IBM Rational Unified Process©, we have classified each of them with one of the following tags: Essential, Desirable and Optional. Please note that although some of them have been tagged as Optional or Desirable, the library satisfies all the requirements.
Requirement                            Type
Functional
  WiiMote Detection and Connection     Essential
  Gestures Recognition                 Essential
  WebCam Detection and Connection      Optional
  Head Tracking                        Optional
  Microphone Detection and Connection  Optional
  Speech Recognition                   Optional
  Simultaneous Execution               Desirable
  Capability Selection                 Desirable
  WorldWind Integration                Essential
Non-Functional
  Configuration file                   Essential
  Java                                 Desirable
  Speed                                Essential
  Simple Integration                   Essential
5. System Architecture
5.1 General Description
We chose to develop a Java library that wraps and extends a set of existing APIs, allowing the final user to interact with applications in different ways, through gestures and voice. To do this, we built a complex system that will be explained with the help of the package organization and the schema provided in the Architecture.docx or Architecture.pdf file. Looking at the source code, we can recognize the following set of packages:
- humanInteraction: includes the main class of the system and a demo/test class;
- humanInteraction.connection: includes the classes required to handle the connection with the WiiMote and with the network;
- humanInteraction.core.events: includes all the event wrapper classes that are notified by the system to the third party application when an event occurs;
- humanInteraction.core.listeners.ext: includes all the listener interfaces that should be implemented by the third party application in order to handle the incoming events from the system;
- humanInteraction.core.listeners.mote: includes all the listener classes implemented by the system to handle the events produced by the MoteJ library;
- humanInteraction.core.queue: includes a set of utility classes to handle the flow of information between two threads;
- humanInteraction.core.fingertracking: includes the classes for the gesture recognition process and the continuous event notification;
- humanInteraction.core.headTracking: includes the classes for the head tracking process;
- humanInteraction.core.speechRecognition: includes the classes for the speech recognition process and the continuous event notification.
As seen from the package description, we have chosen an event-listener approach where the system calls all the registered external listeners each time an event occurs. We adopted this solution because it is the most suitable for an API, since it allows a complete separation from third party applications and reuse in different contexts. The starting point of the library is the HumanInteractionManager class, which implements the Singleton pattern to ensure that the class has only one instance, and provides a global point of access to it among multiple applications. This class is responsible for the creation and management of the API, since it can add the defined listeners and start/stop the tracking system, but most of the options must be defined in the configuration file settings.cfg. When the manager creates its instance, it checks the configuration file to ensure that all the parameters are correct, loads all the settings and creates an instance of the ConnectionManager class. This class handles the discovery of and connection to the WiiMotes and to the head tracking client through a socket connection. After that, when the startTracking() method is invoked, the manager initializes and starts the desired engines. Each of them runs on a different thread and thus can be executed independently from the others. Following the schema presented in the Architecture.docx (or Architecture.pdf) file, the next sections explain all the details related to these three engines, with references to the additional libraries that have been used.
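The Singleton plus event-listener combination described above can be sketched as follows; the class name mirrors the report, but the body is an illustrative stand-in, not the library source:

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

// Minimal sketch of the Singleton plus event-listener approach; the class
// name echoes the report, but the body is illustrative only.
public class ManagerSketch {
    private static final ManagerSketch INSTANCE = new ManagerSketch();
    private final List<Runnable> listeners = new CopyOnWriteArrayList<>();

    private ManagerSketch() { /* the real manager loads settings.cfg here */ }

    // Global point of access: every caller gets the same instance.
    public static ManagerSketch getInstance() { return INSTANCE; }

    public void addListener(Runnable l) { listeners.add(l); }

    // The engines call this when an event occurs; every registered
    // third-party listener is notified in turn.
    public void fireEvent() { listeners.forEach(Runnable::run); }
}
```

The CopyOnWriteArrayList makes listener registration safe even while the engine threads are firing events, which matches the multi-threaded design described above.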
5.2 Finger Tracking Engine
The finger tracking engine is responsible for the gesture recognition and relies, among other things, on two external libraries: BlueCove, the Java Bluetooth stack, and MoteJ, the WiiMote Java API. The former is responsible for the communication with the Bluetooth dongle of the system; the latter is an event-listener layer that simplifies the communication with the WiiMotes, translating commands (methods) to byte streams and vice versa. When the tracking starts, the ConnectionManager launches the WiiDiscoveryHandler thread, which adds a listener (WiiMoteListener) to the discoverer; each time a new WiiMote is found, the listener initializes it by activating the IR camera and adding an IrEventListener to the just created Mote instance. Since the MoteJ library generates an event for each of the four points that can be detected by the WiiMote in each frame, the IrEventListener executes a preprocessing operation where it reconstructs the original frame, keeps track of the position of each finger and generates a stream of IrEvent objects for the finger tracking engine (FTEngine) through a fixed size synchronized message queue. The FTEngine is a thread listening on the message queue that processes all the incoming IrEvents and detects gestures with the help of a CoordinatesBuffer class, which acts as a history of the coordinates of the previous n seconds, where n can be defined in the configuration file.
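The hand-off between the IrEventListener and the FTEngine can be sketched with a standard bounded blocking queue; IrEvent here is a stand-in for the library's own class:

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Sketch of the fixed-size synchronized message queue between the
// IrEventListener (producer) and the FTEngine (consumer). IrEvent is a
// stand-in for the library's own class, not its real definition.
public class IrQueueSketch {

    public record IrEvent(int x, int y) { }

    // A producer thread puts one preprocessed point, as the IrEventListener
    // would after reconstructing a frame; the consumer blocks on take()
    // exactly like the FTEngine listening on its queue.
    public static IrEvent roundTrip() {
        BlockingQueue<IrEvent> queue = new ArrayBlockingQueue<>(64); // cf. ftBufferSize
        Thread producer = new Thread(() -> {
            try { queue.put(new IrEvent(512, 384)); }
            catch (InterruptedException e) { Thread.currentThread().interrupt(); }
        });
        producer.start();
        try {
            IrEvent e = queue.take(); // blocks until the producer delivers
            producer.join();
            return e;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return null;
        }
    }
}
```

A bounded queue like this naturally applies back-pressure: if the FTEngine falls behind, the producer blocks instead of flooding memory with stale frames.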
The gesture recognition for all the navigation commands (see chapter 2) is executed by analyzing the coordinate history and checking the movements of the two points against some tolerance values, while the editing operations are detected by a state machine implemented with a set of switch-case statements and a status flag. For additional details, please refer to the source code. Once a gesture has been detected, the FTEngine sends an event to all the registered listeners. Events can be of type WiiButtonEvent, WiiInteractionEvent or WiiNavigationEvent. Finally, since some applications (like WorldWind) need to be continuously notified to execute an animation, an additional thread (FTNotifier) can be activated from the configuration file. In this case, the FTEngine only changes some FTNotifier flags and lets it perform the event generation step.
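A minimal version of such a switch-case state machine, here for single-click detection only, could look like the following; the thresholds correspond conceptually to minClickInterval, maxClickInterval and clickSensitivity, but the code is a sketch, not the FTEngine source:

```java
// Illustrative switch-case state machine for single-click detection only
// (finger appears, disappears, reappears near the same spot within a time
// window). Thresholds mirror the minClickInterval / maxClickInterval /
// clickSensitivity settings conceptually; the code is not the real engine.
public class ClickDetectorSketch {

    private enum State { IDLE, VISIBLE, GONE }

    private State state = State.IDLE;
    private long goneAt;
    private int lastX, lastY;
    private final long minMs, maxMs;
    private final int tolerancePx;

    public ClickDetectorSketch(long minMs, long maxMs, int tolerancePx) {
        this.minMs = minMs; this.maxMs = maxMs; this.tolerancePx = tolerancePx;
    }

    // Feed one frame; x < 0 means "no finger seen". Returns true on a click.
    public boolean onFrame(long timeMs, int x, int y) {
        switch (state) {
            case IDLE:
                if (x >= 0) { state = State.VISIBLE; lastX = x; lastY = y; }
                return false;
            case VISIBLE:
                if (x < 0) { state = State.GONE; goneAt = timeMs; }
                else { lastX = x; lastY = y; }
                return false;
            case GONE:
                if (x >= 0) {
                    long gap = timeMs - goneAt;
                    boolean near = Math.abs(x - lastX) <= tolerancePx
                                && Math.abs(y - lastY) <= tolerancePx;
                    state = State.VISIBLE; lastX = x; lastY = y;
                    return gap >= minMs && gap <= maxMs && near;
                }
                return false;
        }
        return false;
    }
}
```

Double-click and right-click detection extend the same idea with more states and the coordinate history buffer.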
5.3 Head Tracking Engine
The head tracking engine (HTEngine) is responsible for the face tracking step and relies on an external library called OpenCV, a very powerful face recognition library. Since OpenCV is written entirely in C, to avoid the use of JNI we chose to write the face recognition code in its native language and send the stream of detected coordinates through a socket connection. When the tracking starts, the main manager starts the HTEngine listening on the message queue, while the ConnectionManager launches a SocketHandler listening on the port defined in the configuration file. Once started, the face recognition client loads its configuration file htSettings.cfg, checks the correctness of the settings, connects to the SocketHandler, activates the webcam of the system and starts the detection. When the SocketHandler receives the connection request from the client, it launches a new thread (ConnectionHandler) to handle the incoming stream of data and returns to listening on the port for new connections. The new thread is responsible for processing the coordinate stream and sending it to the message queue where the HTEngine is listening. When a new message appears on the queue, the HTEngine filters the incoming coordinate set and notifies the event (HeadEvent) to all the registered listeners.
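The ConnectionHandler's parsing step can be sketched as follows; the "x,y" line-per-frame wire format is an assumption for illustration, since the report does not specify how the native client encodes the coordinates:

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the ConnectionHandler's job: turn the text stream sent by the
// native face-recognition client into (x, y) pairs. The "x,y" line format
// is an assumption; the real wire format is defined by the C client.
public class CoordStreamSketch {

    public static List<int[]> parse(String raw) {
        List<int[]> coords = new ArrayList<>();
        for (String line : raw.split("\n")) {
            if (line.isBlank()) continue; // tolerate empty keep-alive lines
            String[] parts = line.split(",");
            coords.add(new int[]{ Integer.parseInt(parts[0].trim()),
                                  Integer.parseInt(parts[1].trim()) });
        }
        return coords; // in the library these would be queued for the HTEngine
    }
}
```

In the real system the same logic would read from the socket's input stream line by line instead of from a string.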
5.4 Speech Recognition Engine
The speech recognition engine is responsible for processing the user's voice in order to detect commands and notify them to the third party application listeners. It is based on the Sphinx speech recognition library, which allows the recognition of words according to a BNF grammar (hi.gram file) and an XML configuration file (hi.config.xml). When the recognition starts, the HumanInteractionManager starts the SREngine thread, which initializes the Sphinx library and enters a recognition loop. Each time Sphinx recognizes a word defined in the grammar, it returns the word and the engine notifies the corresponding events or sets the corresponding execution flags. Here, as in the finger tracking engine, we have defined an additional thread for continuous notification (SRNotifier) that can be activated in case the third party applications execute only event based animations.
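The word-to-event dispatch can be sketched as a simple lookup from the recognized phrase to a command; the enum and the map contents here are illustrative, not the SREngine's actual types:

```java
import java.util.Map;
import java.util.Optional;

// Sketch of the word-to-event dispatch: Sphinx returns the matched grammar
// phrase as text, and the engine maps it to a command event or a flag.
// The Command enum and the map contents are illustrative only.
public class VoiceDispatchSketch {

    public enum Command { ZOOM_IN, ZOOM_OUT, EDIT_MODE, STOP }

    private static final Map<String, Command> COMMANDS = Map.of(
        "zoom in", Command.ZOOM_IN,
        "zoom out", Command.ZOOM_OUT,
        "edit mode", Command.EDIT_MODE,
        "stop", Command.STOP);

    // Unknown phrases yield an empty Optional, so the engine can ignore them.
    public static Optional<Command> dispatch(String recognized) {
        return Optional.ofNullable(COMMANDS.get(recognized.toLowerCase()));
    }
}
```

Keeping the mapping in one table makes it easy to stay in sync with the hi.gram grammar when new commands are added.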
5.5 WorldWind Integration
After the development of the library, we proceeded with the integration inside the WorldWind SDK. The input handling inside this SDK is basically done by two classes: AWTInputHandler, which implements all the allowed listeners and handles all the input events, and OrbitViewInputBroker, which computes the position of the globe and of the objects when an event occurs. For these reasons, starting from their code, we decided to modify and extend them to produce two new classes, WiiAWTInputHandler and WiiOrbitViewInputHandler, located inside the it.unitn.cg2008 package. The former has been extended to be the listener not only for key and mouse events but also for all the events generated by the human interaction library; the latter has been tuned with some concurrency workarounds to avoid unpredictable movements of the globe. To avoid the entire rewriting of the two classes, we chose to convert each incoming library event to a sequence of key and mouse events. In this way, a WiiMote drag event is mapped to a sequence of MouseEvents (mousePressed(), mouseDragged(), mouseReleased()) as if the user were dragging the globe with the mouse. Navigation events, like zoom and rotation, are mapped into KeyEvents (buttonPressed(), buttonReleased()). Inside the WiiOrbitViewInputHandler we have simply added some checks to ensure the consistency of the new camera and object coordinates after an event occurs, because WorldWind methods are not completely thread safe. Finally, inside it.unitn.cg2008, we have added three additional classes: WiiMonitorLayer, SRMessageListener and WiiPointListener. The first class is a WorldWind layer used to show the position of the user's infrared emitters on the screen (as little triangles) and to show the voice command that has been recognized. The other two classes are listeners used to change the layer parameters when a WiiMote movement or a voice command is detected.
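The event-translation idea can be sketched by mapping each library gesture to the sequence of standard AWT event ids that gets replayed; the Gesture enum is illustrative, while the MouseEvent constants are the real AWT ones:

```java
import java.awt.event.MouseEvent;

// Sketch of the event-translation idea from section 5.5: each library
// gesture is replayed as a sequence of standard AWT event ids so the
// existing input-handler code path stays untouched. The Gesture enum is
// illustrative; the MouseEvent constants are the real AWT ones.
public class EventMappingSketch {

    public enum Gesture { DRAG, CLICK }

    public static int[] toAwtSequence(Gesture g) {
        switch (g) {
            case DRAG:  // a WiiMote drag becomes press / drag / release
                return new int[]{ MouseEvent.MOUSE_PRESSED,
                                  MouseEvent.MOUSE_DRAGGED,
                                  MouseEvent.MOUSE_RELEASED };
            case CLICK: // a detected click becomes press / release / click
                return new int[]{ MouseEvent.MOUSE_PRESSED,
                                  MouseEvent.MOUSE_RELEASED,
                                  MouseEvent.MOUSE_CLICKED };
        }
        return new int[0];
    }
}
```

The real integration constructs actual MouseEvent/KeyEvent objects and dispatches them to the WorldWind canvas; this sketch only shows the sequencing idea.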
6. Developer Manual
This chapter provides the developer with detailed information on how to configure, compile and run the library for a general application and for the WorldWind SDK.
6.1 System Requirements
- A machine able to run the WorldWind SDK for Java version 0.4 smoothly;
- Windows OS to run the Head Tracking client;
- 128 MB of additional RAM to run the speech recognition;
- 1 Nintendo WiiMote® for finger tracking;
- 1 Bluetooth dongle with Widcomm Bluetooth drivers version 1.4.2 or above;
- Sound card and microphone for speech recognition;
- 1 webcam with a resolution of at least 320x240 pixels for the head tracking;
- Java JDK 1.6;
- JOGL version 1.1.1.
6.2 System Configuration
In order to use all the features provided by the library, you must first ensure that you meet all the requirements of the previous section. In particular, we assume that the latest JDK, version 1.1.1 of JOGL and the external device drivers have been installed and properly configured on the system.
To create a new application, open your favorite IDE, create an empty project and add to it all the libraries (*.jar) stored inside the libraries directory. In particular:
- bluecove-2.0.2.jar: the Bluetooth stack;
- motej-library-0.8.jar: the WiiMote API;
- commons-logging-1.1.1.jar, commons-logging-adapters-1.1.1.jar, commons-logging-api-1.1.1.jar, commons-logging-tests.jar: common libraries;
- js.jar, jsapi.jar, sphinx4.jar, tags.jar, WSJ_8gau_13dCep_16k_40mel_130Hz_6800Hz.jar: the Sphinx libraries.
After that, you have to place a copy of the three configuration files (settings.cfg, hi.gram, hi.config.xml) inside the working directory of your project. If you plan to use the head tracking capability, locate the ExecFR directory and remember to launch runme.cmd after the start of your application. To use the library in your application, you first have to implement one or more listeners, starting from the interfaces located in the humanInteraction.core.listeners.ext package:
- WiiPointsEventListener: to know the position of the IR emitters on the screen;
- WiiButtonsEventListener: to know if a WiiMote button has been pressed;
- WiiNavigationEventListener: to get navigation events;
- WiiInteractionEventListener: to get editing events;
- HeadTrackingEventListener: to know the position of the user's head;
- SpeechEventListener: to know the user's voice command.

The final step consists of the instantiation of the main class and the addition of the just defined listeners. You can get an instance of the manager by typing:

HumanInteractionManager manager = HumanInteractionManager.getInstance();

To add your listeners you can invoke the corresponding manager method:

manager.setWiiButtonsEventListener(YourWiiButtonsEventListener);
manager.setWiiIntEvtListener(YourWiiIntEvtListener);
manager.setWiiNavEvtListener(YourWiiNavEvtListener);
manager.setWiiPointsEventListener(YourWiiPointsEventListener);
manager.setHeadTrackingListener(YourHeadTrkListener);

Please note that an arbitrary number of them can be added. Finally, you can start and stop the tracking by typing:

manager.startTracking();
manager.stopTracking();

If you want to calibrate the Nintendo WiiMote, call manager.startCalibration() at any time. The calibration is useful since it allows the system to configure most of its sensitivity settings automatically, in relation to the distance between the user and the WiiMote. If you call this method, the library will output all the necessary instructions on the console. In addition, from the manager you can choose to pause the speech recognition (manager.muteMicroPhone()), the head tracking (manager.pauseHeadTracking()) and to
switch from navigation to edit mode (manager.switchMode()). For a better understanding, refer to the TestManager class and to the code documentation. To achieve the best performance from the application, an accurate tuning of the configuration files is essential. The main configuration file for the library is settings.cfg, whose parameters are described in the following table:
Parameter                    Values                 Description
FT (finger tracking)
  fingerTrackingEnabled      0,1                    Enables/disables finger tracking
  ftBufferSize               0..n slots             The message buffer size
  irCameraMode               BASIC, EXTENDED, FULL  The camera configuration mode (EXTENDED recommended)
  irCameraSensitivity        CLIFF, MARCAN          Sensitivity preset (MARCAN recommended)
  recognizeDrag              0,1                    Enables/disables drag gesture recognition
  recognizeZoom              0,1                    Enables/disables zoom gesture recognition
  recognizeLook              0,1                    Enables/disables look gesture recognition
  recognizeRotation          0,1                    Enables/disables rotation gesture recognition
  recognizeClicks            0,1                    Enables/disables click recognition
  continuousZoom             0,1                    Enables/disables continuous zoom notification
  continuousLook             0,1                    Enables/disables continuous look notification
  continuousRotation         0,1                    Enables/disables continuous rotation notification
  bufferTrackingSize         0..n ms                Movement history size
  allowCalibration           0,1                    Enables/disables calibration
  zoomStopDistance           0..n pixels            Distance under which two IR emitters are recognized as a stop gesture
  zoomSensitivity            0..n pixels            Tolerance to detect a zoom gesture
  lookSensitivity            0..n pixels            Tolerance to detect a look gesture
  lookFingerSensitivity      0..n pixels            Tolerance between two IR emitters
  rotationSensitivity        0..n pixels            Tolerance to detect a rotation gesture
  rotationRadiusSensitivity  0..n pixels            Radius tolerance to detect a rotation gesture
  minClickInterval           0..n ms                Minimum interval to detect a click
  maxClickInterval           0..n ms                Maximum interval to detect a click
  clickSensitivity           0..n pixels            Tolerance to detect a click
  rightClickDetectionInterval  0..n ms              History time analyzed for right click detection
  ftNotificationInterval     0..n ms                The continuous notification sleep time
SR (speech recognition)
  speechRecognitionEnabled   0,1                    Enables/disables speech recognition
  continuousSRnotification   0,1                    Enables/disables continuous voice command notification
  srNotificationInterval     0..n ms                The continuous notification sleep time
  xmlCfg                     String                 The XML configuration file path
HT (head tracking)
  headTrackingEnabled        0,1                    Enables/disables head tracking
  htPort                     1024..65535            The listening port for the head tracking client
  cameraWidth                0..n pixels            The pixel width of the camera
  cameraHeight               0..n pixels            The pixel height of the camera
  camTimeOut                 0..n ms                The timeout after which the face recognizer disengages the user's head
  camXSensitivity            0..n pixels            The x sensitivity
  camYSensitivity            0..n pixels            The y sensitivity
  camRadiusThreshold         0..n pixels            The radius sensitivity
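As an illustration only, a minimal settings.cfg enabling just the finger tracking capability might look like the sketch below; the key = value syntax and the concrete numbers are assumptions, so always start from the settings.cfg file shipped with the library:

```
fingerTrackingEnabled = 1
ftBufferSize = 64
irCameraMode = EXTENDED
irCameraSensitivity = MARCAN
recognizeDrag = 1
recognizeZoom = 1
recognizeClicks = 1
minClickInterval = 50
maxClickInterval = 500
clickSensitivity = 5
speechRecognitionEnabled = 0
headTrackingEnabled = 0
```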
In addition there are two files for the speech recognition. The first is hi.gram, which contains the following grammar in BNF:

public <command> = Double Click | Edit Mode | Left | Look Down | Look Up
                 | Mouse Drag | Navigation Mode | Right | Right Click
                 | Single Click | Stop | Turn Left | Turn Right
                 | Zoom In | Zoom Out;
This is the set of recognizable voice commands. Please note that most of them are multi-word. This choice has been made to improve the speech recognition accuracy, since a lot of words sound similar in pronunciation, like "look" and "zoom". If you plan to extend the speech recognition capability with additional commands, you first have to define them inside this grammar. The second file is hi.config.xml and includes all the speech recognizer parameters. Although there are a lot of them, the most important are those placed at the beginning of the file:
Parameter                    Value        Description
absoluteBeamWidth            -1..n        The maximum number of hypotheses to consider for each frame (-1 = all)
relativeBeamWidth            0..n         The more negative the exponent of the relative beam width, the fewer hypotheses you discard and the more accurate your recognition
wordInsertionProbability     0..n         Controls word break recognition; near 1 is more aggressive
languageWeight               0..n         Trade-off between grammar and acoustic scores
addSilenceWords              true, false  Enables/disables silence word insertion
silenceInsertionProbability  0..1         How aggressively Sphinx inserts silences
silenceThreshold             0..n         Sensitivity level with respect to the environment
The actual values are not definitive and depend strictly on the environment and on the distance between the microphone and the user. Usually some empirical tests are required to achieve good speech recognition. For additional information on the other parameters, please refer to the Sphinx reference manual. The last configuration file is the one for the head tracking (htSettings.cfg), which includes the following parameters:
Parameter         Value                        Description
hostIp            1..255.1..255.1..255.1..255  The address of the machine running the library
hostPort          1024..65535                  The port of the machine running the library
cascadeMethod     String                       The name of the detection rule file
cutoff            1..1.4                       The scale factor; increase to improve speed at the cost of less precise detection
cvSizehorizontal  1..cameraWidth               Window sample pixel width; increase to improve speed at the cost of accuracy
cvSizeVertical    1..cameraHeight              Window sample pixel height; increase to improve speed at the cost of accuracy
horizontalScaleFactor  1..n  The WiiMote horizontal resolution
verticalScaleFactor    1..n  The WiiMote vertical resolution
showCameraWindow       0,1   Enables/disables the camera window
waitTime               1..n  The waiting time between each detection
6.3 WorldWind
If your system has been correctly configured as explained in the previous section, the execution of WorldWind is relatively simple. Inside the Project directory there is an already configured copy of WorldWind 0.4 with the additional classes defined in section 5.5. To run the code, simply add all the files of the WiiWorldWind directory to your favorite IDE, add the libraries of the previous section and configure the setting files. If you want to use the library inside a clean copy of the WorldWind SDK, you have to follow these steps:
- copy the directory .\project\WiiWorldWind\src\it inside the src directory of WWJ;
- copy the settings.cfg, hi.gram and hi.config.xml files from .\WorldWind\ to your WWJ working directory and configure them;
- copy the directory .\project\WiiWorldWind\src\worldwinddemo\ inside the src directory of WWJ;
- locate the WWJ configuration file worldwind.properties inside the .\project\WiiWorldWind\src\config\ directory and change the line:
  gov.nasa.worldwind.avkey.InputHandlerClassName=gov.nasa.worldwind.awt.AWTInputHandler
  with
  gov.nasa.worldwind.avkey.InputHandlerClassName=it.unitn.cg2008.WiiAWTHandler;
- change the visibility of gov.nasa.worldwind.awt.KeyPollTimer and gov.nasa.worldwind.awt.OrbitViewInputStateIterator from protected to public;
- add all the libraries of section 6.2 to the project.
Finally, run the application. If you choose to enable the speech recognition, you will need to pass the additional parameter -Xmx256M to the Java virtual machine. You will not see any difference with respect to plain WWJ, except that pressing the C key starts/stops the human interaction library.
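Collecting the worldwind.properties change from the steps above into one copy-pasteable fragment (the commented line shows the original WWJ value):

```
# original input handler shipped with WWJ:
# gov.nasa.worldwind.avkey.InputHandlerClassName=gov.nasa.worldwind.awt.AWTInputHandler
# replaced by the human interaction library's handler:
gov.nasa.worldwind.avkey.InputHandlerClassName=it.unitn.cg2008.WiiAWTHandler
```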
7. User Manual
This chapter provides instructions for the final user on how to use WorldWind extended with our human interaction library. To activate the library, follow these steps:
‐ Once the WWJ window appears, click with the mouse inside the window to give focus to the canvas, then press the C key;
‐ Put the WiiMote into connection mode by pressing buttons 1 and 2 together and wait until it connects with the computer. The WiiMote will rumble and the first LED will turn on;
‐ Position the WiiMote above or below the screen, with the IR camera looking at you;
‐ To enable the head tracking, double click on the runme.cmd file located inside the ExecFR directory and wait for the connection and initialization of the face recognizer (a window with the webcam stream will be opened);
Now that the library is running, put the IR emitters / reflectors on your fingers (forefingers and thumbs) and start interacting with WorldWind.
7.1 Gestures
The allowed gestures are those defined in section 2.1, with the addition of the stop command, which is performed by bringing your two forefingers close together in front of the WiiMote. If the distance falls below a certain value, all the active movements (zoom, look and turn) are interrupted. For the best user experience, it is strongly recommended to perform a calibration just after the application starts, by pressing the R key. After that, place your two forefingers in front of the WiiMote at a short distance. This distance will be used as the stop distance and to configure all the sensitivity parameters for the other gestures. So if the system appears too sensitive, simply recalibrate the WiiMote with a stop distance greater than the previous one. If you want to manually switch between navigation and edit mode, press the E key.
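The calibration and stop-gesture logic described above can be sketched as follows. This is an illustrative model only: the class and method names are not part of the actual library API, and the real implementation works on WiiMote IR camera coordinates.

```java
// Illustrative sketch of the stop-gesture / calibration behaviour.
// FingerCalibrator and its methods are hypothetical names, NOT the library API.
class FingerCalibrator {

    private double stopDistance = Double.NaN;   // recorded at calibration time (R key)

    /** Records the distance between the two forefinger IR points as the stop distance. */
    void calibrate(double x1, double y1, double x2, double y2) {
        stopDistance = distance(x1, y1, x2, y2);
    }

    /** True when the two forefingers are closer than the calibrated stop distance. */
    boolean isStopGesture(double x1, double y1, double x2, double y2) {
        return !Double.isNaN(stopDistance)
                && distance(x1, y1, x2, y2) < stopDistance;
    }

    private static double distance(double x1, double y1, double x2, double y2) {
        return Math.hypot(x2 - x1, y2 - y1);
    }

    public static void main(String[] args) {
        FingerCalibrator c = new FingerCalibrator();
        c.calibrate(100, 0, 140, 0);                          // stop distance: 40 px
        System.out.println(c.isStopGesture(110, 0, 120, 0));  // 10 px apart -> prints "true"
    }
}
```

Recalibrating with a larger stop distance, as suggested above, simply raises this threshold and therefore reduces the system's sensitivity.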
7.2 Voice Commands
The allowed voice commands are shown in the table below. If you want, you can enable/disable the microphone (i.e. stop the speech recognition) by pressing the M key.
Command           Description
Left              Moves in the West direction
Right             Moves in the East direction
Look Up           Moves in the North direction
Look Down         Moves in the South direction
Turn Right        Rotates the globe clockwise
Turn Left         Rotates the globe counterclockwise
Zoom In           Zooms in
Zoom Out          Zooms out
Stop              Stops all movements
Edit Mode         Switches to edit mode
Navigation Mode   Switches to navigation mode
Single Click      Notifies a single left click to WWJ
Double Click      Notifies a double left click to WWJ
Right Click       Notifies a single right click to WWJ
Mouse Drag        Notifies a drag to WWJ
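Since the speech recognizer is Sphinx-based and the setup copies a hi.gram file (section 6.3), this command set is plausibly encoded as a JSGF grammar along the following lines. Treat this as an illustrative sketch only; the actual contents of hi.gram may differ.

```
#JSGF V1.0;
grammar hi;

public <command> = left | right | look up | look down |
                   turn right | turn left | zoom in | zoom out |
                   stop | edit mode | navigation mode |
                   single click | double click | right click | mouse drag;
```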
7.3 Head Tracking
Once the client has been started, you can activate the head tracking on WorldWind by looking at the center of the camera, so that the library can engage your head and move the WorldWind globe accordingly. If you stay off camera for a time longer than the one defined inside the htSettings.cfg file, you will be disengaged. If you want, you can enable/disable the head tracking by pressing the H key. The behavior of WWJ is summarized in the following table:
Command                    Description
Head Right                 Moves the globe in the East direction
Head Left                  Moves the globe in the West direction
Head Up                    Moves the globe in the North direction
Head Down                  Moves the globe in the South direction
Head Near the Screen       Zooms in
Head Far from the Screen   Zooms out
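The engage/disengage rule described above can be sketched as a simple timeout check. This is an illustrative model, not the library's actual code; the class name is hypothetical, and the timeout is assumed to come from htSettings.cfg.

```java
// Illustrative sketch: the user is disengaged after staying off camera
// longer than a timeout (assumed to be configured in htSettings.cfg).
// EngagementMonitor is a hypothetical name, NOT the library API.
class EngagementMonitor {

    private final long timeoutMillis;        // disengagement timeout
    private long lastDetectionMillis = -1;   // -1 until the head is first engaged

    EngagementMonitor(long timeoutMillis) {
        this.timeoutMillis = timeoutMillis;
    }

    /** Called whenever the face recognizer reports a head position. */
    void onHeadDetected(long nowMillis) {
        lastDetectionMillis = nowMillis;
    }

    /** True while the head has been detected recently enough. */
    boolean isEngaged(long nowMillis) {
        return lastDetectionMillis >= 0
                && nowMillis - lastDetectionMillis <= timeoutMillis;
    }

    public static void main(String[] args) {
        EngagementMonitor m = new EngagementMonitor(500);
        m.onHeadDetected(100);
        System.out.println(m.isEngaged(400));   // within timeout -> prints "true"
        System.out.println(m.isEngaged(700));   // off camera too long -> prints "false"
    }
}
```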
8. Appendix
8.1 Known Bugs and Limitations
‐ The library can discover all the WiiMote devices near the Bluetooth dongle, but only if they are in connection mode (buttons 1 and 2 pressed together) within a short time after the start of the application (method manager.startTracking());
‐ Although the library can connect with up to 4 WiiMotes, it can handle only one of them at a time;
‐ Although the drag movement detected by the library is mapped exactly onto the corresponding WorldWind events, the SDK sometimes generates inconsistencies that lead to an incorrect positioning of the globe (flipping and turning). We were not able to solve this problem, nor even to reliably detect it, even after discussing it on the WorldWind forum with the developers of the SDK;
‐ The speech recognition accuracy depends strictly on the quality of the microphone and the noise of the environment. This means that each time the application is executed on a different machine, with a different microphone, or in a different environment, the configuration file hi.config.xml has to be reconfigured;
‐ As a general rule, for good speech recognition it is preferable to speak slowly, especially for multi-word commands (like “turn left”), and to pronounce the command “stop” with a long “o”;
‐ The library does not recognize combinations of buttons. This means that when the user presses two WiiMote buttons together, two separate WiiButtonEvents will be generated;
‐ To allow the definition of the WiiAwtHandler and WiiOrbitViewInputBroker classes in a package different from gov.nasa.worldwind.awt, the visibility of the KeyPollTimer and OrbitViewInputStateIterator classes has been changed from protected to public;
‐ The head tracking client always sets the camera to its maximum resolution. With webcams with a resolution greater than 640x480 pixels and slow CPUs, the detection can be really slow. In this case, it is preferable to set higher values for cvSizeHorizontal and cvSizeVertical inside the htSettings.cfg file;
‐ If you have multiple webcams, only the first one installed will be used.
8.2 Directory Structure
‐ .\HumanInteractionLibrary\Docs : Directory including this report, the UML, the specification, the slide show and the javadoc of the library;
‐ .\HumanInteractionLibrary\FaceRecognition\ : Directory including the Visual Studio project for the head tracking client;
‐ .\HumanInteractionLibrary\ExecFR\ : Directory including only the compiled executable for the head tracking client;
‐ .\HumanInteractionLibrary\Project\ : Directory including the Eclipse project of the library;
‐ .\HumanInteractionLibrary\libraries\ : Directory including all the libraries required for the correct compilation and execution of the project. It also includes the HumanInteraction.jar file, a compiled version of the project that can be easily integrated with third-party projects.
8.3 Adopted Tools
Coding: Borland JBuilder 2007 Enterprise, Eclipse 3.3, Microsoft Visual Studio 2008
Modeling: Microsoft Visio 2007, Microsoft Word 2007, Microsoft PowerPoint 2007