An intuitive motion-based input model
for mobile devices
Mark Richards
Thesis submitted for the degree of Master of Information Technology (Research) to the
School of Information Systems at the Queensland University of Technology, Australia.
December, 2006.
Keywords
Input Model, Human–Computer Interface, Mobile Device, User Interface, Input Devices,
Interaction Styles, Automated Survey, DirectX Mobile, DirectShow, Windows Mobile, Computer
Vision, Image Processing, Edge Detection, Object Detection, Motion Tracking, Scene Analysis,
Human Movement, ARToolkit, Augmented Reality.
Publication arising from this research
Richards, M., Dunn, T.L. and Pham, B. "Developing a Motion Input Model for Mobile Devices."
HCI International 2007.
LNCS Digital Library (LNCS, http://www.springer.com/lncs)
In Preparation:
“Automated Data Collection – User Testing on Mobile Devices”
“Implementing Reactive Object Detection on Windows Mobile”
“Analogue Input: Using a video and light to augment buttons”
1. INTRODUCTION ...................................................................................................... 15
1.1. Research Problem...............................................................................................................16
1.2. Research Aims & Objectives................................................................................................16
1.3. Research Questions ............................................................................................................17
1.4. Research Rationale .............................................................................................................17
1.5. Significance of Research......................................................................................................18
1.6. Limitations of Research.......................................................................................................19
1.7. Organisation of Thesis.........................................................................................................19
2. RESEARCH METHODOLOGY................................................................................ 22
2.1. Research Approach .............................................................................................................22
2.2. Requirement Analysis .........................................................................................................22
2.3. Research Breakdown ..........................................................................................24
2.3.1. Communication model creation..........................................................................................24
2.3.2. Survey Construction/Consideration ....................................................................................24
2.3.3. Implementation upon desktop ...........................................................................................25
2.3.4. Implementation upon mobile .............................................................................................27
2.4. Deliverable Outcomes.........................................................................................................28
2.5. Reliability of Results ...........................................................................................................28
2.6. Problems Encountered........................................................................................................29
2.7. Ethical Considerations.........................................................................................................30
3. REQUIREMENT SPECIFICATIONS...................................................................... 31
3.1. User Requirements .............................................................................................................31
3.2. Scope of Communications...................................................................................32
3.2.1. Low-Level.............................................................................................................32
3.2.2. Textual Input .......................................................................................................33
3.2.2.1. Hex ..............................................................................................................33
3.2.2.2. Graffiti & Unistrokes ...................................................................................35
3.2.3. High Level ............................................................................................................36
3.2.3.1. DyPERS ........................................................................................................36
3.2.3.2. Profiling Usage ............................................................................................37
3.3. Motion and Error Detection ................................................................................38
3.3.1. Movement Tracking ............................................................................................38
3.3.1.1. Axial.................................................................................................................38
3.3.1.2. Rotational....................................................................................................38
3.3.2. Error Detection and Correction...........................................................................39
3.3.2.1. Gait Phases..................................................................................................39
3.4. Detection Algorithms ..........................................................................................40
3.4.1. Edge Detection ....................................................................................................41
3.4.2. Object Detection .................................................................................................43
3.5. Algorithm to determine appropriateness ............................................................................44
3.6. Model Design......................................................................................................45
3.6.1. Interaction Breakdown........................................................................................45
3.6.2. Input Types (High level commands) ....................................................................45
3.6.3. Input Motions......................................................................................................48
3.7. Gesture Recognition ...........................................................................................51
3.7.1. Profiling users......................................................................................................52
3.7.2. Input Prediction...................................................................................................53
4. SURVEY CONSTRUCTION ..................................................................................... 54
4.1. Aim.....................................................................................................................................55
4.2. Survey Structure .................................................................................................................55
4.3. Survey One – Initial Data Collecting.....................................................................56
4.3.1. Design..................................................................................................................56
4.3.1.1. Survey One: Part One – Understanding the Users......................................................56
4.3.1.2. Survey One: Part Two – Gauging User’s Reactions.....................................58
4.3.2. Participants..........................................................................................................59
4.3.3. Data Collection Methods.....................................................................................59
4.3.4. Ethical Considerations.........................................................................................60
4.3.5. Initial Analysis......................................................................................................60
4.3.6. Particulars of Note ..............................................................................................63
4.4. Development of an Autonomous Survey on a PDA ..............................................64
4.4.1. An automated survey versus traditional means .................................................64
4.4.2. Smartphone Development ..................................................................................64
4.4.2.1. Device Information .....................................................................................65
4.4.2.2. .NET CF and Embedded C++........................................................................66
4.4.2.3. Camera API..................................................................................................67
4.4.3. Input mediums of a PDA .....................................................................................67
4.4.3.1. Textual Input ...............................................................................................68
4.4.3.2. Touch Input .................................................................................................68
4.4.3.3. Audio...........................................................................................................69
4.4.3.4. Video ...........................................................................................................69
4.4.3.5. Motion ........................................................................................................70
4.4.3.6. Other...........................................................................................................70
4.4.4. Information Storage ............................................................................................71
4.4.4.1. Video and Audio..........................................................................................71
4.4.4.2. Other Data ..................................................................................................72
4.4.5. Questions and Survey Design..............................................................................73
4.4.5.1. Survey .........................................................................................................73
4.4.5.2. Questions ....................................................................................................73
4.4.6. Survey Environments ..........................................................................................74
4.4.6.1. Appropriate environments for use of an automated survey......................74
4.4.6.2. Preparing environments for more meaningful video results......................75
4.4.7. Common Obstacles .............................................................................................76
4.4.8. Device Resources.................................................................................................77
4.4.9. Automated Survey Summary ..............................................................................77
4.5. Survey Two – Testing User Decisions and Reactions ............................................78
4.5.1. Design..................................................................................................................79
4.5.2. Data Collection Methods.....................................................................................81
4.5.3. Survey Conditions................................................................................................82
4.5.4. The Tests .............................................................................................................84
4.5.4.1. Functionality Required................................................................................84
4.5.4.2. Choosing Data on the Screen......................................................................85
4.5.4.3. Adjustment to Counteract the Changing Image .........................................87
4.5.4.4. Modification - Dealing with a Warped World.............................................89
4.5.4.5. Confirmation with the Faces.......................................................................91
4.5.4.6. Scrolling Functionality.................................................................................92
4.5.4.7. Simulating a Phone Call...............................................................................94
4.5.5. Participants..........................................................................................................96
4.5.6. Additional Details/Observations .........................................................................96
4.6. Survey and Concept Summary.............................................................................................97
5. BASIC MODEL CREATION..................................................................................... 98
5.1. Collected Data Classification ...............................................................................................99
5.2. Situational Motions ............................................................................................................99
5.3. Model Expandability .........................................................................................................100
5.4. Inappropriate Commands .................................................................................................101
5.5. The Base Model ................................................................................................101
5.5.1. Confirmation .....................................................................................................103
5.5.2. Movement.........................................................................................................103
5.5.3. Choosing............................................................................................................104
5.5.4. Selection............................................................................................................105
5.6. Summary of Model Creation .............................................................................................106
6. PROTOTYPE DEVELOPMENT ...........................................................................107
6.1. Using DirectShow..............................................................................................107
6.1.1. DirectShow Filters .............................................................................................108
6.2. Desktop Development ......................................................................................................109
6.2.1. Using a Filter to detect squares.........................................................................109
6.2.1.1. The Augmented Reality Toolkit.................................................................109
6.2.1.2. DirectShow Filter Basics............................................................................110
6.2.3. Mapping to 3D...................................................................................................114
6.2.4. Tracking motion from cube transformations ....................................................115
6.3. Windows Mobile 5 Development ......................................................................116
6.3.1. Porting detection filters ....................................................................................117
6.3.2. Camera Initialization .........................................................................................118
6.3.3. Image Output ....................................................................................................120
6.3.4. Data Display.......................................................................................................120
6.4. Prototype Summary..........................................................................................................121
7. CONCLUSIONS AND FUTURE WORK...............................................................123
7.1. Answers to Research Questions ........................................................................123
7.1.1. What functionalities of a phone's features are appropriate candidates to be used as parts of a motion input scheme? ..................................................................................123
7.1.2. Is it possible to construct a rational and useable mapping scheme for phone inputs?....123
7.1.3. Can people adapt to using motion gestures as an input medium and what are considered suitable (not embarrassing, over-exertive) motions to perform? ....................123
7.1.4. How uniformly do people perform motions given to them (different people, slight difference in movement) and can these variations be adapted to?.................................124
7.1.5. How suitable are images (collected by the embedded mobile cameras) for in-depth image processing? ........................................................................................................124
7.1.6. Can real-time performance of image detection algorithms and movement calculations on Smartphones™ be achieved?................................................................................124
7.1.7. Will tracking movement critical to this project unexpectedly interfere with the normal usage of the phone? ..........................................................................................124
7.2. Contributions to Research.................................................................................................125
7.3. Limitations........................................................................................................126
7.4. Potential Applications.......................................................................................................127
7.5. Future Work .....................................................................................................................128
APPENDIX A – SAMPLE INPUTS ..................................................................................129
APPENDIX B – INPUT TYPE BREAKDOWN ..............................................................131
APPENDIX C – SURVEY ONE HANDOUT....................................................................134
APPENDIX D – SURVEY ONE PARTICIPANT BREAKDOWN................................137
APPENDIX E – SURVEY TWO AUDIO..........................................................................139
APPENDIX F – SAMPLE SURVEY TWO RESULTS....................................................140
APPENDIX G – BASE INPUT COMPRESSION ............................................................142
APPENDIX H – BASE SITUATIONS ..............................................................................145
APPENDIX I – SAMPLE MOTION MODEL ..................................................................146
BIBLIOGRAPHY ................................................................................................................150
10 | P a g e
List of Figures
Figure 1-1: Thesis Breakdown.................................................................................................. 19
Figure 3-1: The Hex Interface................................................................................................... 34
Figure 3-2: Hex in Action ......................................................................................................... 35
Figure 3-3: Natural Letter Matching......................................................................................... 35
Figure 3-4: Letter Subsets ......................................................................................................... 36
Figure 3-5: Single Strokes ........................................................................................................ 36
Figure 3-6: DyPERS in Action ................................................................................................. 37
Figure 4-1: Basic Outline of Survey Situation.......................................................................... 59
Figure 4-11: Modification Image, High Brightness.................................................................. 89
Figure 4-12: A Happy Face....................................................................................................... 91
Figure 4-13: Sad Face ............................................................................................................... 91
Figure 4-14: The Functionality Test and the Text within. ........................................................ 92
Figure 5-1: Example of how the Information fits together from the Model ........................... 102
Figure 6-1: GraphEdit ............................................................................................................. 107
Figure 6-2: Image Data with Alpha Channel (Note the 00’s) ................................................. 112
Figure 6-3: Image Data without an Alpha Channel ................................................................ 112
Figure 6-4: Translation of Two-Dimensional Screen Data to Direct3D Polygon Format. ..... 114
Figure 6-5: Image without and with Display Filter applied.................................................... 121
Figure D-1: Age of Participants, Survey One ......................................................................... 137
Figure D-2: Nationality of Participants, Survey One.............................................................. 137
Figure D-3: Education of Participants, Survey One ............................................................... 138
Figure D-4: Employment of Participants, Survey One ........................................................... 138
Figure I-1: Model Map, Direction Down................................................................................ 146
Figure I-2: Model Map, Direction Up..................................................................................... 147
Figure I-3: Model Map, Direction Left................................................................................... 148
Figure I-4: Model Map, Direction Right................................................................................. 149
List of Tables
Table 1: Header of Video Filenames for Survey Two .............................................................. 81
Table 2: Movement to Input Mapping .................................................................................... 104
Table 3: Precise Input to Motion Mapping ............................................................................. 105
Table 4: Input Breakdown ...................................................................................................... 131
Table 5: Sample 1, Survey Two Motion Breakdown.............................................................. 141
Table 6: Input Compression Part 1, Survey Two.................................................................... 142
Table 7: Input Compression Part 2, Survey Two.................................................................... 144
Table 8: Context Relationships ............................................................................................... 145
Acronyms Used
.NET Microsoft .NET Framework
.NET CF Microsoft .NET Compact Framework
API Application Programming Interface
ASF Advanced Streaming Format
ATL Active Template Library
AVI Audio Video Interleaved
BDA Broadcast Driver Architecture
BGR Blue, Green, Red (Inverted Image Format)
BGRA Blue, Green, Red, Alpha (Inverted Image Format)
COM Component Object Model
DLL Dynamic Link Library
FOV Field of View
GAPI Graphics Application Programming Interface
GPS Global Positioning System
GUID Globally Unique Identifier
LoG Laplacian of Gaussian
MAP Most Appropriate Polygon
MFC Microsoft Foundation Classes
MPEG4 Moving Picture Experts Group Standard 4
MSDN Microsoft Developer Network
PDA Personal Digital Assistant
RGB Red, Green, Blue (Image Format)
SDK Software Development Kit
USB Universal Serial Bus
WDM Windows Driver Model
XVid Digital Video Compression Format based on DivX (MPEG-4)
Statement of Authorship
The work contained in this thesis has not been previously submitted to meet requirements for
an award at this or any other higher education institution. To the best of my knowledge and
belief, this thesis contains no material previously published or written by another person
except where due reference is made.
Signature:
Date:
Abstract
Traditional methods of input on mobile devices are cumbersome and difficult to use. Devices
have become smaller, while their operating systems have become more complex, to the extent
that they are approaching the level of functionality found on desktop computer operating
systems. The buttons and toggle-sticks currently employed by mobile devices are a relatively
poor replacement for the keyboard and mouse style user interfaces used on their desktop
computer counterparts. For example, when looking at a screen image on a device, we should
be able to move the device to the left to indicate we wish the image to be panned in the same
direction.
This research investigates a new input model based on the natural hand motions and reactions
of users. The model developed by this work uses the generic embedded video cameras
available on almost all current-generation mobile devices to determine how the device is being
moved and maps this movement to an appropriate action.
Surveys using mobile devices were undertaken to determine both the appropriateness and
efficacy of such a model as well as to collect the foundational data with which to build the
model. Direct mappings between motions and inputs were achieved by analysing users’
motions and reactions in response to different tasks.
Once the framework was completed, a proof of concept was created on the Windows
Mobile platform. This proof of concept leverages both DirectShow and Direct3D to track
objects in the video stream, map these objects to a three-dimensional plane, and determine
device movements from this data.
This input model holds the promise of being a simpler and more intuitive method for users to
interact with their mobile devices, with the added advantage that no hardware additions or
modifications to existing mobile devices are required.
An intuitive motion based model for mobile devices - Introduction
1. Introduction
Traditionally, mobile devices have had a very restricted range of input models for user
interaction, with most current devices still relying on buttons for user input. More advanced
devices may also include 4- to 9-way toggle switches; however, these work on an identical
input model. The one major advancement that can claim to have broken away from
button-style input on these devices while achieving significant market penetration is the
touch-screen and stylus interface. Even these devices, however, often simply display
“virtual buttons” that act in much the same way as the physical kind.
Scratchpad writing employs a different approach, whereby the user inputs information as if
writing with ink and paper. This is typically text-based input: the device devotes processing
power to handwriting recognition and converts written text directly to digital text. Several
methods have been devised to speed up this input further. Graffiti [29] is one such method,
simplifying inputs and decreasing the chance of error. Another is Unistrokes [12], which
breaks letters into single lines so that there is no need to lift the stylus at all. Such
advancements aid people who own touch-screen devices in their everyday interactions with
the device.
Many mobile devices currently in use (with or without a touch screen) incorporate digital
image/video capture into their architecture. This opens up a new medium through which
users can transmit information to the device. Image-processing algorithms can interpret the
video stream while the user retains total freedom of movement along all three axes. This
adds a new dimension to the touch-screen concepts discussed in the previous paragraph
while removing two of their restrictions: the small input environment and the need for an
external input device (the stylus).
This new input dimension allows motion to be tracked from the hand of the user holding the
device, and this movement can supply information directly to the device. Such motion can
be exploited as a new form of input, with the device itself, and how it is moved, as the focus.
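The principle behind this camera-based motion tracking can be illustrated with a deliberately minimal sketch. The thesis prototype detects marker squares through DirectShow filters; the toy example below instead uses brute-force block matching between two consecutive grayscale frames (all names, frame sizes and values are hypothetical) to recover the dominant shift, which is the same underlying idea: compare frames to infer how the camera, and hence the device, has moved.

```python
# Illustrative only: a tiny block matcher showing how device motion
# can be inferred by comparing consecutive camera frames.

def frame_shift(prev, curr, max_shift=1):
    """Return (dx, dy) that best aligns curr's inner block with prev,
    by minimising the sum of absolute differences (SAD)."""
    h, w = len(curr), len(curr[0])
    m = max_shift
    best, best_cost = (0, 0), float("inf")
    for dy in range(-m, m + 1):
        for dx in range(-m, m + 1):
            # Compare the inner block of curr against prev shifted
            # by (dx, dy); the margin keeps all indices in bounds.
            cost = sum(
                abs(curr[y][x] - prev[y - dy][x - dx])
                for y in range(m, h - m)
                for x in range(m, w - m)
            )
            if cost < best_cost:
                best_cost, best = cost, (dx, dy)
    return best

# A bright spot moves one pixel to the right between two 4x4 frames:
prev = [[0, 0, 0, 0],
        [0, 9, 0, 0],
        [0, 0, 0, 0],
        [0, 0, 0, 0]]
curr = [[0, 0, 0, 0],
        [0, 0, 9, 0],
        [0, 0, 0, 0],
        [0, 0, 0, 0]]
print(frame_shift(prev, curr))  # → (1, 0)
```

A real implementation would of course work on full camera frames and track features rather than raw pixels, but the recovered shift vector plays the same role: it becomes the motion input to be mapped to a device action.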
1.1. Research Problem
This project investigated the viability of a motion-based input model on mobile devices as a
whole. The focus has been on current- and next-generation Microsoft mobile devices. As
this is a new area of research, there is little published work on developing a framework that
maps possible inputs to device functionality. This project therefore began with the
development of a suitable framework to meet the needs of mobile device users, with the
potential to significantly advance the way users interact with these devices. The framework
developed incorporates appropriate commands, extensibility, ease of use and a gentle
learning curve. Such a design should ensure it has relevance and advantages over currently
available input methods.
Currently, devices use button presses as the major input medium. This is often a problem,
as the buttons are small and difficult to use (increasingly apparent as full keyboards are
implemented on mobile devices). Many of these presses drive simple functions that
realistically do not require them, since the user's intent can be gathered from how they move
the device. Being small, multifunctional devices, they are naturally held and moved in
different ways depending on the desired use. Detecting this movement, and hence the
user's intent, removes one step of the input process and allows a more streamlined
experience.
1.2. Research Aims & Objectives
The ultimate goal of this project was to develop a functional prototype that interprets the user’s
motion input from an embedded camera and translates it into device instructions. This can
be broken down into four factors:
• Investigating a movement/motion framework that is intuitive to end users and adds a
respectable amount of functional possibilities for the phone.
• Creating a base framework by surveying a wide variety of mobile phone users and
mapping functionality to motions based on their responses.
• Developing an application that processes camera data and works out vector motions.
• Developing an application that takes this data and compiles it to be used with the framework above.
1.3. Research Questions
The following questions were investigated in this project:
• What functionalities of a phone's features are appropriate candidates to be used as parts of
a motion input scheme?
• Is it possible to construct a rational and useable mapping scheme for phone inputs?
• Can people adapt to using motion gestures as an input medium, and what are considered
suitable (not embarrassing or over-exertive) motions to perform?
• How uniformly do people perform motions given to them (different people, slight
difference in movement) and can these variations be adapted to?
• How suitable are images (collected by the embedded mobile cameras) for in-depth image
processing?
• Can real-time performance of image detection algorithms and movement calculations on
Smartphones™ be achieved?
• Will tracking movement critical to this project unexpectedly interfere with the normal
usage of the phone?
1.4. Research Rationale
This path of research was chosen for a number of reasons, including:
• The market penetration of mobile devices with embedded cameras that can be used for
more than just taking pictures/video.
• The lack of true user interface advancements upon such mobile devices.
• The recent popularisation of Microsoft and Java mobile devices opening up development
for additional users.
• The recent release of Windows Mobile 5, which significantly advances the ability to
interact with the camera of a device.
• Recent research papers showing some of the possibilities that can be achieved by using a
camera as the input stimulus.
1.5. Significance of Research
Mobile phone development is moving at a very rapid pace, with these devices integrating more
and more functionality with each generation of phone. Therefore, it is desirable to explore
possible alternative concepts that the enhanced power and functionality make available. A
principal component of phone architecture that has remained relatively unchanged has been the
way users interface with the devices. This is typically via the number pad buttons, soft keys or
a joystick. While it is commonly understood that a 103+ key desktop keyboard delivers more
throughput than a mouse (or other common interface device), the same does not hold for the
more limited 12-20 keys commonly available on smaller mobile devices.
It appears that there is a growing requirement for other more efficient interfacing models,
especially those that target the large proportion of less electronically savvy mobile phone end
users. Without an external device to aid input, motion and voice commands are the obvious
choices. Voice commands are already implemented in many commercial devices and most
motion research is currently reliant on external devices to collect the movement information.
Therefore creating a fully integrated solution that is easily expandable would prove useful in
both the commercial and research fields.
Creating a functional input model that is compatible with mobile devices as a whole would
allow further inroads into natural input interfaces. A device independent model could be
adapted to a multitude of devices, such as wearable equipment (including, but not limited to,
wristwatches and heads-up devices).
1.6. Limitations of Research
The experimental implementation of this project is obviously limited by currently available
hardware devices. In addition, there are a few other areas of limitation that should be noted:
• The acceptance of such a model by owners and potential owners of applicable devices.
• Only Microsoft Smartphones™ have been targeted, not Symbian™ or Linux based devices,
as development paths would be totally different for these devices.
• Lack of a mature Camera API may limit the number of Smartphone™ devices motion
detection will work on.
• The limited battery and CPU power of the devices, which restricts the means available to
track motion.
• Limited funding has restricted testing and the availability of hardware for the development
stage.
• Limited time has restricted the development of a complete package, limiting the output of
this work to the completion of an initial framework.
1.7. Organisation of Thesis
This thesis is organised in major categories documenting the steps taken through the research
course. There are three major sections of the thesis: the collection of data, the building of the
model and an implementation phase. This is outlined in the diagram below (Figure 1-1).
Figure 1-1: Thesis Breakdown
A brief overview of the chapters follows:
Chapter 2 – Research Methodology
This chapter defines what must happen for this research to be considered a success. It presents
the early planning of how the research should take place and what steps need to happen in
which order. Some important topics (such as communication) are dissected early so that
further understanding can be achieved.
Chapter 3 – Requirement Specifications
Information regarding what users would need out of an input model is discussed here, along
with further discussion about motion itself and how to best recognise it with the resources
available.
Chapter 4 – Survey Construction
Documents the procedures taken to create and conduct surveys to collect user data for the
model. Two separate surveys were undertaken and their purposes and results are all recorded
here.
Chapter 5 – Basic Model Creation
With data available, a model to start mapping motions to inputs was possible. This chapter
talks about the design of the model itself, how it can be used and possible situations where the
model might not be the best input model to use. Early mappings that incorporate findings
from the surveys are included.
Chapter 6 – Prototype Development
With a plausible model to use, effort was directed at determining whether the mobile devices of
the day were capable of recording and processing video efficiently enough to track motion.
Approaches had been discussed throughout the thesis, and by this stage an approach had been
determined. This chapter discusses the processes undertaken to achieve a working prototype
using these means.
Chapter 7 – Conclusions and Findings
This chapter sums up the research performed by discussing the contributions made in the
process, as well as uses for the findings by both further research and commercial parties.
2. Research Methodology
This chapter breaks down the proposed research at the highest level and discusses the expected
outcomes. Early thoughts on the separate sections of the research (data, model and
implementation) are also presented, along with brief information on how they were performed.
Problems and difficulties are also discussed.
2.1. Research Approach
This research can be broken into multiple significant stages that are independent of each
other and required different approaches to reach a conclusion. These are:
• Defining Project Scope (Deliverable Outcomes, Sect 2.4)
• Requirement Analysis (Sect 2.2)
• Motion and input classification (Chapter 3)
• User surveys (Chapter 4)
• Communication model creation (Sect 2.3.1)
• Implementation upon desktop (Sect 2.3.2)
• Implementation upon mobile (Sect 2.3.3)
A clear separation between the tasks was maintained to ensure that firm goals remained in case of
complications. This approach ensured the tasks were completed and delivered significant
research contributions despite the inevitable unforeseen problems encountered.
2.2. Requirement Analysis
A general understanding of how to use and analyse both motions and inputs needed to be
developed so that classifications could be created. These classifications were built upon as
the project progressed.
The communication model was initially devised from user input and suggestions gathered via a
specially designed survey. The results of this survey were melded into an initial model that was
tested and revised with further user feedback into a suitable model. This model was then
incorporated into the final design and integrated into the prototype. Before the surveys were
created, classifications and further understanding needed to be developed: a solid set of base
data (inputs to be understood and motions to be read) needed to be outlined and broken down.
A qualitative approach was taken throughout the development of the model.
The set of inputs created was varied, and encompassed a great deal of the functionality of the
phone at both low and high levels. Once this was done, the inputs were classified into
similar groups, with the idea that similar inputs would have similar motions.
Motions grouped as similar may look different to the eye; what makes them similar is shared
properties of the motion: short or long duration, number of direction changes, and so on.
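This property-based grouping can be sketched as follows. The structure and thresholds here are illustrative assumptions for the sketch, not values taken from the thesis implementation:

```cpp
#include <cmath>
#include <cstdlib>

// Hypothetical summary of a motion's measurable properties, following
// the text above: duration and number of direction changes. The
// thresholds below are illustrative, not taken from the thesis.
struct MotionProfile {
    double durationSec;    // short vs. long motion
    int directionChanges;  // a shake has many, a straight pan has none
};

// Two motions count as "similar" when their properties are close,
// even if the raw trajectories look quite different to the eye.
bool similarMotions(const MotionProfile& a, const MotionProfile& b) {
    return std::fabs(a.durationSec - b.durationSec) < 0.5 &&
           std::abs(a.directionChanges - b.directionChanges) <= 1;
}
```

Under this scheme, a quick flick and a quick tilt would fall into the same group, while a long multi-directional gesture would not.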
Breaking down development into desktop and mobile development was possible because of
the similarity of developing .NET/COM applications on both desktop and mobile, especially
when developing at the low level. An identical development suite and emulators increased
productivity while remaining relevant to the final goal. The increased processing power of the
desktop allowed positive results to be seen earlier, confirming that the right path was being
taken. The algorithms were then tweaked and tuned until they performed well enough on the
less powerful mobile platform.
A large portion of the motion analysis research is already available [16], making the desktop
development stage experimental, involving testing the performance and reliability of available
algorithms. The majority of work was tested for performance and the suitability of approaches
to be ported to a mobile device. Preparation for phone porting was experimental, and included
some exploratory phases, as there is little previous work in this field. Changes to currently
available algorithms have been examined to cater to mobile devices which typically have
poorer cameras and less processing power.
2.3. Research Breakdown
The three stages (Survey, Model and Implementation) relied on each other for completion of
the research. This section gives a brief introduction to each of these stages.
2.3.1. Communication model creation
A suitable communication model for end users is extremely important, and will ultimately
dictate whether such a project is truly viable for general use. Simplified, this model consists of a
list of commands and/or situations, and a corresponding list specifying how the user is to
perform each one.
The first step taken towards developing such a communication model was to collect research
data regarding the various possible input movements. Analysing the collected data enabled an
initial prototype of the model to be created. A methodology was devised to classify the
collected information and manage its storage and recall (see Section 3.5).
Experiments with users were then conducted to determine the general suitability of the model,
with user input suggesting possible modifications. With the classifications and the
breakdown of inputs into categories, guidelines were created that suggested appropriate
motions for inputs, depending on their category (covered in Sections 4-6).
2.3.2. Survey Construction/Consideration
Due to limited resources, the scale of the conducted surveys was small. Because of the
nature of the information required, it was not sufficient to request written answers to
the posed questions: movement data was most efficiently captured on video. This further
stretched resources and time because only one set of answers could be collected at a
time in such a manner.
The initial plan entailed finding 5-15 volunteers and supplying them with the survey details the
day before so that they could think about their answers (motions) overnight. These answers
were videotaped with audio prompts so they could be archived and analysed.
Common properties of motions were collected over the entire group of volunteers to determine
overall motion information. This was then verified in the second survey.
2.3.3. Implementation upon desktop
Creating an initial image processing setup upon the desktop allowed the possible results of the
work to be seen far earlier. In that sense it can be considered a prototype of the final work.
Developed primarily in C++, it followed, step by step, the process the final product would use
to retrieve the required data. These steps were:
• Retrieving data from the camera.
This was emulated by the use of a web camera, since web cameras typically give a lower image
quality, much like phone cameras. Since the data is a stream, it was stored and analysed frame by
frame, with the aid of markers around the room.
• Collecting edge information from the camera.
Processing the currently stored frame, edges were found in such a way that the resulting image
is not overly complex, simplifying the next step in the process. This was done in
multiple ways to find the most efficient process (Figure 2.7).
• Creating vertex data from the edge image.
Finding true vertices in each image allowed the comparison of sequential frames. This is
because vertices will generally only change in location, and the change will be consistent
throughout the image. This vertex data was stored in arrays to allow quick and efficient access
to this information.
• Comparing to previous data to obtain result.
Comparing the vertex arrays of sequential frames yielded the general direction of movement.
A robust method was required that ignored false data and
averaged the vertex information over several frames. Some commands incorporated multiple
directions, and hence past information was stored as well. The performance of this method
depends on the complexity of the arrays of temporal data collected.
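The frame-comparison step above might be sketched as follows. Correspondence by array index and the jitter threshold are simplifying assumptions, since the actual matching scheme is not fixed in this description:

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <string>
#include <vector>

struct Vertex { double x, y; };

// Estimate the dominant movement direction by averaging the
// displacement of corresponding vertices between two frames.
std::string dominantDirection(const std::vector<Vertex>& prev,
                              const std::vector<Vertex>& curr) {
    std::size_t n = std::min(prev.size(), curr.size());
    if (n == 0) return "none";
    double dx = 0.0, dy = 0.0;
    for (std::size_t i = 0; i < n; ++i) {
        dx += curr[i].x - prev[i].x;   // accumulate per-vertex displacement
        dy += curr[i].y - prev[i].y;
    }
    dx /= n; dy /= n;                  // average over all tracked vertices
    if (std::fabs(dx) < 1.0 && std::fabs(dy) < 1.0)
        return "none";                 // below threshold: treat as jitter
    if (std::fabs(dx) >= std::fabs(dy))
        return dx > 0 ? "right" : "left";
    return dy > 0 ? "down" : "up";     // image y grows downwards
}
```

The averaging stands in for the false-data filtering mentioned in the text; a fuller version would also keep a short history of directions for multi-stroke commands.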
The development platform was a relatively powerful computer with a web-camera attached to
record information. Such a platform gave us the ability to emulate the final product with a
generally more flexible infrastructure. It also allowed a working solution to be found first, after
which effort could focus on improving efficiency and making the approach better suited to the
final platform.
For a test of edge detection performance in .NET, a simple application was created to load an
image and dynamically detect edges given certain thresholds. To improve speed and performance, a lot
of the code was written as 'unsafe' (using direct pointer access) to avoid .NET overheads
such as array bounds checking and managed memory access. This was an attempt to close in on
the environment that would be used upon mobile devices.
Figure 2-1: A Dynamic Canny Edge Detection Algorithm using .NET
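A much-reduced sketch of such a thresholded edge test follows: a plain gradient-magnitude check rather than the full Canny pipeline shown in the figure, with the grid layout and threshold semantics assumed for illustration.

```cpp
#include <cstddef>
#include <vector>

// Mark a pixel as an edge when the intensity change to its right and
// lower neighbours exceeds the threshold. This is a simple gradient
// check, not Canny (no smoothing, non-maximum suppression or hysteresis).
std::vector<std::vector<int>> detectEdges(
        const std::vector<std::vector<int>>& gray, int threshold) {
    std::size_t h = gray.size(), w = h ? gray[0].size() : 0;
    std::vector<std::vector<int>> edges(h, std::vector<int>(w, 0));
    for (std::size_t y = 0; y + 1 < h; ++y)
        for (std::size_t x = 0; x + 1 < w; ++x) {
            int gx = gray[y][x + 1] - gray[y][x];  // horizontal gradient
            int gy = gray[y + 1][x] - gray[y][x];  // vertical gradient
            if (gx * gx + gy * gy > threshold * threshold)
                edges[y][x] = 1;  // strong intensity change: edge pixel
        }
    return edges;
}
```

Raising the threshold thins the edge map, which is the trade-off the dynamic-threshold test application explored.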
2.3.4. Implementation upon mobile
Taking the desktop implementation and porting it to a mobile application depended greatly on
considerations taken earlier. Relying upon low-level C++ programming earlier in the project
facilitated the conversion. The greatest change was the move from web camera data to the
embedded camera of the devices. Code changes to improve performance on the more restricted
mobile platform were required, but generally most of the code carried over from the original
desktop implementation. Figure 2.2 illustrates the porting of Figure 2.1 to a mobile device
emulator.
A simplified proof of concept of the desktop implementation was created that performed edge
detection. This example used GAPI; the final version, however, used the DirectShow
implementation within Windows Mobile 5.
Figure 2-2: Edge Detection upon Mobile Emulator
2.4. Deliverable Outcomes
There were well-defined deliverables expected from each stage to show definite progression in
the work. These were (stages highlighted in the previous section):
• Stage 1
A detailed and appropriate map showing basic functionality and the recommended motions for
that functionality. Documentation on how this was achieved and how it can be applied to other
inputs must be present. Multiple repetitions and documented changes/advancements should
also be presentable, as well as user feedback gathered throughout the process.
These maps should be easily applied by other designers to their own input motion techniques,
and therefore be simple to understand, with the steps behind them straightforward to
implement.
• Stage 2
A working prototype that maps direction vectors to the direction a web camera is moving.
This should display and record the information on the direction being moved as well as do
some simple filtering of garbage data collected. This code should be designed in such a way
that it is possible to transfer it relatively simply between the desktop and a mobile device.
Ultimately the two should be capable of being worked on simultaneously.
• Stage 3
Stage 3 consists of a mobile implementation of stage 2, as well as applications to test/showcase
the functionality. This will be done by using the in-built camera of the mobile devices to track
the motion the device is being moved in. Given the limited resources available on these
devices the software performing this task should use few resources so that the device can still
operate as a phone without significant slowdown caused by the application. Some of these
applications should also incorporate data collected from stage 1.
2.5. Reliability of Results
Results from stages 2 and 3 were verified by test and demonstration, since the original data (the
direction the devices were moved) can easily be checked against the results (the direction the algorithms
believe the device has moved in). Applications were developed during the course of the detection
development to test that movement was being tracked correctly. The scope of these
applications gradually increased as development progressed.
Originally the applications developed simply showed this movement information so that
multiple people could judge the correctness of the detection routines. The first of these
rendered boxes (upon tracked sections of the image, Section 7.2.1) that moved as the device
moved; the second was a simple 3D compass that pointed in the direction the device was being
moved; and the third drew, in a 2D representation, the movement ‘strokes’ the device made as
it was moved. These applications are shown throughout Chapter 7 as demonstrations of the
development process.
The success of stage 1 was judged by taking the designed model and applying it to certain
application types (such as the web-browser in Appendix I) and then determining if indeed it
was useful. This was to be performed by querying end users, but the release of VueFlo
(http://www.theunwired.net/?item=videoview-htc-vueflo-easy-navigation-technology) on the
HTC Athena mobile device allowed a different approach. VueFlo uses the same concept, albeit
with specialist hardware; the model developed here uses nearly identical motions, and the
overall very positive reviews of VueFlo indicate that the model's motions are reliable in
practice.
2.6. Problems Encountered
Because this is a fast-moving and relatively new field, the possible risks were generally not well
defined. The following are the risks identified at the outset of this project that were
encountered during the course of this work.
• Redundancy of research
With the release of Windows Mobile 5 [22] came a wealth of undocumented functionality
(in particular for interaction with the camera) and improved interaction over older versions of the
mobile device operating system. With this improved functionality, some of the earlier work
(in particular attempts to operate the camera) was made redundant. Most of this older work
had to be restarted using the new methods of interaction.
• Processing power
Devices currently available (to the author and others) simply do not have enough processing
power to perform the tasks required, regardless of how optimised the code is. What has been
developed stresses the devices used significantly, and hence less powerful devices currently on
the market may struggle. Significantly more powerful devices are just becoming available that
will supersede the current devices, making this work far more effective.
• Camera imagery
The images produced by devices are currently of relatively low quality, and frame rates are
quite low. Again, newer devices will help alleviate this problem through hardware
improvements and allow this work to more closely mimic its desktop counterpart. Lower
quality video means less information can be extracted from it, and therefore less accurate
motion detection.
2.7. Ethical Considerations
The actual development and testing of the model required no special permissions or
considerations with regard to people or animals. However, while developing survey questions it
was necessary to consider cultural backgrounds, because certain motions may be considered
offensive or degrading. The ethical statement included in the transcript of Survey 1 is in
Appendix C.
3. Requirement Specifications
When developing a model aimed at use by normal everyday users, consideration has to be
given to ensure it suits the diverse requirements of the target users. Discussion of such
requirements is included in section 3.1. Possible expansions of these requirements once the
model and implementation have become more mature are also discussed, along with factors
that are of no importance to the end-user, such as the question of how to best find motion from
video.
3.1. User Requirements
To be considered a success, this project needed to meet users’ needs in such a way that the
created framework offers significant improvements over existing input frameworks.
Requirements to achieve these goals included:
• Easy to remember commands
• Logical actions
• Rapid access to functions
• Reliable input parsing
• Easily extendable
• Inter-operable with other input methods
Easy to remember commands – Short and simple commands that make sense to the user are
highly practical and far more likely to be used than commands that make no sense (e.g. moving
the device up, left and then forward to choose an option to the left is confusing and should be
avoided at all costs). The same applies to command duration: a long command with many
instructions is inferior to a short one- or two-stroke command.
Logical actions – Moving the device left to do a command that naturally involves a left
inclination is far more logical than an ambiguous set of commands. This can be applied to
more complex actions as well. The natural reaction to an overly loud volume is to move the
device away from the ear, so this would be a logical command for this action.
Rapid access to functions – If a hierarchical system is used to access commands, then the
traversal through these menus should be streamlined. This is to decrease the time taken to
access commands. This suggests that options such as shortcuts for most commonly used
commands should be available.
Reliable input parsing – Handling ambiguous commands is a necessity with such input
methods. Should the device drop commands it cannot fully understand, or try to derive the
user's intended meaning? And if the latter is chosen, should the input be compared to the most
commonly used commands, given rankings, and judged by which actions it resembles?
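The second option could be sketched as a frequency-weighted ranking. The scoring formula and field names here are hypothetical, not part of the implemented framework:

```cpp
#include <string>
#include <vector>

// An ambiguous motion is matched against known commands, each ranked
// by how closely it resembles the input, weighted by how often the
// user invokes it. Both fields would be supplied by the motion matcher.
struct Candidate {
    std::string command;
    double similarity;  // 0..1 resemblance to the observed motion
    int usageCount;     // how often the user has issued this command
};

std::string bestGuess(const std::vector<Candidate>& candidates) {
    std::string best;
    double bestScore = 0.0;
    for (const Candidate& c : candidates) {
        double score = c.similarity * (1.0 + c.usageCount);
        if (score > bestScore) {
            bestScore = score;
            best = c.command;
        }
    }
    return best;  // empty when no candidate scores above zero
}
```

With such a scheme a slightly worse geometric match can still win if the user issues that command far more often, which is the trade-off the question above raises.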
Easily extendable – Both commands and actions must be able to be easily added to the
completed work.
Inter-operable with other input methods – Users may wish to also use other available
communication methods with their devices. The framework should respect this and keep such
methods available for use at all times.
3.2. Scope of Communications
Applying meaning to a user’s motion is the crux of this project, and as time has passed the
importance of different levels of communication the user can perform has become more
apparent. Initially, the scope of this project was centred on the input of text, which has taken a
far less important role over the duration of this work.
Communication with the device can incorporate simple computer like instructions or more
complex motions. Such motions can be mapped to specific functionality depending on the
current situation and context.
3.2.1. Low-Level
Low-level communications, much like their linguistic counterparts, are simple phrases that
perform a pre-defined action. In the typical computer domain, such actions as file deletion,
clicking a button or closing a window would be considered low-level interactions by the user [1, 16, 28].
Many human motions can also be considered low-level actions. Nodding the head in
confirmation or denial are such examples. A simple Yes/No situation will occur often while
interacting with a mobile device. Input for such events can easily be transferred to a simple
movement of the device that imitates the head nodding [4].
Other basic motions can also be applied to other low level situations. Simple examples
include:
• Rolling the device down/up to scroll through a list of options.
• Moving the device along a plane to pan around a displayed image.
• A simple shake of the device to choose an option or to acknowledge an event.
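Assuming a motion tracker that has already classified the gesture, the mappings above might reduce to a simple lookup; the gesture and action names are illustrative, not part of any real phone API:

```cpp
#include <string>

// Minimal sketch of the low-level mappings listed above. A real model
// would also take the current application context into account.
std::string actionFor(const std::string& gesture) {
    if (gesture == "roll-up")   return "scroll-up";
    if (gesture == "roll-down") return "scroll-down";
    if (gesture == "pan")       return "pan-image";
    if (gesture == "shake")     return "confirm";
    return "ignore";  // unrecognised motion: fall back to other inputs
}
```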
Many of these simple inputs currently require forces applied to numerous input receptors
(buttons, scroll wheels, toggle sticks and scratchpads) simply because of the limitations of
these switches. This can cause inconsistency problems between applications and extra effort
as the user tries to operate multiple input mechanisms simultaneously. A single input model
based on a user’s natural actions will provide significant progress.
Identifying when the user’s motion is to be read and interpreted may be problematic, therefore
it is quite possible that a way to activate the reading of motion may be advantageous. A
simple push of a button to enable/disable the reading of camera motion was investigated in the
early stages of development.
3.2.2. Textual Input
T9 [34] and other button press methods are not the only ways a user can input text into a device.
Motion, much like the natural writing of text, can be applied to a mobile device as an input
mechanism. Whether it is the moving of a pen-like implement (stylus) over a flat surface, or
the moving of the device itself, text input has always been a significant portion of human-
computer interaction [13].
3.2.2.1. Hex
Hex [40] is currently under development at the University of Glasgow as a text-entry human-
computer interface. A typical PDA device is connected to an accelerometer (a device that
detects movement, and in particular device rotation) while the user is presented with a
graphical sphere of hexagons as an input aid. A dot is manipulated upon the screen by tilting
the device and letting the dot ‘fall’ in a certain direction.
Figure 3-1: The Hex Interface
Letters are grouped in the hexagonal grids depending on certain guidelines that are designed to
aid text input speed. Vowels are grouped together at the top since they are commonly
accessed. Other common groupings are bundled together with the least common set being
given the down-left position (Figure 3.1). Upon entering a new hexagon the device switches to the
new screen and again splits the letters, this time into single letters so one can be entered.
A predictive model is applied for fuzzy inputs (when input is close to two hexagons, likelihood
of next letter group/closeness to border is examined and user choice determined) to aid input.
Another recent addition is a second predictive model that aids choosing the next letter by
making it easier to tilt towards. Letters less likely to be used take more effort to reach (rolling
uphill, for example).
This results in a method where each letter can be accessed with two tilts of the device while
providing a graphical aid for the user in their adoption of the technique. This model has
already been applied to mobile devices with good results (Figure 3.2) , however the wide use of
this approach will depend on the various manufacturers embedding an accelerometer into their
products.
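The falling-dot behaviour can be loosely sketched as follows. This is an illustrative toy model, not the actual Hex implementation, and the constants are arbitrary:

```cpp
#include <cmath>

// The dot accelerates in the direction of tilt, while the predictive
// model adds per-letter "resistance" so that unlikely letters are
// effectively uphill and take more effort to roll towards.
double dotVelocity(double tiltRadians, double letterResistance) {
    double v = std::sin(tiltRadians) * 9.8 - letterResistance;
    return v > 0.0 ? v : 0.0;  // the dot never rolls uphill on its own
}
```

A likely next letter (low resistance) is reached with a gentle tilt, while an unlikely one requires a steeper tilt, which is the essence of the predictive aid described above.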
Figure 3-2: Hex in Action
3.2.2.2. Graffiti & Unistrokes
Graffiti [29] and Unistrokes [12] are two similar stylus text input mechanisms that attempt to
simplify the English alphabet. They were created in an attempt to increase input speed and to
differentiate the letters further, aiding machine recognition.
Letters inputted in such a manner are required to be single strokes so that multiple strokes
cannot be misinterpreted or matched to the wrong letter. This limits the number of strokes
available, typically to somewhere in the region of 5-8 unique strokes, depending on the
implementation. This is obviously insufficient for the 26 letter alphabet; therefore other
factors have to be included. The most common of these are the reversal of the stroke and the
rotation of the strokes. Some strokes by nature can start at their end point and work back,
making a new stroke to use. Rotating strokes 90 degrees around their centre also increases the
stroke count. Obviously symmetric strokes cannot have both applied since they would result
in identical strokes.
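Treating a stroke as a sequence of unit movements, the two variant rules can be sketched as follows; this representation is an assumption for illustration:

```cpp
#include <utility>
#include <vector>

using Step = std::pair<int, int>;  // one unit movement (dx, dy)
using Stroke = std::vector<Step>;  // a stroke as a step sequence

// Reversal: trace the stroke from its end point back to its start,
// i.e. reverse the order of steps and flip each step's direction.
Stroke reversedStroke(const Stroke& s) {
    Stroke r(s.rbegin(), s.rend());
    for (Step& st : r) {
        st.first = -st.first;
        st.second = -st.second;
    }
    return r;
}

// Rotation: turn every step 90 degrees about the centre,
// (dx, dy) -> (-dy, dx), yielding a further stroke variant.
Stroke rotated90(const Stroke& s) {
    Stroke r;
    for (const Step& st : s)
        r.push_back({-st.second, st.first});
    return r;
}
```

A symmetric stroke yields the same sequence under both rules combined, which is why, as noted above, applying both to such strokes produces no new variants.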
Figure 3-3: Natural Letter Matching
When matching letters to relevant strokes, speed and familiarity are the dominant factors in
decision making. For example (Figure 3.3), letters that closely resemble strokes are often placed
together. When this could not be applied subsets of the letter stroke are considered (Fig 3.4) so
that the actual stroke is one of the strokes made while writing the letter.
Figure 3-4: Letter Subsets
Common letters are also given quicker strokes to increase input speed (Fig 3.5).
Figure 3-5: Single Strokes
Simple strokes translate well when combined with motion detection discussed in the previous
section. Familiar patterns are a large boon for new users learning the new input method and
will ease migration.
While these approaches also show promise, they require mobile devices equipped with a stylus
and touch screen, which restricts their applicability.
3.2.3. High Level
High-level input goes further than simple one-word actions and attempts to communicate events
or situations. During the early stages of this research it became apparent that this was by
far the least explored area of study. It also has the most potential for future work, as well
as the ability to spread into other fields.
3.2.3.1. DyPERS
DyPERS [25] is a system initially developed at the Massachusetts Institute of Technology as an
augmented reality system. The system scans the environment the user is examining for key
objects. If the system finds such objects, triggers are fired; in its current form this
usually entails launching short video clips about the detected object.
The system learns visual cues and can therefore be taught by the user to recognise objects in
the world around them (Figure 3.6). Such a system could be embedded into the framework of this
research to aid intelligent input (see Section 3.7.2). For example, detecting a specific
business card with appropriate visual cues upon it (logos, names, etc.) could automatically
call the phone number related to that business card.
The DyPERS system shares some similarities with the work presented in this thesis,
particularly in image recognition triggering events; however, it relies on identifiable visual
elements being present rather than tracking device movement. We therefore feel our approach
has more potential and scope for usefulness as an input model.
Figure 3-6: DyPERS in Action
3.2.3.2. Profiling Usage
Each user has a unique style when using a device and would hence benefit from having a set of
functions tailored to them. For example, Person A may spend a significant amount of time
listening to music on their device, while Person B may make greater use of personal
information management tools. Such user patterns can be monitored, and specific communication
paths may be opened via this usage.
Predictive models based on a user's past activities can map common actions to common
activities [31]. This makes device usage not only simpler for the end user but also more
natural. A user might turn the phone 90 degrees to the right to start the music player if this
is a common task for them (the user may turn the device instinctively to insert headphones
into the jack). Such functionality could be learned over time and adapted into the input model
of the phone.
Models could also be applied less dynamically, much like a shortcut system. Upon opening a
general 'programs' menu, favourite programs can be assigned directions that open them, again
based on how commonly they are used.
3.3. Motion and Error Detection
Transferring image data from a device's camera (to be processed by the device) enables us to
recognise changes in the location and direction of the phone in its 3D environment. Because
the device has a single camera without a free-moving (omni-directional) lens, additional
processing time is required and errors are possible. Using a variety of established techniques
and methods minimised such problems.
3.3.1. Movement Tracking
Processing captured images allows us to track the movement of the camera and its user. In
three-dimensional space there are two types of movement: traversal along one or more axes,
and rotation around those axes.
3.3.1.1. Axial
Axial traversal is the simplest motion to track with simple camera equipment. It involves the
device/camera travelling in a set of directions that can be mapped in the three-dimensional
planes along the optical axis of the camera. While moving, the device cannot rotate, and the
focal point on the horizon must remain constant (an imaginary point in the distance if the
camera is not facing the horizon).
Upon receiving a processed image (Section 7.2.1), the picture is scanned for the remaining
edges, which are broken down into straight lines with given start and end points (pixel
locations). With this information, vector sets are created that outline the information in the
image. Not only is this an efficient way to store the image data, but it also gives a method
to directly compare images to those previously captured and stored [14].
Vectors of similar length and direction were matched between frames (rotation can interfere
with this, which is why it is dealt with separately in the following section). With multiple
vectors, these changes can be confirmed and considered consistent. When start and end points
change while lengths remain constant, it can be determined that the device has moved left,
right, up or down, depending on the new values. Movement into or out of the scene can be
determined by a similar line direction but an increase or decrease in vector length.
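The matching rule above can be sketched as follows. This is an illustrative reading, not the thesis implementation: segments matched between two frames are compared by length ratio (for in/out motion) and midpoint shift (for lateral motion). The sign convention, that image content shifts opposite to the device's movement, is an assumption of this sketch.

```python
# Illustrative sketch (names my own, not the thesis code): classify axial
# device motion from a line segment matched between two frames.
# Each segment is ((x1, y1), (x2, y2)) in pixel coordinates.
import math

def seg_length(seg):
    (x1, y1), (x2, y2) = seg
    return math.hypot(x2 - x1, y2 - y1)

def seg_mid(seg):
    (x1, y1), (x2, y2) = seg
    return ((x1 + x2) / 2, (y1 + y2) / 2)

def classify_axial(prev, curr, length_tol=0.05):
    """If length is (near) constant, the midpoint shift gives a lateral
    translation; a consistent length change indicates motion along the
    optical axis (features grow moving in, shrink moving out)."""
    ratio = seg_length(curr) / seg_length(prev)
    if abs(ratio - 1.0) > length_tol:
        return "in" if ratio > 1.0 else "out"
    (px, py), (cx, cy) = seg_mid(prev), seg_mid(curr)
    dx, dy = cx - px, cy - py
    if abs(dx) >= abs(dy):
        # assumed convention: image content moves left as the device moves right
        return "right" if dx < 0 else "left"
    return "down" if dy < 0 else "up"     # image y grows downward

print(classify_axial(((0, 0), (10, 0)), ((-5, 0), (5, 0))))   # prints right
print(classify_axial(((0, 0), (10, 0)), ((-1, 0), (11, 0))))  # prints in
```

In practice several matched vectors would be checked for consistency before accepting a classification, as the text notes.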
3.3.1.2. Rotational
Tracking rotational movement with a standard camera and lens is a far less reliable process.
To truly track rotation direction and amount, a sense of image depth has to be established.
Many new concept devices are currently being announced with two cameras, and while there is no
information on whether or how the cameras can interoperate, this could potentially be used to
generate a sense of three-dimensional vision.
Currently available single-lens devices require compromises when attempting to track
rotational movement. The accepted method to gauge depth is to increase the field-of-view of
the camera. A typical camera has a FOV of around 45-50 degrees directly in front. Increasing
this value to around 80 degrees helps give the image a sense of depth, and is achieved with
special 'fish-eye' lenses or convex mirrors [14]. Unfortunately, this is quite impractical for
a phone, and hence exploratory vector mathematics was applied to examine the practicality of
rotation tracking on a typical-FOV device.
This mathematics relied on the vector changes discussed previously (Section 3.3.1.1) being
inconsistent across the image. For example, when rotating the device to the left, a vector on
the left of the image would increase in size while moving closer to the centre of the image. A
vector on the right would decrease in size and move closer to the right edge of the image.
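The asymmetry test just described can be sketched as a simple rule. This is a hypothetical illustration, assuming length ratios (current/previous) have already been computed for segments on the left and right halves of the image:

```python
# Hypothetical sketch of the asymmetry test described above: under a left
# rotation, segments on the left of the image grow while those on the
# right shrink; a pure axial move scales both sides the same way.

def rotation_cue(left_ratio, right_ratio, tol=0.05):
    """left_ratio/right_ratio: current/previous segment length on each half."""
    if left_ratio > 1 + tol and right_ratio < 1 - tol:
        return "rotate-left"
    if right_ratio > 1 + tol and left_ratio < 1 - tol:
        return "rotate-right"
    return "no-rotation"   # consistent change indicates axial motion instead

print(rotation_cue(1.2, 0.85))   # left side grew, right shrank: rotate-left
```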
3.3.2. Error Detection and Correction
Being able to judge when a user has made a minor mistake in their input, and adapting
accordingly, has become an increasingly important feature of recent input models. This
project's scope included spelling and movement errors (low-, medium- and high-level inputs;
Section 2.3). Determining what the user intended with their motion not only makes the model
more attractive to users, but also increases speed and, obviously, accuracy.
3.3.2.1. Gait Phases
Gait phases are a breakdown of the human walking cycle in which each step is divided into
eight segments [10]. Research has shown that independent motion (entering information)
typically occurs during the ending steps of the cycle (as the person finishes their step and
puts their foot onto the ground). If human sway while walking is to be taken into account when
receiving input, this indicates that the device will typically be moving downwards while the
input occurs.
To take this a step further, human motion can be monitored while no input is being entered.
Because of the walking phase and the above findings, it can be assumed that most motion inputs
would occur during this downward motion. Therefore, in most inputs, a minor downwards motion
can be ignored. Since walking is typically cyclic and consistent, this collected information
could be applied while input is being received to derive the correct input.
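The compensation idea above amounts to subtracting the learned sway from the measured motion. A minimal sketch, with an assumed (dx, dy) per-frame displacement representation that is not taken from the thesis:

```python
# Sketch (assumed representation): compensate for gait sway by subtracting
# the average downward drift observed while the user walks without
# entering input, as the text suggests.

def compensate(motion, sway):
    """motion, sway: (dx, dy) per-frame displacements; positive y = down."""
    return (motion[0] - sway[0], motion[1] - sway[1])

# Learned outside input periods: the device drifts ~2 px/frame downward.
sway = (0.0, 2.0)
raw = (5.0, 2.5)              # measured while input occurs
print(compensate(raw, sway))  # mostly horizontal intent remains
```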
3.4. Detection Algorithms
The two most important factors when deciding on a suitable detection technique are noise
filtering and processing speed. On a mobile device with many restrictions, these factors
become even more important. Slower processing speeds require more efficient algorithms to be
effective, and mediocre cameras produce increased image noise, hence the need to filter this
noise out.
Five common algorithms have been examined for effectiveness:
• Zero Crossing Detector (Laplacian of Gaussian/LoG) [35]
• Roberts Cross Edge Detector [3]
• Canny Edge Detector [5]
• Compass Edge Detector (Prewitt) [11]
• Sobel Edge Filter [3]
LoG algorithms are probably the most computationally expensive of all the examined methods,
though some variations are reasonably fast (Difference of Boxes, for example [35]). A
Laplacian of the image is initially computed to measure changes in intensity. Zero crossing
marks the places where this value crosses the zero line (changes from a positive value to a
negative, or the reverse). These are typically edges within an image, but occasionally are not
true edges (referred to as 'features'). Noisy pictures are usually smoothed first using a
Gaussian filter [11]. Such a detector may be appropriate for scene change (Section 3.5.2)
functionality but is too expensive for standard detection.
The Roberts Cross detector is a small and fast detector relying on two 2x2 matrices for
detection after colour data has been removed from the image. One problem with this method
is that it is far more reliable at finding 45-degree edges than horizontal and vertical ones.
This is not ideal for low-resolution images. The small matrix masks are also highly
susceptible to noise.
The Canny detector is much like LoG, starting by applying a Gaussian filter to blur the image
and remove noise. Small, but slightly more advanced, matrix passes like those of Roberts Cross
are then applied. A third pass is made to keep only the 'maximum' of the edges detected,
resulting in edges a single pixel wide. Canny detection handles noise well and delivers good
edges, but has trouble when multiple edges converge at one point, making one edge appear
disconnected while merging the remainder into a single edge. The method is also
computationally intensive, as the many passes over the image show.
Compass edge detection uses eight small matrices to determine both edge gradient and
orientation in separate images. This could be useful on more powerful processors, as the
information could also be used to track edges over a series of frames. However, on smaller
devices, the two output images are difficult to handle. Prewitt detection is also very
vulnerable to noise.
The Sobel detector is similar to the Roberts Cross detector in that it uses two small matrices
to detect edges in a grey-scaled image. This time, however, horizontal and vertical edges are
favoured. This has the consequence that the filter is less vulnerable to picture noise than
Roberts Cross while retaining its speed.
It is apparent that strong noise filtering and fast edge detection cannot both be fully
obtained, and sacrifices in each must be made when choosing the best algorithm. Sobel was
chosen as the most appropriate edge detection algorithm for the working environment of this
project. As the computational power of mobile devices increases, it may become possible to
switch to Canny or zero-crossing detection.
3.4.1. Edge Detection
The input images were processed by passing a 3x3 (or larger) matrix over each pixel and its
surrounding neighbours [5, 42]. If certain patterns are found in these matrices, it can
usually be determined that the pixel is part of an edge in the image. Typically, edges are
identifiable
when there is a sudden change of colour or luminosity of neighbouring pixels, and that change
travels consistently in a line.
The Sobel edge detection matrices (Figure 3.7) are considered by many to be the most efficient
way to process edges and therefore take precedence as the solution of choice on devices with
limited processing power [7].
Figure 3-7: Sobel Horizontal and Vertical Edge Detection Matrices
Considering the small size of the image-capturing devices currently embedded inside mobile
devices, other algorithms may be considered depending on performance. Capturing in grey-scale
to further increase performance is another possibility.
Even edge-processed images can remain complex and difficult to run tracking methods on when a
large number of edges is present. Bi-level masks can be applied to such images to reduce their
complexity so that only the important features remain (Figure 3.8). Such images are then ready
to be processed to retrieve relevant and valuable information.
Figure 3-8: Original Image → Grayscale & Sobel Filters → Bi-level Mask
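The grayscale → Sobel → bi-level pipeline shown in the figure can be sketched as follows. This is a pure-Python illustration (names are mine, not the thesis implementation); the standard Sobel kernels are applied to each interior pixel and the approximate gradient magnitude |gx| + |gy| is thresholded to produce the bi-level mask.

```python
# Minimal sketch of the pipeline in the figure: Sobel gradients on a
# grayscale image followed by a bi-level (binary) mask. Pure Python for
# illustration; a real implementation would use optimised native code.

SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
SOBEL_Y = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]

def convolve3x3(img, kernel, x, y):
    """Apply a 3x3 kernel centred on pixel (x, y)."""
    return sum(kernel[j][i] * img[y - 1 + j][x - 1 + i]
               for j in range(3) for i in range(3))

def sobel_bilevel(img, threshold):
    """Approximate gradient magnitude |gx| + |gy|, thresholded to 0/1."""
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = convolve3x3(img, SOBEL_X, x, y)
            gy = convolve3x3(img, SOBEL_Y, x, y)
            out[y][x] = 1 if abs(gx) + abs(gy) >= threshold else 0
    return out

# A vertical step edge: dark left half, bright right half.
img = [[0, 0, 255, 255]] * 4
for row in sobel_bilevel(img, threshold=255):
    print(row)
```

The vertical step edge in the toy image is marked by a column of 1s in the mask, with the border pixels (where the 3x3 window does not fit) left at 0.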
3.4.2. Object Detection
Initially, it was thought that edge-detection algorithms would be the most appropriate way to
approach this problem. Straight edges are commonly attributed to static objects in the
environment (tables, buildings, roads) and are relatively easy to track between frames.
Problems arose when trying to distinguish rotation from movement, since the edges had no
reference point to compare against when determining rotation. This meant examining edges in
order to compare their changes with those of every other edge. If an edge grew smaller on one
side of the image while one on the other side grew, it was presumed a rotation was being
performed. This introduced additional complexity and often gave unreliable results, as the
reference edges were frequently inappropriate to compare with. A simpler method was therefore
required.
Shape detection algorithms avoid this problem, as each vertex found has more reference data to
compare against. Squares seem the most appropriate shape to track, as they are easily
identified in an image and polygons of all orientations can easily be mapped back to an
original square. Comparing rotation information on a square is also far easier to manage.
Hough shape detection and the Augmented Reality Toolkit (Section 7.2.1.1) were both examined
for their shape detection abilities. Both were more than capable of detecting squares in the
image, but the ARToolkit™ contains multiple libraries that aid additional functionality, such
as converting to three-dimensional information (Section 7.2.3).
Therefore the ARToolkit™ was deemed the best processing to apply to motion detection. In fact
it offered more processing than required, so a stripped-down version of the toolkit appeared
to be the best solution to the problem.
3.5. Algorithm to determine appropriateness
When parsing video, it is common to find multiple squares in a scene. While the best result
would be obtained by taking all of these squares into account when determining motion, this is
not viable with current computational power. It is therefore best to concentrate on the
movement of only one square in the scene. To achieve the best results, the squares need to be
judged to determine which will give the best tracking results, i.e. their appropriateness.
Variables such as where the square is on the screen, its current orientation and its size were
all taken into account when determining appropriateness.
To decide this, an algorithm was devised to determine the 'most appropriate polygon' (MAP).
Each polygon returned by the ARToolkit's detection routine is given a score grading how
appropriate it would be to track. Quick tests are then performed to determine whether the MAP
of the current frame is likely to be the same as the MAP of the frame before (by comparing the
polygon's size on screen, its centre and its vertex information). If so, the changes between
the transformed polygons that relate to these squares can be calculated. If not, the other
square data on the screen is searched in an attempt to find the previous polygon. If a
candidate is found, the calculations are applied to that polygon before moving on to the new
MAP for the next processed frame.
To determine a MAP score, the following factors were included:
α – Appropriate size (it is easier to detect changes on polygons taking up more screen space).
β – Vertex distance from the screen edge (the further from the edge, the higher the chance the
square will still exist in the next frame; if the polygon is cut off by the screen edge it can
no longer be detected). Determined by taking, for each vertex, ten times the X plus ten times
the Y distance to the closest screen corner, and keeping the smallest result of the four
vertices.
∆ – Edge length (rotation is easier to track in both directions if the edges are not short).
The result is the sum of (100 / edge length) over each edge.
θ – Chance it is a true polygon (0.0-1.0). (ARToolkit returns confidence results; the lower
the confidence, the less likely this is a square and the more likely it will not be detected
in the next frame.)
Examination, along with trial and error, was employed to determine an algorithm that gives the
strongest results.
The algorithm used to derive a MAP score in the work presented here is:
(α/10 + β/10 - ∆) * θ.
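The MAP score can be sketched in code as follows. The factor definitions in the text are informal, so this encodes one plausible reading of each (the function name, screen dimensions and sample values are my own, not the thesis's):

```python
# Hedged sketch of the MAP score (alpha/10 + beta/10 - delta) * theta;
# each factor below is one interpretation of the informal definitions.

def map_score(area, vertices, edges, confidence, width=320, height=240):
    """Score one detected polygon.

    area       -- alpha: screen area covered by the polygon (pixels)
    vertices   -- list of (x, y) corners
    edges      -- list of edge lengths (pixels)
    confidence -- theta in [0, 1] from the detector (e.g. ARToolkit)
    """
    alpha = area
    # beta: smallest weighted (x*10 + y*10) distance from any vertex to
    # its nearest screen corner
    corners = [(0, 0), (width, 0), (0, height), (width, height)]
    beta = min(min(abs(x - cx) * 10 + abs(y - cy) * 10
                   for (cx, cy) in corners)
               for (x, y) in vertices)
    # delta: sum of 100/length over each edge -- short edges are penalised
    delta = sum(100.0 / e for e in edges)
    return (alpha / 10 + beta / 10 - delta) * confidence

# A large, central, confidently detected square scores higher than a
# small one near the screen edge.
big = map_score(10000, [(110, 70), (210, 70), (210, 170), (110, 170)],
                [100, 100, 100, 100], 0.9)
small = map_score(400, [(5, 5), (25, 5), (25, 25), (5, 25)],
                  [20, 20, 20, 20], 0.9)
print(big > small)   # prints True
```

The relative ordering of candidate polygons is what matters here; the absolute scale of the score is arbitrary.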
3.6. Model Design
Creating an input model for mobile devices required breaking down the available
movement-derived input information, as well as creating input categories into which the
possible inputs can be placed.
3.6.1. Interaction Breakdown
To determine possible relations between input types and the general movements that can be
performed on a mobile device, a list of commonly used functions was collected for use as test
cases. These were collected over a four-day period by the author and three other Microsoft
Smartphone™ users; each noted the functionality used on their device over this time. This
enabled initial work on the classifications to be performed.
The complete collection of inputs is available in Appendix A.
3.6.2. Input Types (High-level commands)
The inputs from Appendix A can be classified into eight general input types (Figure 3.9).
These help generalise and categorise the inputs into distinct types so that general motion
types can be applied to the generalised inputs. This adds a layer of logic to the model: if a
particular input can be classified, a general indication of the motion needed to perform it
can be derived.
Figure 3-9: Input Types
From initial analysis, eight types of input have been classified. Many of these classifications
may seem superficially similar. However, further justification and explanation is given below.
Choosing – Navigating around a set of options/possibilities to find the most appropriate. Such
inputs are generally performed by directional keys/toggle stick. A limited set of options are
given to choose from.
e.g.: Choosing an option on a menu or a list with a pre-defined number of items.
Selection – Selecting one of an unrestricted set of items. This can often be considered the
selection of an item within the user's created content.
e.g.: Selecting a circle drawn in a vector graphics program.
Confirmation – A message sent back to the device denoting that the user has understood the
current situation and agrees/disagrees with it. Usually performed by pressing a yes key or a
no key; these are often the call/end-call keys.
e.g.: Ensuring the user wants to delete a photo in the photo album after the user requests a
delete.
Adjustment – Minor changes to an already confirmed/selected situation. This is often performed
by the directional keys or toggle stick.
e.g.: Increasing the contrast of a displayed photo, or increasing the value in a number box by
one.
Moving – Changing the contents displayed on the screen by scrolling. Again, an input usually
covered by the directional keys or toggle stick.
e.g.: Panning a picture across to see the other side.
Functionality – Accessing a program/ability of the device. Performed via soft menus, dedicated
keys, or the act of opening a menu, choosing and selecting.
e.g.: Opening the photo album.
Menu Access – Displaying menus that list the actions available in the current situation. This
is commonly a specialised key.
e.g.: Opening the zoom menu to zoom in on a picture.
Modification – Significant changes to a currently available item. This can be performed with
the keypad, direction keys or a toggle stick.
e.g.: Flood-filling the water in a picture with red, or changing the input type from numerical
to English.
Preliminary classifications of such inputs are available in Appendix B.
3.6.3. Input Motions
The motions that are applied to inputs can be further described by breaking down the input
movements into their singular motions (Figure 3.10) and then examining the properties of these
motions. The most significant properties can help define an input type (all inputs of a
specific type share similar properties).
Figure 3-10: Input Motions
Direction: The general directions used in three-dimensional space from the user's perspective
(up, down, left, right, towards the user, away). Most commands will use multiple (two or
three) directions to increase the total number of combinations available. Conjunctions of
directions (up-right, for example) can be employed as well. Directions are easily classified
as a list and most likely provide the most important/prominent classification. Algorithms that
detect motion will be able to gather direction information directly.
Values: Up, Down, Left, Right, Towards, Away
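A raw 3-D displacement can be mapped onto the six direction values by its dominant axis. A minimal sketch, with an assumed coordinate convention (x grows right, y grows up, z grows away from the user) that is not specified in the thesis:

```python
# Sketch (assumed coordinate convention): classify a 3-D displacement
# into the six direction values listed above by its dominant axis.

def classify_direction(dx, dy, dz):
    ax, ay, az = abs(dx), abs(dy), abs(dz)
    if ax >= ay and ax >= az:
        return "Right" if dx > 0 else "Left"
    if ay >= az:
        return "Up" if dy > 0 else "Down"
    return "Away" if dz > 0 else "Towards"

print(classify_direction(0.2, 3.0, -0.5))   # prints Up
print(classify_direction(-0.1, 0.4, -2.0))  # prints Towards
```

Conjunctions such as up-right could be handled by also reporting a second axis when its magnitude is close to the dominant one.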
Rotation: Movement around the phone's axes. This will be the most difficult information to
gather from current-day devices and will hence play only a minor role in the initial model.
Rotation does play a key role in several functions that mimic human motion (e.g. making a
phone call requires the user to put the phone to their ear, which requires rotation).
Values: Rotate Left, Rotate Right, Tilt Up, Tilt Down
Speed: The speed at which a motion is performed is a subtle indicator of what type of command
is being input to the device. A slow motion indicates a softer response than one performed
faster and with more confidence. Hence, identical motions performed at different speeds can
indicate similar inputs with slightly different outcomes. A slow motion may indicate that the
user wants to fast-forward through a music track, while a quicker motion can indicate skipping
the track altogether. Detecting speed via a video stream cannot be precise because of depth
issues, but limiting the available values makes it more viable.
Values: Fast, Slow
Angles: When another direction is added to an input, an angle is formed (Figure 3.11). These
angles can be classified as obtuse, acute or rounded. Obtuse and acute angles are obviously
dependent on direction, while rounded indicates many minor direction changes over a small
area. Tests to decide what constitutes a rounded corner will have to be devised; the
additional processing power required for this means they should play only a minor role in the
input model. The other angle types can easily be determined via vector mathematics.
Values: Acute, Obtuse, Rounded
Figure 3-11 Angle Types
Acceleration: Speed variations within a motion are obviously beneficial in situations where
dynamic changes are common. While changing numerical values, acceleration would increase how
fast the numbers increase/decrease, while slowing down would also slow the change, enabling
more precise input. Since we rely on only two speed values (fast and slow), the possibilities
for acceleration are limited to faster, slower or no change. Acceleration is a function of
speed tracking, and its reliability will be directly linked to that of speed.
Values: Faster, Slower, No Change
Length: The accumulated length of the direction motions can range from minor movements to
large sweeping motions. Smaller motions require less effort and should dominate inputs. One
should also take into account that many inputs would require specific distances to be moved to
mimic motion (moving the phone from pocket to ear).
To keep things simple, 'short' and 'long' will be used as values. To determine length, the
product of the speed and the time taken to enter the input will be used: inputs that exceed a
set value will be considered long; otherwise they will be considered short.
Values: Short, Long
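The speed-times-duration rule above can be sketched in a few lines. The relative speed values and the threshold are illustrative assumptions; the thesis does not fix numeric values:

```python
# Minimal sketch of the rule above: length = speed x duration, with an
# assumed threshold (the thesis does not specify numeric values).
SPEED = {"Slow": 1.0, "Fast": 2.5}     # illustrative relative speeds

def classify_length(speed_label, seconds, threshold=2.0):
    return "Long" if SPEED[speed_label] * seconds > threshold else "Short"

print(classify_length("Slow", 1.0))   # prints Short
print(classify_length("Fast", 1.0))   # prints Long
```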
Scene Change: A unique input type. A scene change is not related to motion; instead, the
device keeps track of the general video scene and looks for cues or significant changes that
indicate a scene change.
Changes to the environment, from the darkness of being put into a handbag or pocket through to
the possibility of seeing a phone number on a business card, could all be considered scene
changes. Many changes would be unique to the input they are designed for, since there really
is no limit to what can constitute a change. Consideration must therefore be given to
restricting their use, since such a property could easily get out of control. That said, huge
possibilities abound depending on the power of the operating hardware and software.
Values (Current): Dark, Light
As can be seen above, some of these properties are directly related to combinations of other
properties, so not all of them have to be tracked. Further investigation was conducted to
establish which are the easiest to track and which supply the richest information.
3.7. Gesture Recognition
When operating a mobile device, there are many actions a user performs that are directly
related to the current situation of the device. These movement actions have to be preceded or
followed by additional interaction with the phone to perform the task. Tracking the movement
of the device in these situations allows this step to be removed altogether and hence greatly
improves the user's experience with the phone [17, 23].
Such communication can work in both directions: the user can react to an event with a relevant
motion, or the device can react to a motion event generated by the user.
There are many such situations that can occur while working with a mobile phone. Possible
examples include:
User → Device
A user takes the mobile device out of their trouser pocket while the key lock is enabled. The
device notices the movement and disables the key lock once the device is in front of the user.
A user dials a number and puts the phone to their ear. The device makes an outgoing call to
the number entered.
Device → User
The phone receives an incoming call and starts ringing. The user moves the phone to their ear
to answer the call.
The device receives an incoming message. The user moves the device to a viewing position, and
the device shows the message.
Such situation-specific communication can also be applied to more highly defined environments
such as mobile gaming, where game movements or actions could imitate real-life movement
(sitting down, walking in a direction).
3.7.1. Profiling users
Each user may exhibit specific "quirks" while performing actions, and these unexpected actions
may need to be assessed and adapted to while the user interacts with the device. Some users
may find it more comfortable to move the device to the left than to the right, and hence can
move it faster in a particular direction; perhaps arthritis inhibits the user's ability to
rotate the device. Even factors such as whether the user is left- or right-handed, or how the
user grips the device, can interfere with the device's usage in extreme circumstances.
A user profile should therefore be created to store unique information about a user. This can
be built by a simple configuration program that asks the user to take part in some simple
tests to determine their abilities and how they would naturally interact with the device. A
test could be as simple as asking the user to move the device in a requested direction and
comparing this with the actual direction moved. Weights can then be applied to each direction
to determine the intended direction of motion [37].
More exhaustive tests, such as having the user manipulate a ball into a small hole, could be
used to judge users' reactions and how they move the device in certain situations. All of this
can be profiled and stored to aid in determining the user's true intentions.
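The calibration idea described above can be sketched as a simple frequency model. This is a hypothetical illustration, not the thesis's scheme: the profile estimates, from (requested, observed) direction pairs gathered during the tests, which direction a user most likely intends when a given direction is observed later.

```python
# Hypothetical sketch of direction calibration: count how often each
# requested direction was actually produced, then use those counts to
# weight later classifications toward the user's likely intent.
from collections import defaultdict

def build_profile(trials):
    """trials: list of (requested, observed) direction pairs."""
    counts = defaultdict(lambda: defaultdict(int))
    for requested, observed in trials:
        counts[observed][requested] += 1
    # Estimate P(intended | observed) from the calibration trials.
    profile = {}
    for observed, reqs in counts.items():
        total = sum(reqs.values())
        profile[observed] = {r: n / total for r, n in reqs.items()}
    return profile

def intended(profile, observed):
    """Most likely intended direction given what was observed."""
    return max(profile[observed], key=profile[observed].get)

# This user tends to drift upward when asked to move right:
trials = [("Right", "Up"), ("Right", "Up"), ("Up", "Up"),
          ("Right", "Right"), ("Left", "Left")]
profile = build_profile(trials)
print(intended(profile, "Up"))   # prints Right
```

A production system would smooth these estimates and combine them with the motion classifier's own confidence rather than trusting raw counts.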
3.7.2. Input Prediction
Being able to predict which task the user is most likely to perform next greatly assists the
error detection process. This is most notable in applications that require spelling. Internal
dictionaries on the device can be consulted to determine which word the user may be entering,
based on what has been entered so far [31, 39]. Weights can be applied to the likelihood of
certain letters being entered next, and these letters are then compared to what was actually
entered; this comparison gives a final result for what the user most likely intended the
letter to be.
If the motion does not closely match any of the likely letters, it can be assumed that
whatever was actually entered is the intended input and the dictionary does not come into
play. Otherwise, the weighted letters are compared to the input entered: each letter's weight
is multiplied by its closeness to the entered motion, and whichever letter ends up with the
highest value is inserted.
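The weighting scheme above can be sketched as follows. All values are illustrative; the function name, cutoff and probability tables are assumptions, not taken from the thesis:

```python
# Sketch of the scheme above: combine a dictionary-based likelihood for
# each candidate letter with how closely the entered motion matched that
# letter's stroke; fall back to the raw input when nothing likely is close.

def predict_letter(entered, closeness, likelihood, cutoff=0.3):
    """entered: raw recognised letter; closeness[c] in [0,1] per candidate;
    likelihood[c] in [0,1] from the dictionary given the prefix so far."""
    candidates = {c: likelihood.get(c, 0.0) * closeness.get(c, 0.0)
                  for c in likelihood}
    if not candidates or max(closeness.get(c, 0.0) for c in likelihood) < cutoff:
        return entered            # no likely letter is close: trust the input
    return max(candidates, key=candidates.get)

# After a prefix like "th", the dictionary favours 'e'; the motion matched
# 'c' and 'e' about equally well, so 'e' wins.
likelihood = {"e": 0.7, "a": 0.2, "o": 0.1}
closeness = {"c": 0.6, "e": 0.55, "a": 0.1, "o": 0.05}
print(predict_letter("c", closeness, likelihood))   # prints e
```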
Input prediction can also be combined with user profiles (Section 3.7.1) to increase the
success rate, by comparing input motions to the applications or answers the user most commonly
uses on their device. This information would be unique to each user, and prediction would
therefore differ between users.
An intuitive motion based model for mobile devices – Development of an Autonomous Survey on a PDA
54 | P a g e
4. Survey Construction
Surveys allow the collection of both the qualitative and the quantitative data required for
creating a motion-based input model. Quantitative data helps define the precise motions users
will typically use to input their commands, while the more subjective (qualitative) data that
users supply helps define the inputs users typically make and which should be applied to the
model. Both types of data will be employed to gather information relevant to the model and how
it can work.
To create a basic model, it was decided that it would be best to place users in situations
that closely represented actual situations in which such a model could be used. To achieve
this, using a mobile device as the actual focus of the surveys was judged the best course of
action. Participants would interact with a mobile device when placed in specific situations,
and this information would be recorded so that results could be studied and compared.
Seeing how users responded in this environment would allow some early work on defining the
model's direction to be completed. This would then lead into a far more specialised survey
that placed the user in more controlled situations, so that specific information on specific
situations could be collected.
To develop a model usable by the general populace, an understanding of how users typically
interact with a mobile device was required. Once this was understood, steps could be taken to
improve that interaction using the motion concept.
4.1. Aim
The goals of the surveys were two-fold. The first was to discover whether such an input
method was in fact viable for the public to use. For this goal to be met, a reasonable
percentage of the survey participants would have to understand the concepts behind motion
input, judged by how they responded to the survey. If people required explanations for tasks
that asked them to interact with the device via motion instead of traditional means, the result
would be considered a failure, and the entire concept of motion input would have to be
rethought.
Otherwise, if the participants embraced the concept (either naturally or after additional
clarification), their responses to the survey would judge how well they could adapt to a motion-
based environment. If a subject understood the concept in theory but struggled to apply it in
practice (typically reverting to older methods of interaction to respond to a situation), the
input interaction would only be considered a mild success.
The second goal of the surveys was to create the basis of an interaction model for simple (low-
level) inputs, as well as a few phone-specific high-level inputs. Because of the constraints
imposed by the first success criterion, this information could only be gathered from participants
who responded positively to the first goal and were able to give meaningful responses to the
survey. Information gathered from the remaining participants would be used to improve the
overall understanding process.
4.2. Survey Structure
Early in the creation of the survey, it was decided to break the entire process into two
specific surveys (one qualitative and one quantitative). A qualitative pilot survey would be
used to gain basic information about the participants themselves, their prior mobile device
usage and their general understanding of how they would use a motion-based input model.
This would be obtained by placing them in informal situations and observing how they
reacted. The testing procedure is discussed in Section 4.3.
The information gathered was then used as the basis for a quantitative survey, informed by how
users responded to the motion-based situations. A formal survey in a controlled environment
was then applied to a larger user base to gain detailed information. This information has been
used to create a formal model that defines how the input types from Section 3.5.2 can be
mapped directly to simple motion inputs.
4.3. Survey One – Initial Data Collecting
The pilot survey was designed to gauge the general population's grasp and understanding of
using a motion-based input model in its simplest form. A general understanding of the users'
background and knowledge was gathered by survey, in order to profile the users before their
grasp of the input model was recorded. This information was then used as a baseline in
creating the second survey.
The information collected would be used to study the feasibility of such a model, as well as to
define some important guidelines on how a representative sample of the population perceives
interacting with a mobile device via motion.
4.3.1. Design
As the pilot survey targeted a population that would almost certainly have no familiarity with this
input concept, the survey had to be designed so as not to intimidate participants. As such, an
informal approach to data collection was employed: general questions were asked of the user,
and simple tasks were performed. In both sections, users were offered plenty of time so that
they were comfortable with what they were being asked. Sections 4.3.1.1 and 4.3.1.2 describe
the two parts of the survey, which seek to understand the users and assess their reactions.
4.3.1.1. Survey One: Part One – Understanding the Users
To understand the users, their past experiences and proficiency with mobile devices were
collected via a simple written survey of multiple questions (Appendix C). These questions
included:
How long had the person owned a mobile device?
The longer a user works with a specific technology, the more likely they are to embrace new
concepts the device can offer. Knowing this length of time not only allows us to see whether
that holds here, but also gives a guide and reference to their proficiency with mobile devices.
If people struggle to embrace the concept, there is a high possibility that such an input model
would have difficulty being accepted or adopted by the public as a whole.
What previous models of mobile phones had the user used?
With competing brands using differing interfaces and key interaction concepts, there is a
possibility that people will have a specific mindset about what functionality is available and how
to access it. Such a mindset will have been subconsciously trained over time, especially if the
user has developed brand loyalty and has only been exposed to one family of input concepts.
These habits are very likely to carry over to a motion framework, so a motion framework
should give weight to the input concepts of the more popular devices. Tracking how people
with differing mindsets interact with a phone therefore becomes a large factor in designing an
input model.
What level of competency do they believe they had reached with these phones?
More experienced users may typically employ shortcuts to access functionality; this familiarity
may pass over to motion. On the other hand, inexperienced users may be ignorant of
functionality and this could in fact be detrimental to their understanding of the motion based
survey. Users might instinctively try to apply these shortcut concepts to the motion model
since they are so used to doing them. An example would be locking a mobile phone so the
input keys do not respond. This is a typical shortcut where the motion to apply the input does
not have any grounding in logic, only efficiency.
On the other hand, users with the knowledge of shortcuts may already have the understanding
of multiple paths to obtain the same result. Such users may be more open to the concepts of a
motion model.
What functionality was available on their current phones, and was it taken advantage of?
Tying in with the previous question, knowing what the device can do plays a huge part in what
the user does with the device and how they go about doing it. Knowing a device can do
something aids the user in visualising how to perform that function. If the user has trouble
grasping the concept in general, then their input (motion) may be flawed.
Key profile questions are also included to see if certain demographics react differently to each
other given the same situation.
4.3.1.2. Survey One: Part Two – Gauging Users’ Reactions
Users were given part two of the survey along with the written part one, so that they had
ample time to examine what they were going to be asked and what they would actually be
partaking in during the second part. The surveys were supplied to participants the day before
the actual survey, so that they could fill in part one when they saw fit, as well as understand
what would be asked of them in part two. The goal was to find out how users would
incorporate motion into some very simple situations that could occur on a mobile device.
The users were allowed to use their own mobile device as a prop to act out the situations
presented to them, ensuring familiarity with the device they were using. The device was held
upside down so that buttons did not interfere with the interaction and only motion was used.
Users were recorded over the left shoulder as they performed the motions (Figure 4-1). A
whiteboard was kept behind the user at all times so that background information did not
interfere with the recorded video. The video was recorded by hand to allow the focus to move
around in case the device was obstructed by the user.
Because of the survey's informal nature, users were allowed to ask questions beforehand to
gain more of an understanding of what was required. During the survey itself, further
information was also supplied if the user appeared to be struggling.
Figure 4-1: Basic Outline of Survey Situation
The simple situations to act out were designed to cover each of the motion input types
previously defined (choosing, selection, confirmation, adjustment, movement, functionality,
menu interaction and modification). These tests were:
1. Selecting an option from a vertical list displayed to the user (movement, choosing, selection)
2. Saying yes to a prompt (confirmation)
3. Saying no to a prompt (confirmation)
4. Selecting an option from a vertical list (movement, choosing, selection, menu interaction)
5. Rotating an object on the screen to the left (adjustment)
6. Increasing a displayed number by two (adjustment)
7. Increasing device volume (functionality)
8. Pan right while viewing an image (movement)
9. Reloading a webpage (functionality)
The questions asked are also included in Appendix C. Information was recorded digitally, with
all of a user's motions recorded to one file. Voice prompts were used to indicate the start of
each motion and allowed a motion to be repeated if the user became confused part of the way
through.
4.3.2. Participants
Thirty participants were interviewed for this initial pilot survey. All thirty answered part one
of the survey, while twenty-five completed part two successfully. Participants were aged from
seventeen to thirty-six and came from a variety of backgrounds, with the common factor being
ownership of a mobile phone. The breakdown of participant information is included in
Appendix D in quantitative format.
4.3.3. Data Collection Methods
Part One
Surveys were distributed as hardcopy with ample space for users to record information. These
were processed into a spreadsheet so information could easily be compared, before being filed
away. The survey itself is included verbatim in Appendix C. Every user was supplied with a
copy of the survey at least a day before it was expected to be completed, giving the
participants ample time to consider their answers.
Eight of the thirty users had to be re-supplied with the survey during the time allocated for
survey part two since they had failed to bring their originally supplied copy. These were
completed before part two commenced.
Part Two
Each video file was prefixed with the participant’s name. The files are encoded as Windows
Media Video 9 with LAME MP3 audio, and each participant's videos are stored in a separate
folder. Since the survey sheets were marked and contained occasional notes from myself and
the participants (user feedback, and personal observations during recording, usually
concerning participant confusion and video retakes), the information related to each video was
also stored in a text file following the same naming procedure.
Typically the video files were under 2 minutes in length, though in situations where additional
information (instructions and questions) occurred, they were longer.
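The per-participant storage layout described above can be sketched as follows. The exact folder structure and file names are assumptions for illustration, mirroring only the stated conventions: one folder per participant, the video prefixed with the participant's name, and a companion notes text file following the same naming procedure.

```python
from pathlib import Path


def survey_paths(root, participant):
    """Return the (video, notes) paths for one participant's recording.

    One folder per participant; the .wmv recording and its companion
    .txt notes file share the same name.
    """
    folder = Path(root) / participant
    return folder / f"{participant}.wmv", folder / f"{participant}.txt"


video, notes = survey_paths("survey_data", "jsmith")
print(video.as_posix())  # survey_data/jsmith/jsmith.wmv
print(notes.as_posix())  # survey_data/jsmith/jsmith.txt
```

Keeping the notes file name derivable from the video name means the two can always be paired back up programmatically when the data is later analysed.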
4.3.4. Ethical Considerations
All forms were supplied and explained to the participants (Appendix C), and included
information on who to contact regarding ethical issues. Participants were also given ample
time prior to the commencement of the survey to voice any objections to what was being
asked. No complaints were received.
4.3.5. Initial Analysis
It was apparent at this early stage that there were two distinct groups of users: those who had
little trouble applying motion to the simple input tasks supplied, and those who seemed
uncomfortable or confused by the concept.
The confused users typically either looked up for guidance or prompts (attributable, once
again, to the informal nature of the survey) or attempted to use the traditional features of their
own phone to perform the tasks (by turning the phone over and using the keys). This was often
accompanied by a comment along the lines of ‘this is how I usually do it’, indicating they were
not comfortable with the motion concept in general.
Data collected from users who understood the requirements of the survey appeared to be
relatively consistent: many of the motions followed the same movement paradigms across
users for these questions. One point of interest is that many users over-emphasised their
motions, perhaps to ensure their motions were captured, or in the belief that large motions
would make it easier for the device to understand. Such large motions are typically not what is
being looked for and were therefore not considered appropriate natural motion. With the
exception of this scaling issue, the supplied answers were consistent.
The most common answers supplied were:
• Perform a natural motion that you believe would best represent choosing an object in a
vertical list (demonstrate movement down and up the list and the choosing of the object).
This was indicated by Direction Up and Down (y-axis) to move between items, followed by
Direction Away (z-axis) from the user to select.
• A motion to confirm (say yes to) an action.
A combination of Direction Up and Down and Rotation Forward and Back on the x-axis.
• A motion to deny (say no to) an action.
This was Rotation Left and Right along the y-axis and z-axis, with no direction information as
in the yes example above. This is the first instance of confusion between rotations and
directions.
• Rotate an object on the screen to the left.
Rotation Left along the z-axis. This appeared to be the simplest input for people to articulate.
• There is the number 18 in a box; how would you increase it by 2 (to 20)?
This resulted in many different responses, but the most common was Direction Up followed by
Direction Down to the starting point, twice. Directions seemed confusing to many people;
perhaps such precise inputs would require better knowledge of the model beforehand.
• How would you increase the device's volume while talking on the phone?
This was generally answered by moving the device Direction Down. Such a motion would
move the device away from the ear and make the phone conversation harder, so it is not the
ideal answer.
• Reload a web page that is currently being viewed.
Shaking the device was by far the most popular answer here. Perhaps this reflects the
confusion of users who did not know how to express the answer, or maybe it was a motion
imitation of the reload symbol typically seen in web browsers. The former was assumed.
• Answer an incoming phone call.
Most users moved the device up to their ear. Understanding of this question was significantly
better than average; in hindsight, it should have been the first or second question in the survey.
However, because the survey was conducted during holidays when people were readily
available, it was decided not to restructure it, but instead to take this finding into the next data
collection stage.
• Pan right while viewing an image.
Direction Right (x-axis) was the most common answer.
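The most common answers above can be collected into a simple lookup table mapping each surveyed task to its recorded motions. This is an illustrative sketch of how the pilot results might be organised for the model; the key and label strings are assumptions, with the motion descriptions taken from the survey results above.

```python
# The pilot survey's most common answers, as a task -> motions table.
COMMON_MOTIONS = {
    "choose from vertical list": ["direction up/down (y-axis)",
                                  "direction away (z-axis) to select"],
    "confirm (yes)":             ["direction up/down",
                                  "rotation forward/back (x-axis)"],
    "deny (no)":                 ["rotation left/right (y- and z-axis)"],
    "rotate object left":        ["rotation left (z-axis)"],
    "increase number by two":    ["direction up then down, twice"],
    "increase volume":           ["direction down"],
    "reload web page":           ["shake"],
    "answer call":               ["move device up to ear"],
    "pan image right":           ["direction right (x-axis)"],
}


def motions_for(task):
    """Return the recorded common motions for a task, or an empty list."""
    return COMMON_MOTIONS.get(task, [])


print(motions_for("reload web page"))  # ['shake']
```

A table in this form makes it easy for a later recognition stage to check an observed motion against the motions participants most commonly associated with each task.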
It became apparent that the order in which the questions were presented was an important
factor in the outcome of the pilot survey. Simple motions that were already in common use
(e.g. answering the phone) and questions with direction prompts (e.g. panning right) were
answered with less confusion, and more often, than the less well-defined questions. Ordering
the questions along this perceived difficulty scale may have resulted in more successful
completions of the survey.
Nineteen of the 30 participants could be classified as comfortable with the concept of applying
motion to register inputs on their mobile device. The remaining 11 shared the same objections
and mindset, an apprehension towards using motions. This is something that could most likely
be overcome by a gentler familiarisation process.
4.3.6. Particulars of Note
• Many of the users who had no problem completing the survey expressed surprise that prior
participants had told them the survey was confusing.
• Often, when users asked to redo a specific motion question, their second and third
attempts were identical to their first, emphasising the importance of natural responses.
These motions were also always comparable to other participants' responses. When asked
afterwards why they needed to redo them, a typical response was that the answer was not
good enough.
• The variance in answers was significantly less than expected.
• Reloading a web page caused the most confusion, even though it was placed last in the list
(in an attempt to lessen its impact because of obscurity).
• Two participants performed the exact same motion for all of the inputs. One appeared to
be trying to emulate double tapping a mouse, while the other performed a simple shake
each time.
4.4. Development of an Autonomous Survey on a PDA
Autonomous surveys follow a far more quantitative path than traditional face-to-face surveys,
as the information given to the user and the results recorded are free from third-party
influence. An automated survey is strictly between the participant and the device, so
additional preparations must be made; had its development commenced earlier, however, it
would have made the actual survey task a lot easier. Such a survey could be used to follow up
the initial survey and collect far more exact data from users now that the basics had been
established.
4.4.1. An automated survey versus traditional means
Both methodologies have advantages, and these were examined to determine which path
offered not only the simplest passage but the most useful final result. A study of what
information needed to be collected was therefore performed to determine which type would
yield the most useful results.
Typically, the scope of the data collected is an important factor in determining the usefulness
of an automated survey. If there is only a limited set of answers a user can choose from,
automated surveys are useful, but so would be simple multiple choice. The type of data
collected therefore also becomes a factor: if one can take advantage of the mechanisms
available in the mobile device while collecting and leveraging the data, automated surveys
suddenly become a very useful tool in the process.
4.4.2. Smartphone Development
There are many Smartphones™ on the market at this time competing for market supremacy.
These include Microsoft (http://www.microsoft.com/windowsmobile/smartphone/default.mspx/) phones (created by third
parties and re-sold), Symbian™ phones (typically Sony Ericsson and Nokia), Palm
(http://www.palmone.com/) and BlackBerry™ (http://www.blackberry.com/) devices. Most companies (with the
possible exception of Motorola) have committed to one of the above platforms. This makes
the industry not only very fragmented but very fast moving, as new standards and development
ideas are constantly being introduced.
4.4.2.1. Device Information
Smartphones™ are often considered a convergence of a regular mobile phone and a PDA
(Figure 4-2). Microsoft Smartphones™ are usually developed offshore (typically in China by
companies such as HTC) and resold in separate regions as rebadged phones by companies
such as I-Mate (http://www.carrierdevices.com.au/), Orange (http://www.orange.co.uk/), O2 (http://www.o2.com/) and QTek (http://www.qtek.fi/).
Figure 4-2: I-Mate Smartphone2 vs O2 Xphone vs Orange SPV E200 (HTC Voyager)
While it varies between devices, most run a Texas Instruments OMAP or Intel XScale
processor in the range of 132-624 MHz, with 32-128 megabytes of RAM for storage and the
same range of ROM to hold the operating system [41]. Devices have an ISO-compliant key
layout as well as a 4/8/9-way toggle-stick and hardware buttons to operate the camera or take
voice notes. Most devices are tri-band capable and have provisions for external storage
(MMC, SD or mini-SD cards) and a variety of communication modules (Bluetooth, infra-red,
GPRS).
The operating system itself has gone through many revisions over the years, originating with
Windows CE 1.0, developed back in 1996 for miniature devices. It has undergone many
revisions (CE 1.0 through to 5.0) and branches (Windows Mobile for PocketPC, Windows
Mobile for PocketPC – Phone Edition and Windows Smartphone™), each with sub-versions of
its own (Smartphone™ 2002, 2003, 2003 SE). The operating system ships with functionality
quite similar to a base install of Windows 98, offering a contact book, Solitaire, Internet
Explorer and phone-specific options such as dedicated SMS and MMS programs. The devices
can be synchronised with a desktop computer via ActiveSync, and information can be shared
between devices, usually through a USB cradle or Bluetooth adapter.
Of particular interest is the recent release of Windows Mobile 5 [22], which has been living
under the codename “Magneto” for the last year. It has been designed as a total solution for
the mobile device market where it will work for PocketPC™, PocketPC™ with phone
capabilities and Smartphones™. Along with yet another new generation of mobile hardware it
will be an interesting development.
4.4.2.2. .NET CF and Embedded C++
This project utilises the Microsoft .NET Compact Framework (.NET CF), first introduced with
Smartphone™ 2003/PocketPC 2003 as a successor to Embedded C++ 4.0. It follows many of
its parent's (Microsoft .NET) design traits, such as garbage collection, consistent typing and
delegate event handling [38]. It also has the major bonus of integrating into the Microsoft
Visual Studio development environment.
The use of .NET CF for this project does carry some significant weaknesses, which in general
can be addressed by the use of unmanaged C++. The .NET languages are executed via the
Common Language Runtime (CLR) rather than being compiled ahead of time to machine
code, making them inherently slower, and this can become very noticeable on mobile devices.
In addition, .NET CF is a significantly cut-down version of the full .NET Framework, with a
large proportion of the class libraries unavailable [18].
The .NET CF's ability to interact with other languages [18] allows compiled C++ (via linked
libraries) to be called from the C# .NET runtime, thereby significantly improving
performance.
This project also utilises the open compact framework initiative [27], which adds functionality
to the existing .NET Compact Framework. Functions it makes available, such as load/save
dialog boxes, are used by nearly every .NET CF developer.
4.4.2.3. Camera API
Cameras currently available on Microsoft Smartphones™ have a significant advantage for
developers over most other camera-phones: they run at the software level of the operating
system rather than on an independent hardware chip. This means a developer can add hooks
to the operating system to directly affect or read information coming from the camera.
Unfortunately, while this is possible, the API for working with the camera is not documented,
making implementation a matter of trial and error. The release of Windows Mobile 5 has
addressed this issue.
An open-source initiative [9] to interoperate with the camera was produced, though it
specifically works only on certain devices. Its implementation is very close to the .NET
solution in .NET CF version 2. Unfortunately, this solution was not sufficient for the goals of
this research. The Windows Mobile 5 SDK allows interaction with these cameras via
DirectShow, though documentation is extremely sparse. The procedures for using the camera
and DirectShow are included in Chapter 7.
4.4.3. Input mediums of a PDA
Using an electronic device gives far more freedom and avenues to explore when conducting a
survey. As these devices become more and more convergent and aim to become the ultimate
all-in-one device, their functionality increases. Using these additional features allows a far
more complete experience when collecting data and, in the long run, if designed well, makes
the data collection task far easier.
4.4.3.1. Textual Input
In developing an automated survey, there are multiple mechanisms the developer can monitor
on the device to collect information. With touch screens, written input has become a
possibility, but that alone is not a sufficient tool, since in the participant's mind writing an
answer on paper would be just as easy, possibly more so. If, however, additional information
needs to be derived from the written text, a device can collect it.
Examples of additional information that can be gathered from text written to a PDA includes:
• The stroke order of lines
• Previous answers (erased)
• The time taken to write out the answer
• Response time before writing commenced
• Whether any periods of thought (non-action) occurred during the written answer.
All of this additional information can be informative when trying to parse extra meaning from
text and, once again, very useful if the situation is right; for example, when trying to read the
thought processes behind an answer, instead of just the answer supplied.
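The metadata listed above could be captured in a simple record per written answer. This is a sketch only; the field names and structure are assumptions that mirror the bullet list, not the thesis's actual data format.

```python
from dataclasses import dataclass, field


@dataclass
class WrittenAnswer:
    """Extra metadata a device could log alongside one written answer."""
    strokes: list = field(default_factory=list)   # stroke order, as point lists
    erased: list = field(default_factory=list)    # answers written then erased
    started_at: float = 0.0                       # when writing commenced (s)
    finished_at: float = 0.0                      # when writing finished (s)
    pauses: list = field(default_factory=list)    # (start, end) thought periods

    def writing_time(self):
        """Time taken to write out the answer."""
        return self.finished_at - self.started_at


ans = WrittenAnswer(started_at=10.0, finished_at=14.5)
ans.strokes.append([(3, 4), (5, 9)])  # one stroke: pen-down points in order
print(ans.writing_time())  # 4.5
```

None of this reaches the participant; it is background data the survey application records for later analysis of the thought process behind each answer.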
4.4.3.2. Touch Input
A device's touch screen is not limited to textual inputs: a selection of items can be made via
touch as well as by text. Displaying information to the user and requesting feedback by
allowing the user to interact with the displayed information can be much more intuitive, for
example when asking a participant to choose between pictures all displayed on the screen at
the same time. Touch also allows implements other than pencils/pens to be used as the input
mechanism (i.e. fingers). Again, details such as response time can be recorded, and other
information, such as where the participant chose the option, becomes available (and easily
recordable, as the selection point can be tracked to the pixel).
Other uses of touch become available which are impossible on paper, such as sliding
information around on the screen. A user could, for instance, be asked to sort a jumble of
numbers into order. In this example, the process of achieving the final answer is just as
important as the answer itself. With pen and paper there is no viable way to collect this
information; without an interactive medium it would have to be done physically and recorded
via video, making the survey process significantly more complex.
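The idea of recording the sorting process, not just the final answer, can be sketched as below. The operation-log format is an assumption; a selection sort stands in for the sequence of drag operations a participant would perform on screen.

```python
def sort_with_log(items):
    """Sort a list while recording every move as (value, from, to),
    standing in for the drag operations a participant would make."""
    items = list(items)
    log = []
    for i in range(len(items)):
        # Find the smallest remaining item and drag it into place.
        j = min(range(i, len(items)), key=lambda k: items[k])
        if j != i:
            log.append((items[j], j, i))  # (value, from index, to index)
            items[i], items[j] = items[j], items[i]
    return items, log


ordered, moves = sort_with_log([3, 1, 2])
print(ordered)  # [1, 2, 3]
print(moves)    # [(1, 1, 0), (2, 2, 1)]
```

Here `moves` is the kind of record pen and paper cannot provide: the order and number of rearrangements reveals the participant's strategy as well as their answer.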
4.4.3.3. Audio
Devices have built-in microphones capable of capturing audio much like any standard voice
recorder. This audio by itself is not a sufficient reason to lean towards an automated survey,
but with processing power the audio can be analysed during the course of the survey, allowing
the survey to be moulded in response. This could be as simple as filtering out unneeded audio,
or as involved as modifying the path the survey takes depending on the audio.
If loud background noise is detected while the survey takes place, the volume of the device's
audio output can be lifted dynamically to compensate. A survey could also wait for a
significant quiet period (the participant has finished talking) before moving on to the next
question. The audio could interact directly with other cues as well (visual ones, for example),
with what the user sees being directly related to what they say.
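The two audio behaviours described, raising output volume with background noise and waiting for a quiet period before the next question, can be sketched as follows. The level values are illustrative RMS-style amplitudes and the gain and threshold numbers are assumptions, not real device API calls.

```python
def output_volume(background_rms, base=0.5, gain=2.0, ceiling=1.0):
    """Lift playback volume in proportion to measured background noise."""
    return min(ceiling, base + gain * background_rms)


def quiet_long_enough(levels, threshold=0.05, needed=3):
    """True once the last `needed` level samples fall below `threshold`,
    taken to mean the participant has finished talking."""
    return len(levels) >= needed and all(v < threshold for v in levels[-needed:])


print(output_volume(0.125))                         # 0.75
print(quiet_long_enough([0.4, 0.02, 0.01, 0.03]))   # True
```

A real implementation would feed these functions from the device's microphone at a fixed sampling interval; the logic of adapting the survey to its acoustic environment is the same.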
4.4.3.4. Video
Recording video from the device allows capture from a first-person perspective (i.e. directly
what the user sees). This perspective can capture a lot of additional information that static
camera locations would fail to pick up.
Usage information can also be retrieved by analysing video taken by the camera. Fingers and
hands can interfere with the video; while this could be considered a hindrance, it also offers
information, revealing not only how the device is being held during a question but even which
hand it is being held in. Swapping of hands is also easy to pick up in such video, and the
angles at which a device is held can be compared between participants.
Angles can also change between questions, and this can help give information about the
responses being supplied. Sharp and sudden movements of the device can indicate a person
under more pressure, while slower motion can indicate a calmer response. With questions
specifically designed to place the participant under duress, or to make them feel comfortable,
this information can be invaluable.
Many devices also include cameras that point towards the person’s face. These cameras are
typically small and used for voice chat, but can be valuable sources of information to see the
user’s natural physical responses to questions.
4.4.3.5. Motion
By processing the data between frames of video it is possible to detect the general motion of a
device. This data can be collected to read natural motion reactions to questions and situations.
It also allows questions to be created that accept movement as a response. Using motion has
been examined in depth throughout this document.
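A minimal sketch of the frame-differencing idea is shown below. It estimates apparent horizontal shift between two grayscale frames by comparing their brightness centroids; this is a toy proxy for illustration only. The real implementation would read camera frames through DirectShow and use a more robust block-matching or optical-flow approach, and the threshold value is an assumption.

```python
def shift_direction(prev, curr):
    """Estimate apparent horizontal scene motion between two grayscale
    frames (nested lists of brightness values).
    Returns 'left', 'right' or 'still'."""
    def centroid_x(frame):
        total = weighted = 0
        for row in frame:
            for x, v in enumerate(row):
                total += v
                weighted += x * v
        return weighted / total if total else 0.0

    delta = centroid_x(curr) - centroid_x(prev)
    if delta > 0.5:        # assumed sensitivity threshold, in pixels
        return "right"
    if delta < -0.5:
        return "left"
    return "still"


# A bright blob moving one column to the right between frames:
a = [[0, 9, 0, 0], [0, 9, 0, 0]]
b = [[0, 0, 9, 0], [0, 0, 9, 0]]
print(shift_direction(a, b))  # right
```

Note that when the device itself moves, the scene appears to shift in the opposite direction, so a motion-input system would invert the reported direction to recover the device's movement.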
4.4.3.6. Other
There are other resources available on mobile devices that can be leveraged. Wireless
communication protocols such as Bluetooth and infra-red can be used to control external
devices automatically: Bluetooth is commonly used to interact with headsets, while many
applications exist to make PDAs act like a TV remote
(http://www.pdawin.com/tvremote.html). Being able to control and fire off events on other
devices like this can bring a true level of interactivity to a survey.
Other devices (with no need to be portable) can be used to add elements not available, or
simply not physically possible, on a mobile device. Infra-red can be used to start recording on
a web-camera to enhance the video information collected by another stream, or it could be
used much like a remote to pause playback/recording on a VCR [36]. Surround-sound setups
can be used to track reactions, and communication protocols can be used by the device to start
such audio on a nearby computer.
GPS functionality and tracking could be used to determine location and movement during a
survey. Such environmental information can influence results during a survey, and having this
information at hand allows the researcher to study this. GPS by itself can also be used as a
mechanism to collect answers. Questions can be designed asking the participants to travel to
certain locations, for example, the time taken to get from location A to B would easily be
tracked using such a mobile device.
Other standard information can easily be taken advantage of during a survey. Time-related
elements are easily captured, such as the time taken to complete a survey (or limits on the time
allowed to answer it), the time and date a survey took place, or data such as dates of birth.
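Capturing time-to-answer per question is straightforward; a minimal sketch (the thesis does not specify its timing code, so the class and method names here are hypothetical):

```python
import time

class QuestionTimer:
    """Record how long a participant takes on each survey question."""
    def __init__(self):
        self.durations = {}   # question id -> seconds taken
        self._current = None
        self._start = None

    def begin(self, question_id):
        # Call when the question is presented to the participant.
        self._current = question_id
        self._start = time.monotonic()

    def end(self):
        # Call when the participant's answer is recorded.
        self.durations[self._current] = time.monotonic() - self._start
```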
4.4.4. Information Storage
These devices are capable of recording large amounts of information at relatively fast rates.
This has the downfall of leaving a lot of information to parse through if the storage is not
carefully planned. Hence, an effective and obvious methodology to store data is required.
4.4.4.1. Video and Audio
Care has to be taken when writing the data collected by automated means. Devices have
limited storage space, so judicious use of this resource is required. Video in particular, even at
low quality settings and resolutions, can consume multiple megabytes per minute. Even with
large storage cards, these devices can struggle to push so much data onto a card at such a high
rate, as the video throughput can often exceed the write speed of the storage card. The result is
buffers filling up and doing the only thing possible, dropping data, and hence dropping frames
in the saved video.
Other problems resulting from this include battery drain (even on solid-state media) and
increased downtime, as data has to be transferred to another medium more frequently as
storage fills quicker. Obviously the solution is data compression, and thankfully the additional
processing overhead is well within the means of current-generation devices to perform on the
fly, whether hardware or software encoded.
Considering that the data collected on devices is typically of a lower quality, lossy
compression techniques yield good results when applied to the video stream. Filters applied to
the stream can be much harsher if only certain, clearly defined pieces of information are
required from it. Grey-scaling or even two-tone video can be valid options if only specific
features are being looked for. These techniques can decrease data usage dramatically.
Processing video in real time while the survey takes place also allows only the required data to
be extracted and recorded as it happens, typically in a non-video format. Information about the
darkness of the environment during the survey, for example, can easily be averaged out and
stored as a numerical value, instead of recording video for later human interpretation or
external automated processing when the work could have been done while recording in the
first place.
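The darkness example above reduces to a one-number summary per frame. A minimal sketch, assuming frames arrive as rows of RGB tuples (the real capture pipeline used DirectShow buffers):

```python
def average_brightness(frame):
    """Reduce one RGB frame (list of rows of (r, g, b) tuples) to a single
    0-255 luminance value, so only a number is stored instead of the frame."""
    total = count = 0
    for row in frame:
        for r, g, b in row:
            # Standard luma weighting (ITU-R BT.601).
            total += 0.299 * r + 0.587 * g + 0.114 * b
            count += 1
    return total / count if count else 0.0
```

Logging this value per frame gives an environment-darkness trace at a tiny fraction of the storage cost of the video itself.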
4.4.4.2. Other Data
As with other means of data collection, storing results should be convenient, and the results
easy to access, interpret and compare. Therefore binary streams of data should be free of
redundant data and clearly named, possibly even self-describing (with, for example, XML
tags). Archiving information can be automated as well and achieved via multiple means, from
simply storing a second copy of the data on the device after the first is recorded, to streaming
low-bandwidth data to an external store.
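A self-describing record of the kind suggested above could look like this; the element names are illustrative, not the actual schema used in this research:

```python
import xml.etree.ElementTree as ET

def answer_to_xml(participant_id, test_code, value):
    """Wrap one survey answer in self-describing XML tags so the stored
    data carries its own meaning without an external format document."""
    root = ET.Element("answer")
    ET.SubElement(root, "participant").text = str(participant_id)
    ET.SubElement(root, "test").text = test_code
    ET.SubElement(root, "value").text = str(value)
    return ET.tostring(root, encoding="unicode")
```" in out
assert out.startswith("<answer>")
assert out.endswith("</answer>")
</test>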
4.4.5. Questions and Survey Design
Surveys developed for mobile devices simply cannot follow traditional computer-based
usability survey guidelines because of the lack of many features such as keyboards and mice
(Sect 4.4.3). A mobile device also adds many limitations not commonly encountered (such as
screen size, Sect 4.4.2.1). Therefore, there are additional factors to take into account during
both the design and collection stages that do not occur during the creation of surveys on
traditional computers.
4.4.5.1. Survey
Automated surveys can take advantage of the above input techniques in an attempt to use the
device to its fullest. Ideally the participant should be told as little as possible about the
technologies available. This way the participants will hopefully remain ignorant of these
mechanisms and not attempt to over-compensate for perceived weaknesses in the devices
(speaking louder or slower than normal, for example). It became apparent in both surveys
undertaken in this research that several participants over-compensated their movements to
ensure they were captured. This was most noticeable with large, over-emphasised motions.
With such a device, there are plenty of avenues to travel, but with the resources available, we
can make the surveys fun! Enjoyable, stress-free environments result in more realistic answers
than an environment where the participant simply wants to get out of there [6].
4.4.5.2. Questions
When posing a question, resources should be used to their fullest. The devices are capable of
writing out and speaking questions, and can use animated diagrams to explain what is required
and expected of the participant. Being automated should mean as little involvement by the
researcher as possible, so the questions should be developed to handle the possible scenarios
that can occur. If the participant does not understand, a repeat of the question, or a way to find
further information via the device, should be available.
During the course of these surveys, when people were confused about what they had to do, their
answers were unrealistic. With better-defined questions the number of flawed answers will
decrease. A review of the answers given showed that questions designed to be as clear and
instinctive as possible resulted in the highest percentage of usable answers.
If the participant makes a mistake then there should be a plan of action available to either skip
or repeat the question. Other questions that need to be considered should this scenario occur
include:
• Does the stored file get overwritten?
• Is there a way to avoid having to take the entire survey again just to get the one answer?
• Is a second attempt at the question going to give more valuable data, or are instinctive
responses more important?
Such details and the repercussions of such actions must be considered when designing
questions.
4.4.6. Survey Environments
When designing questions to take advantage of the input mechanisms available, the
environments these surveys take place in should be designed to interfere with this data as little
as possible and, wherever possible, actually facilitate its collection.
While a truly mobile survey cannot be guaranteed to take place in an ideal environment, this
does not mean automated surveys cannot. In the case of the former, question design and data
collection techniques have to be developed to ensure the data is unaffected by external stimuli
as much as possible. But with automated surveys in controlled environments, the data may be
much more valuable if it is indeed affected by the environment.
4.4.6.1. Appropriate environments for use of an automated survey
In standard surveys, the environment is typically designed to provoke a certain feeling in the
participant so that answers are supplied while the participant is in a certain state of mind.
Normally these environments are designed to ensure the participant is not nervous (calming
and relaxing surroundings), is thinking clearly (organised and clean) or is in a specific mindset
(poster placements around the room, or conveniently placed food and beverages).
Such environments apply again to automated surveys depending on what information is to be
collected. But additional factors should be considered when preparing environments, in
particular when collecting video.
Since the video being collected is typically of a lower quality, the environment can be set up to
enhance the information retrieved. Controlled environments are typically rooms, so the
features of the room should be clearly identifiable. Objects should be easy to spot in the video;
this can be achieved with strongly contrasting colours or unique, easily identifiable shapes.
Objects such as books have easily recognised shapes (rectangles) and are not limited to one
colour. Such items remain identifiable when the camera is moving or the image is blurry.
Strongly contrasting walls and floors, along with straight lines, also make video information
easier to process. Unique items that are evenly spaced and uncluttered are also easier to notice
in a video stream.
Quiet environments make audio easier to record, so the absence of humming computers or
additional electronic devices can make this data easier to process.
4.4.6.2. Preparing environments for more meaningful video results
When preparing environments to aid in the data collection from automated surveys, certain
items can assist in the collection of information, in particular motion. With the device rapidly
moving and rotating, the easiest way to process this information is if there is always a
reference image on the screen that can identify the device's location, viewing angle and
rotation. Placing such 'icons' around the environment can greatly assist with this motion
tracking process. Not only can the device's properties be collected when an icon is on the
screen, but transformations between frames (icon to icon) can also be tracked.
To gather this information, the icons would have to be unique and difficult to confuse or
misinterpret. This allows the researcher to determine where the device is being pointed, but the
icons would also have to satisfy other properties to provide rotation information. Icons would
have to be designed in such a way that their rotation can always be determined. While many
designs offer this, the easiest way is to identify which part of the icon is up (Figure 4.3).
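Once the icon's centre and its "up" feature (such as the tilde in Figure 4-3) have been located in a frame, the rotation follows from simple trigonometry. A sketch, assuming the marker detection step has already produced the two pixel coordinates:

```python
import math

def marker_rotation(centre, up_mark):
    """Rotation of a marker, in degrees, given pixel coordinates of its
    centre and of its 'up' feature. 0 degrees means the feature sits
    directly above the centre; angles grow clockwise. Screen y grows
    downward, hence the negated dy."""
    dx = up_mark[0] - centre[0]
    dy = up_mark[1] - centre[1]
    angle = math.degrees(math.atan2(dx, -dy))  # 0 when the mark is straight up
    return angle % 360
```

Comparing this angle across frames gives the rotation component of the device's motion whenever an icon is in view.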
Figure 4-3: A Sample Icon with a Tilde Representing Up
Icon spacing should also be devised so that the video stream is not cluttered with information
(typically no more than one icon in the picture at once), yet dense enough that the stream is
never devoid of such information.
4.4.7. Common Obstacles
Many people are intimidated by electronics, and this is one of the major hurdles of an
automated electronic survey. If such demographics are a major target of the research, then one
must seriously consider whether an automated survey is in fact the best way to proceed.
Question design can ease this problem by making the survey itself more fun and by gently
guiding such users into using the device through questions staged by increasing perceived
difficulty.
If this is the demographic being targeted, additional input streams can be recorded simply to
capture any data at all, even if it is not of the type requested. Audio is the best backup, as
confused participants can often supply useful information aloud when they are aware the
device is recording their voice.
The direct opposite can also pose problems, as people who are comfortable with such devices
can be extremely set in their ways of interacting with them. If people insist on interacting with
the devices via their comfortable means, usually buttons or touching, then it must be decided
how to deal with this. Should it be recorded at all? Again, survey design plays a major factor
here.
As with all software engineering, automated surveys need to be tested. When posed with
certain situations, participants might respond in unexpected ways. If these responses are not
handled, then surveys may record wrong information or crash/hang. Black-box testing with
testers drawn from the survey demographic is by far the best way to ensure the surveys are
stable and handle all extreme situations appropriately.
4.4.8. Device Resources
With a device capable of communicating information to its users in a variety of ways, there is
little reason not to take advantage of these mechanisms. Simultaneous text and audio can be
used to outline what is required of the participant to answer questions, and imagery can easily
be included in the questions. Repeating a question in exactly the same manner as it was
originally delivered is simple, and this option should be available before actual recording takes
place.
If a device has phone capabilities, there is no reason why these functions should not be taken
advantage of. Allowing the device to simulate phone calls and letting users give responses as
if they were talking on the phone may in fact make them a lot more comfortable with the entire
process.
4.4.9. Automated Survey Summary
Automated surveys such as the ones undertaken in this research can be of great benefit to the
researcher if they are designed to take advantage of the resources offered by the devices.
Building such a survey also contributes to familiarisation with video programming on these
mobile devices, and that experience aids greatly during later development.
As development for mobile devices becomes more common, I can foresee frameworks for such
surveys being developed, as development time is the only significant detriment of this survey
style. Once such a framework is developed and this path is open to most researchers, the true
benefits will be seen.
4.5. Survey Two – Testing User Decisions and Reactions
The data gathered from the first survey pointed out two specific factors that had to be
eliminated to gain more useful information. The surveys needed to be far more formal,
meaning no more answering of questions during the actual survey, and the users had to be less
familiar with the devices they were using as props.
It was expected that participants would ask fewer questions in the following survey if they had
taken part in the first one. However, considering that the planned number of participants
increased, the issue would remain, so the users would have to be better informed of what was
expected of them. For this larger-scale survey, users were supplied with a document
explaining the concepts of a motion-based input model via a web page to examine and
understand. Questions would have to be fielded and answered prior to the survey taking place.
To ensure a more controlled environment, allowing users to use their own mobile devices as
the basis for the survey had to be scrapped. Using a device with no notable buttons (a PDA)
for all participants addressed this and eliminated the urge for users to attempt to use buttons as
input.
Since the pilot survey had a significant success rate and the possibility of such a model was
deemed valid, there was a need to actually define the model with user input. Therefore a
better-defined and further-controlled survey was required to collect this information.
Survey two was conducted approximately four months after the original survey and was
developed as a fully automated process (described above). The goal of the survey was to
observe how a user reacted to specific situations that the device itself presented. This ensured
a level of realism that the first survey could not. For example, the device was able to ring
when it wanted the user to answer it, or it could change its volume when wanting to see how
the user reacted to quiet and loud noises.
An intuitive motion based model for mobile devices – Survey Construction
79 | P a g e
Again the situations were split into input types (Section 3.5.2) but were better defined and
more specific than in the original survey. Each of these situations is described in more detail
in this section.
4.5.1. Design
The informality of the initial survey had to be removed and quantitative data had to be
collected. This meant that all the information regarding motions had to be supported with hard
data (in this case video). The users also had to receive exactly the same information from the
questions and derive their own understanding from it. These factors suggested using a single
device as the focal point of the entire survey.
The device used to collect information from the users was a HTC Universal, branded as an
IMATE JasJar [41]. The JasJar is a Windows Mobile 5 device that can be used as a PDA
(Figure 4.5) or as a mini-laptop-like device (Figure 4.4). After demonstrating the device to
users in PDA mode while running the native camera application, it was noticed that the lack of
buttons made people interact with it via motion in multiple situations. When queried about
this, users responded that these were instinctive motions, indicating that they were not actually
aware they were performing them. This was exactly the information that was required.
Figure 4-4: Universal in Laptop (landscape) mode
The devices were programmed to run applications that supplied information to the users via
traditional means (audio and text), and the reactions to these applications could be recorded.
We believe this automated approach is a much better platform for carrying out the qualitative
surveys required for this aspect of the research. Taking this approach offered multiple
benefits:
• Consistent results; all participants were supplied with the same device and identical
scenarios to respond to. The lack of external influences would greatly increase the quality
of the data retrieved.
• An early test bed for developing applications on these devices would be achieved. Much
of the information at this stage could be passed on to later stages of design and
development.
• All data could be retrieved and stored automatically in digital form, simplifying the
storage.
• Participants were supplied with an unfamiliar device with a limited amount of buttons to
interact with. This would maximize the possibility of the participants actually using
motion to express their input.
• The device's internal camera could be tested for capturing motions in a real-life
scenario.
Figure 4-5: Universal in PDA (Portrait)
A significant downside was that there would be an increased time between surveys, as the
development of these automated surveys needed to be completed before they could be used.
Considering that the time spent in development at this stage would prove beneficial later, it
was deemed a minor sacrifice.
4.5.2. Data Collection Methods
Each video was recorded internally by the device inside an ASF wrapper with no audio stream.
Video was encoded with the Microsoft MPEG4 compressor at default properties and a
176x144 resolution. Each of the tests was stored in a separate video file with the following
filename headers (Table 1).
Table 1: Header of Video Filenames for Survey Two
Test Header
Choosing ‘CH’
Adjustment ‘AD’
Modification ‘MO’
Confirmation ‘CO’
Functionality ‘FU’
Functionality2 ‘FN’
These were appended with an incrementing number denoting the participant. This number was
stored in the device's registry and incremented by one at the end of each set of tests, ensuring
it survived after files were copied across to a more permanent medium (a laptop), which was
generally done after every three to four participants. The video was then batch converted from
the ASF container to AVI using the XviD encoder with the quantiser set at 7 (very high
considering the video quality). This was saved directly to an external 3.5-inch hard drive in a
caddy, connected via USB 2.0.
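The naming scheme in Table 1 can be sketched as a small helper; the dictionary and the plain integer counter here are stand-ins for the device's registry value, not the actual implementation:

```python
# Two-letter headers for each test, as listed in Table 1.
TEST_HEADERS = {
    "Choosing": "CH", "Adjustment": "AD", "Modification": "MO",
    "Confirmation": "CO", "Functionality": "FU", "Functionality2": "FN",
}

def video_filename(test_name, participant_number):
    """Build a capture filename: test header followed by the incrementing
    participant number, e.g. 'CH12.asf'."""
    return f"{TEST_HEADERS[test_name]}{participant_number}.asf"
```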
The HTC Universal is equipped with two embedded cameras. One is used as a standard
low-resolution video/photo mechanism while the other points towards the user and is aimed
purely at voice calls. While using both cameras might have been possible, it was decided to
record from the external (first) camera only. While it is capable of capturing at resolutions up
to 640 by 480 pixels, at that resolution the frame rate is limited (around 10-15 frames per
second) and is also susceptible to frame skipping, like most software-based cameras.
Therefore all captures were performed at 176 by 144 pixels throughout the survey phase.
At this resolution it was possible to capture at greater than 20 frames per second with more
than acceptable image clarity and limited blurring from fast movement. The data from each
test was recorded and encoded on the fly by the device into an Advanced Streaming Format
(ASF) container. Only video was captured during the tests, so no audio stream was recorded
(Figure 4.5). This caused minor playback issues on desktop platforms, so the video was
transcoded into an MPEG4 AVI container to ease compatibility and playback. All video
attributes (depth, size) were kept during this transcoding, and the bit rate was slightly increased
to compensate for the different compression techniques used by the filters, to ensure the videos
were as close as possible to the originals. Mencoder (http://www.mplayerhq.hu/) was used for
the conversion into MPEG4.
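As a rough illustration, the batch step can assemble one mencoder command per capture; the XviD option names here are from memory of mencoder's documentation and should be checked against the installed build before use:

```python
def mencoder_command(src, dst, quantiser=7):
    """Command line for transcoding one ASF capture to an XviD AVI.
    `fixed_quant` pins the XviD quantiser, matching the fixed setting of 7
    described above; `-nosound` reflects that the captures carry no audio."""
    return ["mencoder", src,
            "-ovc", "xvid",
            "-xvidencopts", f"fixed_quant={quantiser}",
            "-nosound",
            "-o", dst]
```

Each command list could then be passed to `subprocess.run` in a loop over the copied capture files.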
With each participant partaking in six different tests (Section 4.5.4), each video clip was saved
to a different file along with a unique identifier mapped to the person's name (e.g. CHdJK.asf,
the CH denoting Choosing (Section 4.5.4.2)). After being transcoded, each file was organised
according to the test it was recorded by, allowing easier viewing of videos of the same input
type.
Figure 4-5: Sample Capture
4.5.3. Survey Conditions
The motion survey was conducted entirely within the Multimedia Lab (Room S710) during the
Easter holidays of 2006 to ensure a quiet and uninterrupted environment. Every participant
was alone during the survey, as the device prompted the user with each question in turn and
recorded all data. I remained behind a partition to ensure I did not distract the user or exert
any influence. This included ignoring all questions asked.
With video now being collected from the device instead of over the person's shoulder, there
was a need for an easily identifiable environment. Also, for collecting quantitative data, there
was a need to be able to put values to this information. To facilitate this, a relatively bare
room was selected as the location of the survey. Easily identifiable markers were placed
around the room so motion could be tracked more easily at a later date (Fig 4.6.).
Figure 4-6: Controlled Room Layout with Markers
These markers were placed in an attempt to gather all the necessary information about how
users moved the device in response to the situations. Blind tests were conducted to determine
how movement could be tracked in this environment: a third party took snapshots of the room
at various locations and angles to test the efficiency of the setup, and the markers greatly eased
determining where the camera was located when the pictures were taken. Rotation and linear
movement from the point of view of the user could be tracked for the front half of the room
with high success. Motion was easier to follow since the translation between markers helped
highlight the changes in camera position. Floor and ceiling colouration differences also
improved rotation and location detection.
4.5.4. The Tests
Each test was designed to concentrate on a specific input type and to simulate a common
real-life scenario.
4.5.4.1. Functionality Required
Each of the input types to be tested required the creation of a test application with appropriate
interface functionality and recording facility. To some extent this restricted the testing
approach used for particular input types. The applications ran on a timer that presumed the
user had responded to a situation after a set period. However, in situations where multiple
paths could be taken (scrolling down or up through a list, for example), this could not be
compensated for.
The audio from each of the tests is available in Appendix E. These prompts were recorded as
separate sound bites and played by the device before the commencement of each test.
4.5.4.2. Choosing Data on the Screen
In this test, the participant was given the device and asked to control it in such a way as to
select a series of asterisks that appeared on the screen in a set order (Fig 4.7.). The test was
designed to see how the user would move the device to reach a specified end point from the
start position.
Figure 4-7: The Choosing Test (Emulator shot)
The test always had a simple rendering of a mouse cursor in the middle of the screen to denote
the start point of the user and an asterisk was shown to denote the end point. This end point
changed every 3 seconds, giving the user a new target to aim at, once again starting from the
cursor in the middle. The test queried seven of the eight standard directions the cursor could
be moved in (Fig 4.8.).
Figure 4-8: The Eight-Way Movement
The asterisks were displayed in the ascending numerical order indicated in Figure 4.8. It was
assumed that the differentiation between up and down movements would be the most
significant factor in this test; those directions were therefore kept separate in an attempt to
observe reactions.
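As an illustrative sketch of how such a test could place its targets (the actual direction numbering is that of Figure 4-8; the mapping below is a hypothetical assignment, and screen coordinates grow downward in y):

```python
# Unit offsets for the eight compass directions around the central cursor.
DIRECTIONS = {
    1: (0, -1), 2: (1, -1), 3: (1, 0), 4: (1, 1),
    5: (0, 1), 6: (-1, 1), 7: (-1, 0), 8: (-1, -1),
}

def target_position(centre, direction, distance):
    """Pixel position of the asterisk for one step of the choosing test,
    offset from the central cursor by `distance` pixels."""
    dx, dy = DIRECTIONS[direction]
    return (centre[0] + dx * distance, centre[1] + dy * distance)
```

A timer would advance `direction` through the queried sequence every three seconds, resetting the cursor to the centre each time.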
There are generally two schools of thought when holding a device in front of you and wanting
to indicate a direction along the Y axis, often dependent on one's perspective of the situation.
Many users would want to move or roll the device down to reach square seven, but a minority
would actually want to move the device up, much like the controls of an airplane, where
pushing forward tilts the nose down and hence sends the plane downwards.
Many games (first-person shooters in particular) compensate for this difference by giving an
option to invert the Y axis, allowing the user to push the mouse away from them (up on the
surface) to look down. Offering such an option seems the most appropriate way to handle this
particularity.
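In an input model, this option reduces to a sign flip on one axis. A minimal sketch, assuming a hypothetical tilt-to-cursor mapping (the function name and the idea that tilt arrives as pitch/roll angles are assumptions, not the thesis's model):

```python
def tilt_to_cursor_delta(pitch, roll, sensitivity=1.0, invert_y=False):
    """Map device tilt (degrees) to a cursor movement. Rolling right moves
    the cursor right; pitching the top of the device down moves the cursor
    down, unless the Y axis is inverted as in many first-person shooters."""
    dy = pitch * sensitivity
    if invert_y:
        dy = -dy
    return (roll * sensitivity, dy)
```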
I had originally assumed that the great majority of people would in fact roll the device to
generate movement, simply because of the lesser body movement required. This turned out to
be false, as 39 of the people surveyed relied on linear movement to indicate choosing.
Interestingly, 14 of these people used very large motions to indicate this movement; whether
this was just to emphasise their response or what they naturally believed is not known.
Seventeen people relied on rolling the device; four of these used the inverted method explained
above. It should also be noted that a form of deceleration was detected in some of these
videos, with the movement slowing down when approaching the target.
Eight people offered no video or motion response; this was attributed to confusion, with five of
them deciding to touch the screen instead and three offering no response at all.
Eight people obstructed the camera with fingers while the information was being recorded,
while five held the device the wrong way around, so no valid data could be collected from
them. Three people's data was unreadable because the markers around the room were not
sufficiently in focus. No one offered alternative motions to indicate the movement portion of
choosing the asterisks.
Of the 57 usable responses, 41 also performed notable actions to select the items after moving
to them: slowly pushing the device away from the body (8), a single quick flick away from the
body (20), a double flick (9), a left/right shake (3) and a left 90-degree rotation (1).
Out of this test the only unexpected but usable response was the rotation to select an item.
This was an interesting outcome.
4.5.4.3. Adjustment to Counteract the Changing Image
This phenomenon was the catalyst for devising these automated tests. Capturing video while
the device was in portrait mode actually rotated the captured image 90 degrees (a device
limitation, I presume). People, including myself, performed what was natural when seeing this
rotated image (Fig 4.9.): we all tried to correct it.
Much like using a camera viewfinder, where you try to get the perfect angle by rotating the
device, people naturally rotated the device in an attempt to match what was being seen. Of
course this was futile, as all it accomplished was further rotation of the camera itself, resulting
in an identical image. This made a perfect test of how people respond to a simple adjustment
situation. But the test itself went even deeper.
Figure 4-9: Adjustment Test. Notice the LCD Screen
The data being streamed through the camera was still not displayed properly even when the
rotation was taken into account. Somewhere along the filtergraph of the data stream, the image
format was getting lost, resulting in the image also being inverted along the X axis. This is
very hard to notice because of the rotation already present, so the participants' second attempt
to correct the image was also of interest.
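Software could in principle undo both display artefacts before rendering. A sketch on a frame represented as nested lists of pixels; whether a clockwise rotation matches the device's actual behaviour is an assumption:

```python
def correct_frame(frame):
    """Undo the two artefacts described above: rotate the frame 90 degrees
    clockwise to counter the portrait-mode rotation, then mirror each row
    to undo the X-axis inversion introduced in the filtergraph."""
    rotated = [list(row) for row in zip(*frame[::-1])]  # 90 degrees clockwise
    return [row[::-1] for row in rotated]               # mirror along X
```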
Participants were given the device while it explained the instructions. The image being shown
was a live stream coming directly from the device's camera and was being recorded, so
attempts to correct the image could be seen. The goal was to see which way the devices were
rotated and what follow-up attempts happened after the participants noticed the failure of their
correction attempts.
Given that participants always faced the same objects in the room, the objects shown through
the viewfinder had the same impact on every user's decision about rotation. Choices of
movement therefore came down entirely to each participant's thought process and were not
unevenly influenced by what was seen on the screen. For example, in Figure 4.9 the LCD
monitor in the background is a focal object, so users might try to correct its orientation first.
Confusion did not play a part in this test, as all users who received a viewable image did try to
move the device to correct it. Problems once again occurred with people who placed fingers
over the camera; some realised the problem themselves before a significant amount of time
had expired. Seven responses were not usable due to a delayed reaction caused by a finger
covering the camera; the other nine affected responses were still usable.
Every one of the usable responses started by rotating the device to the left (Fig 4.10).
Figure 4-10: Device Rotation
Differences were noted in the speed and amount of rotation. Both factors can be attributed
to the user's certainty that their action would rectify the image. Of the 73 usable
responses, 25 left their attempt at that one rotation, and 38 attempted to rotate right after the
left rotation failed before giving up. The remaining 10 used multiple different rotation attempts
to achieve the result. The overwhelming result was that rotating the device was deemed the same
as rotating an object displayed upon the device.
4.5.4.4. Modification - Dealing with a Warped World
This is another test where the user views an image and their response to it is recorded.
This time a static image was chosen, a building (Fig 4.11), and various image transforms were
applied to it to see how the users reacted.
Figure 4-11: Modification Image, High Brightness
Various effects were applied to the image as it was shown to the participants to gauge
their reactions. Initially the image was shown unmodified as a baseline; the image then went
through multiple stages of brightness to see how the users responded. The brightness test
goes through three steps, the above figure being the highest (Fig 4.11).
After the brightness tests, the image was returned to normal and the contrast was increased in
two steps to determine whether any natural motion occurred to counteract a stronger image. After
this, the image was once again returned to normal and then flickered, to test whether any motion
occurred in response to the flicker.
Again, relying on natural reactions instead of instructions resulted in a high success rate in
the collected data. Smothered video was again an unavoidable issue (9 occurrences), but 52
respondents gave useable data; only minor or insignificant motion was recorded from the remaining
respondents. It was noted that some of these participants with poor responses actually relied
on head movements instead of device movements in an attempt to counter the changes, something
that was non-existent in the previous test.
Brightness, in general, was countered by moving the device away from the body as a natural
reaction. The shaking response was also popular here as an attempt to rectify the situation.
The acceleration of the movement away appeared slower in this test than in others.
Increasing the contrast resulted in a similar response; a movement like these two therefore
could not be mapped to a direct action. They could only be interpreted as the user wanting a
change to occur in the image (back to normal), after which the user would have to be queried
about the problem.
The flicker provided more useable results, as slight rotations were performed on the device
(presumably so the user could look at the device from a slightly different angle).
The result of this test suggests that no definite motion could be applied to modification
situations; rather, certain motions should be watched for to determine that there may be a
problem, with further interaction then needed to ascertain the exact problem.
4.5.4.5. Confirmation with the Faces
In this test the participants were shown a collection of images showing either happy or sad
faces, with the simple request of agreeing with the happy faces (Fig 4.12) and disagreeing with
the sad (Fig 4.13). The goal of this survey was to discover how participants interpreted yes and
no (confirmation answers) as motion.
Figure 4-12: A Happy Face
Eight faces were shown in succession from multiple sources (cartoons, photos, renders, faces
moulded out of snow), with the users given two seconds to respond to each face. The faces
were displayed to everyone in the following order: Happy, Happy, Sad, Happy, Sad, Sad, Sad and
Happy. A set order of faces was chosen to greatly ease the processing of the recorded video
data.
Figure 4-13: A Sad Face
72 respondents provided at least some useable data (at least one happy and one sad response).
While respondents answered consistently within their own tests, the scope of responses across
participants was large. Many different motion types were used to signify yes and no, and there
was no standout (most popular) response.
• Shake device left/right for yes, up/down for no (32)
• Move device left/right for yes, up/down for no (7)
• Move device left for yes, right for no (24)
• Turn device left for yes, right for no (9)
The singular direction inputs did not make any sense to me at first, until I realised people were
imitating the buttons on a message box with a yes/no (OK/cancel) option. Movers appeared to
prefer the single-direction inputs while rotators preferred the multiple directions. What has to
be noted is that there is already a large amount of emphasis on shaking motions for other
commands: how will one differentiate between 'yes' and 'help' if this path is taken?
4.5.4.6. Scrolling Functionality
The concept of this test was to have the participants silently read a block of text on the
screen. The text consisted of just over 500 words describing various input types on mobile
devices. This was more text than could fit on the screen at once; it was therefore scrolled
upwards so that the users could attempt to read it all (Fig 4.14).
Figure 4-14: The Functionality Test and the Text within.
The idea behind this test was that the scrolling speed increased at a constant rate until it was
practically impossible to keep up. With such a design there would be a stage where the lines
being read were scrolling up faster than the eyes were moving down after completing each line.
The video was recorded to determine what motion occurred once the participants started
'falling behind' the scroll speed and, hopefully, tried to compensate with a reactive motion
of the device that attempted to slow down, or possibly even reverse, the scrolling.
This test proved highly successful, as reactive motion by users was very significant in the
results. 72 of the respondents performed a notable motion as the scrolling speed of the text
began to overwhelm their reading. The increase in speed was subtle enough that it was not
noticeable until it was essentially too late, but the instinctive motion during this subtle
acceleration could be seen.
From the initial reading position, all 72 respondents naturally tilted the device upwards
as a means to counteract the scrolling speed; this could be visualised as trying to increase the
gravity applied to the text by moving the device to a more vertical position. Observations also
showed that head rotation occurred along with this device rotation, but the head rotation
occurred in both directions.
This was an indication of natural motion at its finest. I queried several people after the
survey about what they did; several were aware they were tilting the device up as a counter-
measure, but most were not. Those who were aware were generally those who were tilting
their head downwards as a countermeasure.
4.5.4.7. Simulating a Phone Call
This test was a collection of minor situations that could be applied to making a phone call, all
rolled into one test. The step-by-step plan of this survey was:
• The participant placing the device on the table.
This was to ensure the device was in a ‘neutral state’ before the test commenced.
• The device ringing to simulate a phone call.
A traditional ringer was used to ensure the participants understood that the phone was
supposed to be ringing and they should answer it.
• The participant picking the phone up to answer the call.
The user would pick the device up and place it to their ear as if answering a phone call.
• The device starting a basic conversation.
The device played a generic telemarketer's spiel to start off the conversation.
• The device asking a simple voice question that the user presumably would answer ‘no’ to.
The device asked the participant if they wanted household insurance. The attempt was to pose a
question and situation that the participants would instinctively answer without any thought
process, which could perhaps have some motion impact on the device.
• The device’s volume suddenly getting very loud.
The device’s telemarketer becomes upset because of this no answer. Since all this audio was
in fact player over the device’s external speaker it was in fact significantly louder than what
would be possible.
• The participant responding to this sudden loud noise.
The user would instinctively move the device away from their ear, but how was the point of
interest: directly away from the ear, or downwards? Hopefully the user would not be too
surprised and drop the device.
• The device’s conversation getting angry with the user and hanging up.
The phone call was ending
• The device playing the disconnected tone.
Phone call was officially over, how would the user respond?
• The participant hanging up the phone.
The user performing some action to indicate they had hung up from their end as well.
The idea behind this test was to get general reactions for a set of situations placed together to
examine the combination of motions to reach an end goal. Each of the perceived inputs was
minor and hopefully instinctive. Quantitative data was not to be collected from this test; it was
merely to see how a motion input system could be applied to a more complex real world
scenario.
The test did not start well for 18 of the participants, as they picked up the phone and placed
it to their ear the wrong way around. This made the video hard to track, as the camera was
typically pointed right at their ear or hair. These situations were discarded. The remaining
participants were able to provide at least some useable information.
The remaining 62 users all picked the device up and placed it to their ears as expected when
answering a phone. However, the insurance question elicited few usable responses via motion, as
most of the movement could be attributed to laughing. Minor head shaking to indicate 'no' was
detected in six of the responses.
Upon the increase in volume, 45 participants moved the device directly away from their ear
while seven moved the device in a general downwards direction. Ten users did not react to the
volume. No one dropped the device.
48 users moved the device away from their ear and placed it upon the table. Situations where an
external object such as the table was involved were not presumed to be part of the survey, since
the table cannot be applied to the hanging-up process itself. Four users placed the phone in
their laps after 'hanging up', two placed the device into their pocket, while the remaining
eight kept the device in their hand.
4.5.5. Participants
Eighty users were selected for the second survey. Twenty-seven remained from the original
survey while the remaining fifty-three were new. These new participants were supplied with the
original survey as an introduction to the prior work.
While user information for these new participants was not collected, it can be stated that their
backgrounds made the tested population less diverse, particularly in the age category, as the
majority of the new participants fit into the 23-26 and 31-34 brackets, though seven people aged
40+ were added. The IT and construction fields saw the biggest growth in job positions.
Little information was supplied to the participants prior to the survey and they were asked to
try to keep the tests confidential in an attempt to not corrupt responses of people who had yet
to participate. The participants were informed that information would be collected via a PDA
that they were to interact with and that I would not be responding to questions during the
survey.
4.5.6. Additional Details/Observations
A few people had trouble handling the device because of its size and weight. I imagine this
did have an impact on the answers given, but there was no way around this issue.
Relatedly, some people were aware of the cost and fragility of the device (the screen had
cracked and had to be replaced prior to the surveys). Both of these issues would have had minor
influences on the responses given.
Several people assumed that the statement 'hold the device at a 45-degree angle' was an
indication to roll the device so that the top of the device slanted to their left. This was a
minor issue, as the motion captured could still be processed. Tilting the top of the device
downwards occurred naturally, since the device was always held below eye level.
4.6. Survey and Concept Summary
Simple motion concepts were tested upon a wide variety of the mobile-device-using population to
see how these users embraced the concept. Originally, simple motions were tested in an informal
environment: participants were asked how they would perform simple functions, their motions
were recorded and other reactions jotted down.
Upon first look it appeared that people either understood the motion concept or simply could not
see why they should bother; they had keys and other input mechanisms to rely upon instead.
While this was not ideal, there was a definite base that could use the model (greater than 60
percent of surveyed candidates). This number might be increased with a gentler learning curve
and possibly prior training in how motion could be advantageous.
This increased number was actually achieved during the course of the second survey. It
became more and more apparent that, given the right circumstances, anyone would use motion to
try to influence the outcome they wished to achieve. Even if they were not consciously aware of
this movement, it could still be taken advantage of; after all, such sub-conscious movement is
natural motion in its purest form. When users were informed of this movement after the tests
they became much more receptive to the entire concept.
This suggests that even if users do not embrace the entire idea of using motions at first, it
might still be a useful tool to augment their everyday usage. As suggested by the second
survey, people became more comfortable with the idea over time as its advantages were
demonstrated first-hand. This can only aid in the further adoption of the concept. If less
responsive users realise their inputs are being guided by motions as well then they should be
more responsive to the positives of the model in its entirety.
An intuitive motion based model for mobile devices – Basic Model Creation
98 | P a g e
5. Basic Model Creation
With the information collected from the surveys, it is possible to start mapping certain motion
categories to input types and then further classify motions to specific inputs. This is
performed by examining how often certain commands were used by the users and in what situations
they were used. If the survey data shows an overwhelming response to a specific input by
participants using the same motion, then that motion is likely to be mapped to that input.
The goals in creating the model are three-fold:
• To create a motion model that is intuitive and simple.
We wish to augment the traditional input schemes by allowing users to perform actions to
signify their intentions. These actions would be movements that come naturally to the user
and take minimum processing and effort to perform.
• To cover a significant portion of the day-to-day actions performed upon a mobile device.
Motions for the most common actions that users wish to perform will allow true comparisons
on how this model performs against competitors. This will also give solid guidelines for
possible implementations.
• To allow the expansion of the model to be both logical and simple.
The model needs to be able to cope with new input and motion types being added to it. This
means that the model must be well defined, so designers know where to begin placing their
concept inside the model. It should also be clear how to easily include their ideas through the
model with minimal changes to the model itself. The model as a whole should be a good guide on
how the designers' inputs could be mapped to motion.
Some natural motions are a combination of multiple motions (rotate left, then rotate right) and
could cause confusion with an input that is only 'rotate left', since both begin with the same
information. The 'rotate left' could therefore fire off a command before the 'rotate right'
could occur. To stop this from happening, many commands require a neutral state after them to
indicate the end of the input.
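The neutral-state requirement can be sketched in code. The following is a minimal illustration only (the function and motion labels are my own, not part of the thesis model): motion samples are buffered until the device returns to a neutral state, so a compound gesture such as "rotate left then rotate right" is emitted as one input rather than firing 'rotate left' prematurely.

```python
# Sketch: segment a motion stream into commands, requiring a return to a
# neutral (stationary) state before a command is emitted. Labels are
# illustrative; the thesis defines the requirement, not this API.

def segment_commands(samples, neutral="neutral"):
    """Group motion samples into commands delimited by neutral states.

    `samples` is a sequence of motion labels such as "rotate_left".
    A command is only emitted once the device returns to neutral, so
    "rotate_left, rotate_right, neutral" is read as one compound input
    rather than firing "rotate_left" on its own.
    """
    commands, current = [], []
    for sample in samples:
        if sample == neutral:
            if current:
                commands.append(tuple(current))
                current = []
        else:
            current.append(sample)
    return commands
```

For example, `segment_commands(["rotate_left", "rotate_right", "neutral"])` yields the single compound command `[("rotate_left", "rotate_right")]`.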
5.1. Collected Data Classification
The first step in pulling the collected information together is to classify each type of motion
used by the participants during the survey. An example of these has been tabulated in
Appendix F, which shows the input types and where they were used, sorted by the survey in which
they were used.
The table shows that the data was consistent throughout the survey, with simple-to-classify
motions (in particular directional movements) being far more popular. Participants often used
very similar inputs to reach their result. This information can be restructured simply to allow
us to examine what types of motions are commonly associated with certain types of input
(Appendix G). This data shows that it was very easy to map certain motions to input events. The
table also supplies additional information obtained from examining the inputs (the confidence
of an answer, for example).
5.2. Situational Motions
As discussed earlier (Section 3.5.1), certain inputs do not make sense in certain situations
(e.g. panning a picture to the right while listening to music, since there is no image to pan).
This brings up two important points:
The model must take into account the current situation of the device.
Since many motions will make no contextual sense in some situations, there must be limits and
controls on when certain inputs can occur. This also means any software must be aware of the
current usage state of the device.
Motions can represent multiple inputs as long as those inputs are in mutually exclusive
situations.
We can map the motion 'Direction Right, Y-Axis' to multiple inputs, but these inputs must not
interfere with each other. With the current limitations of multi-tasking on Windows Mobile at
this time (one application running on top, others in the background), this mutual exclusivity is
in fact increased (only one application can have control at a time), allowing increased reuse of
motions across situations.
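As an illustration, a context-keyed lookup is one minimal way such mutual exclusivity could be realised in software. The context and input names below are hypothetical, invented for the example; only the principle (same motion, different input per exclusive situation) comes from the model.

```python
# Sketch: one motion may map to several inputs, provided the situations
# (contexts) in which those inputs apply are mutually exclusive.
# Context, motion and input names are illustrative only.

MOTION_MAP = {
    ("image_viewer", "direction_right_y"): "pan_image_right",
    ("music_player", "direction_right_y"): "next_track",
}

def resolve(context, motion):
    """Return the input bound to `motion` in the current `context`,
    or None when the motion makes no contextual sense there."""
    return MOTION_MAP.get((context, motion))
```

Because only one application has control at a time, looking up by the current context guarantees that `direction_right_y` never fires two inputs at once.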
The currently defined situations are available in Appendix H, which describes the situations
examined and their relationships with other situations.
5.3. Model Expandability
A goal must be not to box designers in regarding the options they can choose for their inputs.
Designers should always have motion options available to choose from, and hopefully there will
be enough available options that at least one of them is feasible for the required input. We do
not want designers settling on illogical inputs simply because there is nothing else available
for them to choose.
Therefore, commands pre-defined by the model should be spread across the motion types. This
increases the chance that designers can use a motion similar to what is deemed optimal without
having to overwrite a pre-existing command, which should be an absolute last resort.
5.4. Inappropriate Commands
Commands need to be judged for appropriateness before even being applied to the model.
For some inputs, currently available mechanisms are simply better at performing the task. For
example, drawing a picture using a stylus on the touch-screen to simulate a pencil will be far
more effective than the most natural motion replacement, which would be to emulate a pencil
with the entire device. Being able to see what is drawn directly with the stylus, and the
stylus's physical similarity to a pencil, are positives a motion-based scheme cannot effectively
replicate.
There are also situations that are not just difficult, like the above example, but in fact
totally inappropriate. If one is using the device's camera functionality and has a perfect
snapshot, one does not wish to have to perform some motion on the device to take the picture.
Not only would one lose the perfect picture, but the motion is more than likely to blur it. In
such a situation it is best that the input model (and the application that performs the
tracking) is disabled entirely in favour of a more traditional input model.
5.5. The Base Model
To create a motion for an input, certain information needs to be defined: an input, the context
of that input, and a proposed motion. The thought process that could be applied is shown in
Figure 5.1. When the designer has a certain situation in which the input is valid, this is the
beginning of defining a context. With this context in mind, the proposed motion can be checked
to see if it is already being used. If the motion is not being used then it can be applied to
the input the designer wants. With a motion mapped to an input, a result is achieved.
Figure 5-1: Example of how the Information fits together from the Model
The breakdown of the motions in the model is available in Appendix I. The process for
creating these inputs follows.
Once the input and a relative motion have been defined, the input needs to be placed in a
category. If this is a pre-existing category then the designer must compare each of the
categories it is related to. If any of these categories contains the proposed motion then it is
recommended that the motion is not used. It is possible that the conflicting inputs are
exclusive from each other, as the categories might be related in separate ways, but it is still
recommended that motions are not replicated.
Once this comparison is set up, it will be possible to quickly check which inputs are in fact
available, and if any of these is appropriate it can easily be added.
If the input belongs to a new category, that category has to be added to the model and examined
to see which situations already in the model it relates to. Once this is complete, the
examination of conflicts can begin.
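The conflict check just described could be sketched as follows. The data structures are illustrative stand-ins for the category and relation tables of the appendices, and the function name is my own.

```python
# Sketch: before a motion is reused, check the target category and every
# category related to it for an existing binding of the same motion.

def motion_conflicts(motion, category, category_motions, relations):
    """Return True when `motion` is already used by `category` or by
    any category related to it, in which case reuse is discouraged.

    `category_motions` maps a category name to the set of motions it
    already uses; `relations` maps a category to its related categories.
    """
    to_check = {category} | set(relations.get(category, ()))
    return any(motion in category_motions.get(c, set()) for c in to_check)
```

A designer would call this with the proposed motion; a True result means the motion should only be replicated as an absolute last resort.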
Scaling of inputs is an important factor. It was generally observed that certain motions were
performed with slightly different factors affecting them. This can be visualised simply by
imagining a user moving the device along the x-axis faster to indicate that they wish the image
displayed on the screen to pan across faster. These factors are indicated in Appendix G as
adjusters. For inputs that accept varying levels of 'amount', these adjusters aid in determining
that information.
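A minimal sketch of how an adjuster might scale an input's amount follows. The speed thresholds and the pixels-per-centimetre figure are invented for illustration; the survey only recorded simple descriptors (slow, fast, sharp), which is why coarse bands are used rather than a continuous function.

```python
# Sketch: an "adjuster" scales the amount of an input from how sharply
# the motion was performed. Thresholds and units are assumptions.

def pan_amount(distance_cm, speed_cm_s, base_pixels_per_cm=10):
    """Scale a pan input by how sharply the device was moved: the same
    physical distance pans further when the motion is faster."""
    if speed_cm_s > 30:        # "sharp" movement
        factor = 3.0
    elif speed_cm_s > 10:      # "fast" movement
        factor = 1.5
    else:                      # "slow" movement
        factor = 1.0
    return base_pixels_per_cm * distance_cm * factor
```

Coarse bands also tolerate the limited precision of camera-based motion detection, which cannot reliably distinguish fine speed differences.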
A basic model must cover fundamental information. To achieve this, certain input types must be
covered: Confirmation, Movement, Choosing and Selection. A type such as Adjustment (adjusting
what?) has such a large scope that it cannot be blanket-covered by a model; the expandability of
the model should cater to it instead.
5.5.1. Confirmation
With confirmation covering only two possible inputs (there is no choice to back out of the
option, à la cancel), it is an easy motion to map. Yes and no are polar opposites and, as such,
the inputs showed this. Regardless of the choice of motion, 'No' used the same motion type as
'Yes' but with opposite parameters.
For example:
• Yes – Move Left, No - Move Right
• Yes – Move Up then Down, No – Move Left then Right
• Yes – Rotate Up then Down, No – Rotate Left then Right
Being in an exclusive category (a confirmation dialog will always be on top and cannot be
avoided until it is answered), confirmation inputs do not have to worry about stealing input
motions away from other types, so any of the inputs can be selected. If multiple inputs are
allowed as one command then we should ensure these inputs cannot be confused with one another
(to ensure the highest accuracy). Moving left is part of one 'Yes' input and one 'No' input, so
only one of those input paths should be used. Allowing multiple inputs suggests that taking
both inputs 2 and 3 is the best course of action for the confirmation inputs 'Yes' and 'No'.
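Taking inputs 2 and 3 together, the confirmation mapping might look like this in code. The gesture labels are my own shorthand for the motion pairs listed above; the point of the table is that no single-step motion such as "move left" appears in both a Yes path and a No path.

```python
# Confirmation uses motion pairs 2 and 3 only, so no individual motion
# is shared between a Yes gesture and a No gesture.
CONFIRM_GESTURES = {
    ("move_up", "move_down"): "yes",
    ("move_left", "move_right"): "no",
    ("rotate_up", "rotate_down"): "yes",
    ("rotate_left", "rotate_right"): "no",
}

def classify_confirmation(gesture):
    """Map a completed two-step gesture (ending in a neutral state) to
    'yes' or 'no', or None when it is not a recognised confirmation."""
    return CONFIRM_GESTURES.get(tuple(gesture))
```

Since the confirmation dialog is always on top, this table can safely be consulted without checking any other context.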
5.5.2. Movement
Movement is interesting, as it can be applied to many different contexts and situations; many
items can be moved in many different ways. However, the entire idea of moving an item was easy
for participants to grasp and their answers were consistent. The single most important
consideration is what is being moved. For example, when viewing a picture on the screen and
moving the device to the left to simulate a left movement, one is in fact not moving the
picture but moving the viewport left. Such information needs to be explained to avoid
confusion.
Truly analogue movement appears to be best avoided, as such detection would require a far more
precise detection method to work solidly. Therefore eight-way movement appears to be the best
course of action, allowing rotation only directly along the three axes:
Examples:
Table 2: Movement to Input Mapping
Move Device Left → Move Target Left
Move Device Up → Move Target Up
Move Device Up Right → Move Target Up Right
Rotate Device Left (X Axis) → Rotate Target Left (X Axis)
Rotate Device Down (Y Axis) → Rotate Target Down (Y Axis)
Rotate Device Left (Z Axis) → Rotate Target Left (Z Axis)
Inverted mappings are best avoided until the model can accept user preferences, so the above
examples are best suited for the base model.
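Restricting movement to eight directions can be done by quantising the detected motion vector into 45-degree sectors. This sketch is mine, not the thesis's tracker: the dead-zone value is an assumption, and y is taken as increasing upwards.

```python
import math

def quantize_direction(dx, dy, dead_zone=0.1):
    """Quantize a 2D device movement into one of eight directions,
    or None when the movement is inside the dead zone (no movement)."""
    if math.hypot(dx, dy) < dead_zone:
        return None
    # Angle in [0, 360), measured anticlockwise from "right".
    angle = math.degrees(math.atan2(dy, dx)) % 360
    sectors = ["right", "up_right", "up", "up_left",
               "left", "down_left", "down", "down_right"]
    # Each 45-degree sector is centred on its named direction.
    return sectors[int(((angle + 22.5) % 360) // 45)]
```

A dead zone absorbs the small jitters of a hand-held device, so only deliberate movement produces an input.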
5.5.3. Choosing
The base model for choosing assumes we are choosing from a vertical list of items. Again,
depending on the context of the situation this can be vastly different, but handling the
hierarchical menus available on devices is the most common choosing exercise.
In this situation there is always a default value set. This gives us a base point from which to
traverse the list (again, a hierarchical menu); the options at any point are to go up, to go
down, or to pick an option. It is possible to use scales to choose how far to go up or down,
and if we keep consistent with the survey results (and in fact our previously defined base
type, Movement) we can use movement to traverse up and down our list, with the distance moved
determining how far. This can be achieved by moving to the next item, then after a set time
determining whether the device is still moving; if so, we continue moving through the list.
Therefore:
Table 3: Precise Input to Motion Mapping
Activate Item → Push Device (Move Forward then Back on Z Axis)
Move Up List → Move Device Direction Up
Move Down List → Move Device Direction Down
Since we are polling the device for inputs every so often, the model should accept continuous
movement up as continued traversal of the list. Any time the device is brought back to the
start point, however, this would be seen as movement in the opposite direction, and hence as an
attempted input in the opposite direction.
5.5.4. Selection
Selection brought forward the notion of where to start. The most forgiving answer is, after
every accepted input, to return the start point of a selection to the middle of the screen. As
an example, after double-clicking an icon on a desktop and running the application, once the
user quits that application the mouse pointer returns to the middle of the screen.
This again allows us to easily embrace an input concept already defined by Choosing: move the
selection in a direction and continue to move it while the device's motion continues in that
same direction. This lets us use the directional input of motion as our stimulus and the
activation process of Choosing. Selection and Movement are exclusive from each other except for
the process of de-selection, so they can share inputs. An item must be selected before it can
be modified or moved, so once an item is selected the model can change to the Movement
structure.
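The select/move/de-select cycle amounts to a small two-state machine; the following sketch uses illustrative class and method names of my own. Activation plays the role of pressing the mouse button, de-selection the role of releasing it, and the start point returns to the screen centre after each accepted input, as proposed above.

```python
class Selector:
    """Two-state switch sketch of the selection process: movement only
    applies while the switch is 'held', mirroring a held mouse button."""

    def __init__(self):
        self.held = False
        self.pos = (0, 0)      # selection starts at the screen centre

    def activate(self):        # push gesture: select the item
        self.held = True

    def move(self, dx, dy):    # directional motion moves the held item
        if self.held:
            x, y = self.pos
            self.pos = (x + dx, y + dy)

    def deselect(self):        # explicit de-select releases the switch
        self.held = False
        self.pos = (0, 0)      # start point returns to the centre
```

The explicit `deselect` is the extra input the model needs because, unlike a mouse button, a hand motion has no physical release event.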
Step-by-Step Desktop Example:
• The user selects a square in a textbox in Microsoft PowerPoint (no change).
• The user selects a point inside the selected textbox (the cursor changes to the movement
arrow while the mouse is held down).
• The user moves the textbox around.
• The user deselects the textbox (releases the mouse).
Such selections can be replicated very closely by hand motions, except for the actual holding
down of the mouse button, which serves as a two-state switch. We already have an activation
input to turn this switch on; a de-select input that turns it off is therefore required.
5.6. Summary of Model Creation
Stipulations and exceptions were continually found while developing a base model for the
system. It is simple to define inputs and give them motions; placing these within a consistent
model, however, remains a challenging task. Such problems were typically solved by simplifying
the process (for example, limiting directions to eight), while others required stipulations to
cover the entire model (for continual scrolling up, does the user have to keep moving their arm
up, or can they return back down and then resume moving up?). Such decisions had to be made to
keep the model as simple as possible without considering implementation hassles.
Breaking down the data collected in the survey was also a huge task, one made far less
troubling by the input classifications. Only the types of motions needed to be collected, not
exact data on distance or speed; such details could be supplied with very little precision
(simple descriptors such as slow, fast or sharp were in fact sufficient).
Overall, a model has been created that is simple enough to apply at the application level, or
even at the OS level.
An intuitive motion based model for mobile devices – Prototype Development
107 | P a g e
6. Prototype Development
To demonstrate the detection of motion and its application to a situation, a prototype had to
be developed. A desktop implementation was developed first as a proof-of-concept, which would
then be shrunk down and implemented on a mobile platform. The goal was to successfully track
motion over a series of frames on a mobile platform in such a way that the performance of the
device was not significantly impacted (allowing the user to perform other tasks at the same
time).
6.1. Using DirectShow
DirectShow [2] is commonly known as the video and audio processing component of
Microsoft’s DirectX Framework [8, 21]. It is also commonly considered the most complex
and difficult-to-learn aspect of the framework; indeed, much of the functionality
available to desktop developers is simply non-existent in the mobile framework because of
implementation difficulties.
DirectShow uses a plug-in architecture, taking full advantage of the ideals behind COM,
Microsoft’s Component Object Model, which allows different components to talk and interact
with each other. The way this works can be easily visualised in a desktop application such as
GraphEdit, which lets the user visually design the graph the developer wishes
the video information to pass through (Figure 6.1). Every stage from the initial source
(camera/file) through to the final destination (screen/file) is part of this DirectShow filter
graph, along with everything in between.
Figure 6-1: GraphEdit
The concept of creating the graph itself is similar between the two platforms (Windows
Desktop and Windows Mobile), but many commonly used features either do not exist or have
limited functionality in their mobile incarnations. This, coupled with limited debug libraries,
can make the development of a filter a much more drawn-out process.
6.1.1. DirectShow Filters
DirectShow filters are the middle ground between the input video information and the output.
Typically filters are used to perform image enhancements/modifications such as sharpening,
cropping and resizing before displaying the results to the user. Information is sent to the filter
as a Media Sample that contains header information such as size and encoding type. This can
be used to obtain a reference to the image itself, as a byte stream. With this information, a new
media sample can be created (either by directly copying this information or creating your own)
and the required modifications can be made to the data if needed.
Once a filter is plugged in, all information will travel through it. For example, a darkening
filter will typically copy the header information and then walk through the image data,
decreasing the values by a given amount (and therefore darkening the image). Knowing the
data type tells you the image format, which values to decrease (colour, bit depth, whether it
contains an alpha channel) and the most efficient way to go about it.
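The darkening loop described above can be sketched in isolation. This is an illustrative
reconstruction, not the actual filter code: a real filter would obtain the buffer via
IMediaSample::GetPointer(), and the function name and the 24-bit BGR assumption are mine.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>

// Darken a 24-bit BGR byte stream in place by subtracting `amount`
// from every colour value, clamping at zero. Knowing the format
// matters: for BGRA data the alpha byte would have to be skipped.
void DarkenBGR24(uint8_t* data, size_t byteCount, uint8_t amount)
{
    for (size_t i = 0; i < byteCount; ++i)
        data[i] = static_cast<uint8_t>(std::max(0, data[i] - amount));
}
```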
DirectShow filters are typically compiled as their own library and interact with applications
via COM. The GraphEdit application available in the DirectShow SDK allows interaction
with these filters via a drag-and-drop interface for easy testing and debugging. These libraries
are typically registered through the Windows registry and can be dynamically loaded by the
operating system when required.
Such filters run independently of other applications in separate threads, and provide a great
starting ground for a motion-based input implementation: they can run in the background,
independently of other tasks, and can be probed when required. This is a far superior
grounding to the more obtrusive step-by-step approach typically taken by image
detection solutions. DirectShow tends to use a significant portion of CPU time to ensure it
processes as much frame data as possible, but it will drop frames if falling behind. This is
essential on lower speed devices.
6.2. Desktop Development
DirectShow, with its modular format, allows the filter to be separated totally from the two
endpoints of the video displaying process, in this case the input (camera) and output (screen).
Each of these is a component with pins that can easily be connected to other components’
pins to create the filtergraph. Each pin sends a specific type of data; to connect two pins,
the component whose pin is receiving the data must confirm that it can accept the data
type of the component sending it. This check is performed when the components first try to
connect, so once a graph is complete the data can flow freely, as all components
can accept the data they are given.
The BDA (Broadcast Driver Architecture) and WDM (Windows Driver Model) [26] drivers that
video/audio devices use to interoperate with Windows all use DirectShow to transport data.
An interface to obtain this data is therefore provided, and a filter is created with a pin that can
accept the data passed to it by the driver. Hence activating devices is typically easy, and
getting them running through GraphEdit is trivial.
6.2.1. Using a Filter to detect squares
Instead of reinventing the wheel, multiple shape detection algorithms were tested to check their
performance on mobile devices. Creating a Hough shape detection algorithm was an easy
task, but its performance on low resolution images was not acceptable enough to be relied upon
for motion detection. The Augmented Reality Toolkit was also examined for its basic
functionality, and while it was not easy to convert over to the PocketPC platform, its
performance was impressive.
6.2.1.1. The Augmented Reality Toolkit
The ARToolkit [15] is a library designed to process scene information for
specific features (known as icons). These icons are based on simple black squares with a black
and white pattern inside. Upon detection of a possible marker, the contents inside it can be
examined to see if it is the marker being looked for; if so, the orientation of the icon can be
determined by comparing the picture inside the image to the original icon picture. This gives a
3D orientation of the icon, and therefore additional meaning within a 2D space (picture/video
image).
Typically this allows additional 3D information (3D models) to be overlaid on top of the
displayed image in a realistic manner at the icon point based on the orientation information.
Possible applications include games that interact with the surrounding environment, or
something as simple as a location guide; but such functionality is obviously not required
when detecting motion.
As explained earlier, obtaining three-dimensional information from within a flat image aids
significantly in the tracking of motion between frames. The ARToolkit was a good basis for
obtaining information that could be transformed into this three-dimensional form, but it
also performed functions that were unnecessary at this step (examining the pattern inside
the detected squares). While that information would be useful for scene-information input
(Section 3.5.2), it was not necessary for motion-based information.
As a first step, using a stripped-down version of the ARToolkit as a proof-of-concept was
considered viable. This has been done before on mobile devices [33], but in a way that was
obtrusive to the end user, could not migrate between devices, and relied on a great
deal of back-end and unreleased code.
A mobile filter framework was developed that could incorporate the ARToolkit’s stripped
down detection routines, convert this information to three-dimensional data and display the
tracking of this to screen. This is further described in the following sections.
6.2.1.2. DirectShow Filter Basics
Filters take an input video or image stream and modify it in some way to
produce a final result. Such filters are required in the mobile framework to make up for
lacking functionality; but as a middle ground, and as a separate entity that interacts via COM,
a lot more is possible. Thankfully, developing mobile filters is pretty much identical to
developing their desktop variants: they can be loaded inside applications or registered through
the registry in a similar fashion to their desktop counterparts. The significant difference comes
with debugging and designing these filters.
The first step is to define exports in a definitions file (DllMain, DllGetClassObject,
DllCanUnloadNow, DllRegisterServer and DllUnregisterServer). These are simply
methods that can be called from other programs using the DLL and should be included in all
filters.
LIBRARY "FilterName"
EXPORTS
DllMain PRIVATE
DllGetClassObject PRIVATE
DllCanUnloadNow PRIVATE
DllRegisterServer PRIVATE
DllUnregisterServer PRIVATE
Usually the filter should be a transformation filter, and therefore extend CTransformFilter as
well as CPersistStream to gain the base functionality needed. As a COM object, a
GUID should also be supplied to give it uniqueness.
DEFINE_GUID(CLSID_FilterName,
0x00000000, 0x0000, 0x0000, 0x00, 0x00, 0x0, 0x00, 0x00, 0x00, 0x00, 0x00);
Several methods should then be overridden to gain basic functionality.
From CPersistStream:
HRESULT ScribbleToStream(IStream *pStream);
HRESULT ReadFromStream(IStream *pStream);
STDMETHODIMP GetClassID(CLSID *pClsid);
And from CTransformFilter:
HRESULT Transform(IMediaSample *pIn, IMediaSample *pOut);
HRESULT CheckInputType(const CMediaType *mtIn);
HRESULT CheckTransform(const CMediaType *mtIn, const CMediaType *mtOut);
HRESULT DecideBufferSize(IMemAllocator *pAlloc,
ALLOCATOR_PROPERTIES *pProperties);
HRESULT GetMediaType(int iPosition, CMediaType *pMediaType);
Transform is obviously where all the work is done. With references to two media samples
(one input, one output), the simplest transform is to copy directly from the input to the output.
If the general information (image size/format) is going to remain the same and only the image
content is modified, then calling GetPointer(ref) on the media sample gives a pointer directly
to the byte stream of the output image, allowing one to write to it directly to make changes.
Depending on what the filter does, one might copy the data over before writing, or read and
change it dynamically.
When working with it, this image data is stored in a somewhat unintuitive format. While the
image data runs from left to right, it starts at the bottom of the image and works up,
and is typically stored in a BGR or BGRA (blue, green, red, alpha channel) order. Therefore
the image data may need modification before a detection algorithm is applied
(Figures 6.2 and 6.3).
Figure 6-2: Image Data with Alpha Channel (Note the 00’s)
Figure 6-3: Image Data without an Alpha Channel
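The bottom-up layout shown in the figures means the row index has to be flipped before an
algorithm written for top-down co-ordinates can address a pixel. A minimal sketch of the index
arithmetic, assuming 32-bit BGRA with no row padding (real DIBs may pad each row to a
4-byte boundary); the function name is mine:

```cpp
#include <cstddef>

// Byte offset of pixel (x, y) inside a bottom-up BGRA buffer, where
// y is measured from the TOP of the image as most algorithms expect.
// 4 bytes per pixel (B, G, R, A); no row padding is assumed.
size_t OffsetBGRA(int x, int y, int width, int height)
{
    int bottomUpRow = height - 1 - y;  // flip: top row is stored last
    return (static_cast<size_t>(bottomUpRow) * width + x) * 4;
}
```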
Having an image to work upon, and draw directly to, is generally how detection algorithms
work; therefore a memory copy from the input to the output MediaSample is typically the first
step. The next decision is whether the byte stream format is suitable for the algorithm, or
whether a rotated image with swapped colour channels should be created (possibly with the
alpha field removed).
But the true advantage of having a filter perform these modifications is that one is not
limited to an image output result. Interfaces can be made to the filter, and the filter can be
queried for information, or have parameters passed to it to change how it works on the fly.
These interfaces also require a separate GUID (a unique identifier, this time for the interface),
since these methods are called from a separate object reference (instead of a Filter object, the
actual COM object is called directly).
DEFINE_GUID(IID_IFilterName, 0x00000000, 0x0000, 0x0000, 0x00, 0x00, 0x00,
0x00, 0x00, 0x00, 0x00, 0x01);
And then define the methods that can be called externally.
DECLARE_INTERFACE_(ICEARTFilter, IUnknown)
{
STDMETHOD(SetFiltering) (THIS_ BOOL set) PURE;
};
This is an obvious first method to create, allowing the program using the filtergraph to
turn the filtering of the image on or off whenever it wants. The corresponding method is
implemented as normal, as a way to change the setting, and a check is inserted into the
Transform method to decide whether it should actually perform the transform or not.
6.2.3. Mapping to 3D
Square information retrieved from the filter contains purely screen co-ordinates of the vertex
pixels. This two-dimensional representation needs to be transformed into a relative three-dimensional
meaning (Figure 6.4).
Figure 6-4: Translation of Two-Dimensional Screen Data to Direct3D Polygon Format.
The first step is to translate pixel space into a -1 to +1 value on both the
x and y axes. This is the default format used by Direct3D and allows us to use its data
structures and methods to check changes (transforms) between frames. The next step is to
calculate a Z depth for each point of the polygon. To ensure consistency the detected polygon
is always considered a square (Section 6.2.1), so Z values need to be determined that make
the item a square.
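The pixel-to-Direct3D translation can be expressed as follows. The exact conventions here
(mapping the inclusive pixel range 0..width-1 onto -1..+1, and flipping y so that +1 is the top
of the screen) are my assumptions, as the text does not spell them out:

```cpp
struct Ndc { float x; float y; };

// Map a screen pixel (px, py) to the -1..+1 range used by Direct3D.
// Pixel row 0 is the top of the screen, but +1 is the top of the
// normalized space, so the y axis is flipped.
Ndc PixelToNdc(int px, int py, int width, int height)
{
    Ndc n;
    n.x = 2.0f * px / (width - 1) - 1.0f;
    n.y = 1.0f - 2.0f * py / (height - 1);
    return n;
}
```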
To determine these values, triangulation needs to occur based on line lengths. This is
easiest when the line closest to the screen is given a Z-depth of 0 (which will be the longest,
since we are working with squares, where all edges are equal). The other values can then be
determined by comparing each edge's apparent length (screen-space length) against the
three-dimensional length (that of the longest edge).
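As a hedged illustration of the idea (not the ARToolkit's actual pose estimation, which is
considerably more involved): under a simple pinhole model with focal length f, an edge of true
length L at depth z appears with length l = L * f / (f + z), so depth can be recovered as
z = f * (L / l - 1). The longest edge (l = L) then sits at Z-depth 0, matching the convention
above. The function name and the pinhole assumption are mine.

```cpp
// Relative depth of an edge from its foreshortening, assuming a
// pinhole camera with focal length `focal`. The longest (closest)
// edge, whose apparent length equals the true length, yields z = 0.
double DepthFromForeshortening(double trueLen, double apparentLen, double focal)
{
    return focal * (trueLen / apparentLen - 1.0);
}
```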
The ARToolkit includes functionality in its code base to convert this screen polygon to
three-dimensional space, but it returns OpenGL [30] co-ordinate data. Converting this to the
Direct3D co-ordinate system, so that a consistent orientation is kept, is a simple task.
6.2.4. Tracking motion from cube transformations
To track motion in a video stream, two core pieces of information need to be stored: the
direction the device has already been determined to be travelling, and the location of key
objects in the last frame. With this data, we find where those objects are located in the
current frame of video and compare this information to the last frame. A change in a similar
object's location suggests movement in a direction relating to that change. Acceleration and
direction change can be determined from the previous motion information passed on. The
motion data is then updated with this new information.
More complicated information such as rotation and movement into or out of the screen require
more advanced interpretation. For this project, I decided to try and convert key objects
available in the scene to three-dimensional information and track changes in the 3D planes.
Such logic has limited the type of objects that could be tracked to those that could be easily
changed into such information. Therefore this logic tracks polygon information in the scene to
read motion. All polygons seen in the scene are assumed to be squares, and are mapped as
square polygons on the 3D plane complete with their rotations and orientations.
With this three-dimensional information available, motion changes can be tracked between
frames by checking the X, Y and Z axis change of each vertex in the MAP (Section 3.4.). To gain a
more accurate representation of this information, it should be averaged over a series of frames
(to lessen the impact of the MAP possibly changing) and to avoid random twitching that could
occur from either the user or the device.
To perform this averaging, motion and rotation vectors of the change between frames are
created. A polling period is then defined (for example, half a second), and the average over
this period is taken as the motion actually performed. Since the filter created runs at 30 frames
per second and captures objects over that period, a half-second window amounts to 15
transformations to compare between.
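The polling step can be sketched as a simple average over whatever per-frame vectors
accumulated during the window. This is an illustrative reconstruction; the struct and function
names are mine. Frames where no object was found are simply never added, so the queue may
hold fewer than 15 samples.

```cpp
#include <vector>

struct Vec3 { double x, y, z; };

// Average the per-frame motion vectors gathered over one polling
// period (up to ~15 entries at 30 fps with a half-second window).
// An empty queue (every frame lost) yields zero motion.
Vec3 AveragedMotion(const std::vector<Vec3>& frames)
{
    Vec3 avg = {0.0, 0.0, 0.0};
    if (frames.empty()) return avg;
    for (const Vec3& v : frames) {
        avg.x += v.x;
        avg.y += v.y;
        avg.z += v.z;
    }
    const double n = static_cast<double>(frames.size());
    avg.x /= n; avg.y /= n; avg.z /= n;
    return avg;
}
```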
With low resolutions and less consistent frame rates than a more powerful desktop counterpart,
this extended period also allows the adoption of ‘loss’ periods where the filter fails to find
objects in a frame that it is supplied (usually due to the blurring of the image). Frames where
objects are not found are simply not stored in the queue so the polling period will have fewer
transformations to look at.
With the queue resetting every half a second, the impact of visual flaws, such as two polygons
being detected far apart and then one being lost off the edge, is limited. If we lose a
polygon because it falls off the edge, then find a new polygon shortly after, the impact on
the motion we detect is only minor (one frame of transformation); any other motion detected
in the queue is likely to outweigh this.
Consideration was given to a sliding queue, so that polling occurred more often
and previous data was reused, but the advantages appeared minor when compared to the
additional calculations required.
6.3. Windows Mobile 5 Development
Upon the release of Windows Mobile 5, a consistent model was created for application
developers to programmatically access the mobile device's camera. Interfaces have been made
available for both .NET managed languages and the original C++ Pocket SDK. Now armed
with the ability to call the camera directly from an application, a developer can begin to move
applications to the mobile sector via DirectX, or even develop desktop and mobile
applications from very similar code trees.
Originally, this was both difficult and messy: there was no direct linking of the camera to the
operating system, so each third party responsible for developing handsets had its own unique
libraries and hooks to incorporate camera software into the OS. With no documentation
and limited support, attempted development in this sector was both difficult and
unreliable.
Now, with the ability to interface with the camera directly (in particular via the Pocket
SDK; the .NET Compact Framework is still lacking), developers can use COM and DirectShow
to process incoming video, similarly to their desktop counterparts.
Using DirectShow on a mobile device is identical to the desktop platform, therefore section
6.2.1.2. can be applied to mobile use as well.
6.3.1. Porting detection filters
Typically, filters are required in the mobile framework to make up for functionality that is
available natively on the desktop platform. Since they are designed as a separate entity that
interacts via COM, re-adding a lot of this functionality is possible, and in fact a necessity for
the filters to be useful. The interaction is similar to their desktop variants (Section 6.2.1); the
significant difference comes with debugging and designing these filters.
Many filters compiled with the debug flag will simply not register properly, even though
RegSvrCE suggests they were successfully registered. This, coupled with much tighter
memory requirements, generally slower devices with lower quality video, and once again a
limited set of available libraries, means there simply is not the same turn-around time in
creating these filters as on the desktop. Complicated filters need to be smart with their
memory, as pointers easily become corrupted in the still-young DirectShow mobile SDK, and
they need to have an obvious flow, since debugging is a far more time-consuming process.
To compile code for Windows Mobile 5 devices, the Windows Mobile 5 Pocket SDK must be
installed: http://www.microsoft.com/downloads/details.aspx?FamilyID=83a52af2-f524-4ec5-9155-717cbe5d25ed&DisplayLang=en
Compiling filters requires the following libraries: strmiids.lib, strmbase.lib, d3dmguid.lib,
d3dm.lib and ddguid.lib. All DirectShow libraries are included in this SDK and do not need to
be recompiled (no source is supplied anyway).
When transferring a filter from a desktop implementation over to a mobile solution, several
particulars must be adhered to: use of the ATL and MFC frameworks often causes
problems, and wchar_t must not be treated as a built-in type.
The testing and debugging of filters can be particularly difficult, as a mobile filter will not
register (and therefore cannot be used) if compiled in debug mode, so all work must be done
with the NDEBUG flag set (release mode). This means that limited information can
be gathered from the filter while testing; combined with the very limited development
environment, debugging becomes a messy task.
Once the .dll (or .ax, depending on setup) is copied across to the device, it must be registered
in the registry. RegSvrCE is no longer supplied with the SDK, and while programs can
self-register a library, registering it globally allows every application to use it.
Registering via the registry will usually place the filter in the following location:
[HKEY_CLASSES_ROOT\Filter\{Supplied GUID}]
A library can also be registered from inside the program that uses it with the following calls:
HINSTANCE hLib = LoadLibrary(L"filter.dll");
FARPROC pRegister = GetProcAddress(hLib, L"DllRegisterServer");
if (pRegister) pRegister();
6.3.2. Camera Initialization
Typically on the desktop platform, the components of the machine are enumerated through
(probed one by one) to find a camera connected to the machine. While this method is possible
on mobile devices, there is typically no point, as you will already know the camera being used
to record the video (the embedded camera).
Another interesting quirk of the mobile DirectShow implementation is that there is no included
PropertyBag object, something that is essential to adding a camera to a filter graph
programmatically. Thankfully such an implementation is available on MSDN [20] and easy to
replicate.
CComVariant varCamName;
CPropertyBag PropBag;
CComPtr<IPropertyBag> pPropertyBag;
CComPtr<IBaseFilter> pSrcFilter;
CoInitialize(NULL);
hr = m_pGB.CoCreateInstance(CLSID_FilterGraph, NULL, CLSCTX_INPROC);
if( FAILED(hr))
{
Msg(TEXT("Failed to create filter graph. hr = 0x%08x"), hr);
}
pSrcFilter.CoCreateInstance( CLSID_VideoCapture );
pSrcFilter.QueryInterface( &pPropertyBag );
varCamName = L"CAM1:";
if( varCamName.vt != VT_BSTR ) {
return E_OUTOFMEMORY;
}
PropBag.Write( L"VCapName", &varCamName );
pPropertyBag->Load( &PropBag, NULL );
pPropertyBag.Release();
hr = m_pGB->AddFilter(pSrcFilter, L"Video Capture");
if (FAILED(hr))
{
return hr;
}
The shortcut “CAM1:” is used as a direct reference to the first camera on a mobile device and
should be implemented on all Windows Mobile 5 devices. Once the filter is added to the graph,
video can be started and stopped via the IMediaControl and IMediaEvent interfaces, after they
have been obtained from the graph.
CComPtr<IMediaControl> m_pMediaControl;
CComPtr<IMediaEvent> m_pMediaEvent;
m_pGB.QueryInterface(&m_pMediaControl);
m_pGB.QueryInterface(&m_pMediaEvent);
Starting and stopping capture is performed via:
m_pMediaControl->Run();
m_pMediaControl->Stop();
Encoding video also uses this information, as media events are constantly polled while the
encoding is taking place. Once certain events occur (such as stopped), the video should be in
the process of being encoded, but it is not until these events are actually processed that you
know the encoding has been completed.
do
{
pMediaEvent->GetEvent( &lEventCode, &lParam1, &lParam2, INFINITE );
pMediaEvent->FreeEventParams( lEventCode, lParam1, lParam2 );
if( lEventCode == EC_STREAM_CONTROL_STOPPED ) {
OutputDebugString( L"Received a control stream stop event" );
count++;
}
} while( count < 1);
6.3.3. Image Output
DirectShow on the desktop has a call-back mechanism to retrieve the currently processed
frame in the filtergraph stream. This can be used to actually process the data directly to the
bitmap object or send it elsewhere to perform processes upon. Sadly the mobile version of
DirectShow has no available way to grab this image.
Therefore the filters themselves have to offer up this functionality. Again COM is an effective
way to achieve this. Once the filter has the IMediaSample it is working upon, a pointer
directly to the image is available. Creating a function to access this pointer is then trivial but
causes problems because there may not be an IMediaSample available when the function is
called. Therefore, an additional data structure should be available to constantly store the last
worked on image in the stream. Copying a bytestream across to a new memory location
during the IMediaSample data collection process allows a constantly available image that is far
more immune to corruption than the pointer location of the IMediaSample image.
Therefore, this COM method can simply return a pointer value to this memory location and
whenever this location is checked it will contain the last processed, or currently processed
frame.
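The last-frame copy described above might look like the following. This is a sketch, not the
filter's actual code: the class and method names are mine, and in the real filter Update() would
be called from Transform() with the pointer obtained from the IMediaSample (strictly, the copy
should also be guarded by a lock, since the filter runs in its own thread).

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Holds a stable copy of the most recently processed frame so it can
// be handed out at any time, even when no IMediaSample is in flight.
class FrameSnapshot {
public:
    // Called while the media sample is valid: copy its bytes out.
    void Update(const uint8_t* sample, size_t bytes)
    {
        m_copy.assign(sample, sample + bytes);
    }
    // Exposed (in the real filter, via COM) to callers at any time.
    const uint8_t* Latest() const
    {
        return m_copy.empty() ? nullptr : m_copy.data();
    }
    size_t Size() const { return m_copy.size(); }
private:
    std::vector<uint8_t> m_copy;  // survives after the sample is released
};
```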
6.3.4. Data Display
Once the data has been processed and the information created, there are two obvious ways to
display it. The first is to draw it directly to the image itself and display the new image: since the filtergraph
works with two IMediaSamples (in and out), the changes can be drawn to the out image as it is
being created; if the graph ends in a renderer, these changes can then be seen (Figure 6.5).
Figure 6-5: Image without and with Display Filter applied
The other option is to read and interpret the data, store it in memory structures, and then
display it in an entirely new context. With this setup, both sections of the application
must be aware of the data types in which the information is stored.
6.4. Prototype Summary
Creating a working and efficacious program that took advantage of the motion-based input
concept was a journey that took far longer than expected. The tools and the hardware were
available; however, significant difficulties were experienced due to the paucity of available
expert knowledge and the lack of maturity of the mobile development platform
compared to the desktop platform.
The .NET Compact Framework v2 [19], much proclaimed by Microsoft and, in my eyes at the
time, holding great potential (rose-coloured glasses in hindsight), proved to be severely lacking
when trying to push the boundaries of mobile development. It remains a tool for developing
standard applications in the quickest development cycle, with little to no way to access device
resources outside the scope of a typical application window.
The solution was to return to the C++-oriented Pocket SDK, where development time and
effort are significantly higher than with .NET, but the functionality is available. For example,
the camera interoperability in the .NET Compact Framework is nothing more than a call to the
camera application developed by the device manufacturer to record video to a file for a certain
amount of time; there was absolutely no way to use this functionality in this project. The
Pocket SDK, however, allows true access to the camera from inside an application proper,
which makes motion detection on a device a reality.
Microsoft Visual Studio, while a great tool, has many shortcomings when it comes to mobile
development. At first the tool is great, but debugging applications becomes painful when
stepping through code: even over a direct USB 2.0 connection, the time taken to process a
line of code is significant (multiple seconds per line). When trying to debug code tens of
thousands of lines in size (like the ARToolkit), this is unusable. Performance improves when
using an emulated device on the debug machine, but such a device does not have a camera, so
in my circumstances this was not an option.
This was compounded by major drawbacks of the Pocket SDK itself. It is mostly supplied
pre-compiled, the only source being header files, which leaves the developer spending much
of their time stepping through disassembled machine code. Trouble also occurred because
much of the code (the DirectShow filter in particular) would not operate with debug flags set,
meaning even less information is available while debugging: many variable values cannot be
tracked, and branching code is difficult to work through.
Undoubtedly the above are major contributors to the lack of advancement in the field of
application development using the camera on mobile devices. But the base work has been
completed, and its performance is more than acceptable given the above circumstances. Better
cameras and more memory in the future will only improve this prototype's success.
An intuitive motion based model for mobile devices – Conclusions and Future Work
123 | P a g e
7. Conclusions and Future Work
With all three segments of the research achieving varying levels of success, it can be
concluded that there is a huge amount of potential in this field. This chapter summarises the
paths this research can take in the future, as well as outlining the new paths the research has
opened up.
7.1. Answers to Research Questions
7.1.1. Answer to Question 1 - What functionalities of a phone's features are appropriate candidates to be used as parts of a
motion input scheme?
Throughout the research it was discovered that many of a phone's functionalities were in fact
suitable targets for motion-based inputs. Many of these already incorporated
movements by the end user that could be exploited (moving the phone to answer it, for
example).
Section 5.4 describes several inputs that are inappropriate for an input model.
7.1.2. Answer to Question 2 - Is it possible to construct a rational and useable mapping scheme for phone inputs?
As demonstrated in Section 5.5 it is in fact possible to create a useable mapping scheme for
phone inputs. Appendix I demonstrates a small sample of inputs that can be mapped to
motions.
7.1.3. Answer to Question 3 - Can people adapt to using motion gestures as an input medium and what are considered
suitable (not embarrassing or over-exertive) motions to perform?
The results from Survey One (Section 4.3) demonstrate that, when instructed to use motions
instead of traditional input means, users were generally more than capable of adapting straight
away. The few who struggled were capable after further encouragement and explanation.
Survey Two results (Section 4.5) show that users definitely moved the devices in specific ways
subconsciously and these could be used as inputs as well.
7.1.4. Answer to Question 4 - How uniformly do people perform motions given to them (different people, slight difference
in movement) and can these variations be adapted to?
It was discovered that a significant percentage of users performed very similar movements in
an attempt to achieve an end result (Sections 4.4.9 & 4.5.6). Very few users deviated from a
specific command, and few had any significant peculiarities while performing these motions.
As long as the device moved and rotated in the general directions, it was tracked the same.
7.1.5. Answer to Question 5 - How suitable are images (collected by the embedded mobile cameras) for in-depth image
processing?
Generally, the images collected by the in-built cameras were not very suitable for in-depth
processing, and therefore a significantly different approach had to be taken to track movement
information (Sections 6.2.1.1 & 6.2.3). In good conditions the low-quality video collected was
sufficient, but in many situations the filters used were not capable of tracking information.
This brought forward the concept of switching between detection algorithms depending on the
situation (something only very briefly discussed in this thesis.) Switching detection routines
depending on the situation would be the best way to collect movement information without
higher quality image/video information.
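As a rough sketch of this switching concept, the routine below picks a detection algorithm from cheap frame statistics. The algorithm names and thresholds are purely illustrative assumptions, not the implemented system:

```cpp
#include <cassert>
#include <string>
#include <vector>

// Illustrative sketch only: inspect cheap frame statistics and pick the
// detection routine most likely to cope with the current situation.
std::string chooseDetector(int width, int height, double luma) {
    if (width * height < 176 * 144) return "coarse-blob";   // too few pixels for edge work
    if (luma < 40.0)                return "bright-marker"; // dark scene: track light sources
    if (luma > 215.0)               return "dark-marker";   // washed out: track dark regions
    return "edge-based";                                    // enough detail for edge detection
}

// Cheap mean-luminance estimate over an 8-bit greyscale frame.
double meanLuma(const std::vector<unsigned char>& frame) {
    if (frame.empty()) return 0.0;
    double sum = 0.0;
    for (unsigned char p : frame) sum += p;
    return sum / frame.size();
}
```

The point of the sketch is that the decision is cheap: a mean-luminance pass over the frame costs far less than running a detection algorithm that is doomed to fail.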
7.1.6. Answer to Question 6 - Can real-time performance of image detection algorithms and movement calculations on
Smartphones™ be achieved?
Surprisingly, it can. It is definitely not an easy feat, but with significant knowledge of device
development and of how to perform the detection, even memory and processor intensive
algorithms are possible, as shown in Section 6.3.
7.1.7. Answer to Question 7 - Will tracking movement critical to this project unexpectedly interfere with the normal usage
of the phone?
Testing showed that this was in fact a possibility, as the phone occasionally hung while
detecting motion and receiving a phone call at the same time. This was generally due to the
amount of memory being used and the processing of video. I would imagine that more
memory (becoming common in devices) and better quality video that requires less processing
time will improve this.
7.2. Contributions to Research
Three significant contributions to the research process have been covered within this
document. First and foremost is the design and commencement of an input model that relies on
motion as the stimulus. Related models have been developed in the past, but little work has
been carried out towards motion based mobile device input. This model encompasses a far
wider scope than those of the past (typically just text input), and does not confine
implementers of the model to limited motions. The only limitation lies in what data can be
gathered and interpreted from the camera. The findings also show how users typically
interact with the device in general situations, so they can be used as a guideline of the
motions users prefer.
Secondly, I have developed and documented a process for developing DirectShow filters on
mobile devices. Information on creating filters for the desktop is extremely limited, and actual
filter development on mobiles is very close to non-existent, even in the research field.
Information regarding mobile filters that provide information back to the program using them
does not exist, so to my knowledge this is a new field. Such filters are capable of a great deal
of functionality and work on a framework that offers excellent performance and options once
it is understood. These options remain available on the mobile platform once the
implementation limitations are overcome by either code or design.
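The pattern such filters follow can be illustrated in plain C++. This is deliberately not the real DirectShow API; `ReportingFilter` and its toy detection are hypothetical, and serve only to show the shape of the idea: a transform stage forwards each frame unchanged while reporting results back to the host application through a callback.

```cpp
#include <cassert>
#include <functional>
#include <vector>

// Illustrative pattern (not the DirectShow API): a transform stage that
// passes frames downstream untouched and reports detection results back to
// the hosting application via a callback.
struct Detection { int x; int y; bool found; };

class ReportingFilter {
public:
    using Callback = std::function<void(const Detection&)>;
    explicit ReportingFilter(Callback cb) : notify_(std::move(cb)) {}

    // Called once per frame, like a DirectShow Transform(); the frame is
    // forwarded unchanged so preview/playback keeps working.
    void processFrame(const std::vector<unsigned char>& frame) {
        Detection d = detect(frame);   // placeholder detection
        if (notify_) notify_(d);       // report back to the application
    }

private:
    static Detection detect(const std::vector<unsigned char>& frame) {
        // Toy stand-in: "find" the first bright pixel.
        for (size_t i = 0; i < frame.size(); ++i)
            if (frame[i] > 200) return {static_cast<int>(i), 0, true};
        return {0, 0, false};
    }
    Callback notify_;
};
```

In the real filter graph the callback corresponds to the custom interface the application queries from the filter; the sketch shows only the data-flow direction that makes such filters unusual.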
Finally, there is the development of automated surveys to aid in the information collection
procedure. Such surveys, when designed properly, are capable of collecting large amounts of
data that standard surveys cannot, at only a slightly increased cost. Findings and experiences
from the creation and execution of such surveys are all included. Further understanding of the
capabilities of the devices and how to work with them will greatly enhance the directions
surveys can take. Mobile kiosks can be used as a form of data collection from users; moving
this to a smaller, portable device with many more input mechanisms (camera and audio as well
as touch) opens up many more opportunities. Include data such as location, time and scene
information, which is only lightly covered here, and the possibilities are endless.
7.3. Limitations
Situations occurred (particularly in the implementation phase) that hindered the progression of
this project. While such troubles were expected when working with smaller devices, the
workarounds are far from ideal. For the implementation to be truly viable, several things
should (and hopefully will) happen with the next iteration of mobile devices and the Microsoft
operating system.
• An increased viable resolution without dropped frames
Detecting valid information at low resolutions (176x144) is very difficult. A lot of the data
that can be found when working at a higher resolution is lost. In particular it is very hard to
pick up shadows at the low resolution, something that usually gives very good polygon
information on the desktop. This needs to come with improved performance so there is no
drop in frame rate, as a good frame rate is required to properly keep track of motion.
• Better direct access to memory
Suddenly losing pointer information inside Windows Mobile 5 played havoc with
development times, as it simply did not make sense that data was getting corrupted. Only once
it became apparent that the OS itself was to blame (trying to clean up memory) did solutions
become clear. Sadly the solution was to actually use more memory and processor time to
ensure that the pointers being scrapped were continually accessed, or swapped to yet another
pointer. Such awkward workarounds should not be required, and better documentation from
Microsoft to inform developers is needed.
• A more complete SDK
Much functionality simply is not there when it really should be. I found many instances where
I had to rewrite functions already available in the desktop counterpart (to handle very obvious
functionality).
7.4. Potential Applications
Although this model was originally designed to encompass the entire operating system, it
became more and more apparent over time that the model would work at the application level
as well. Such a design would allow developers to use the model to its fullest, since they would
have complete control over the context and face no competing pre-defined motions at the OS
level. With complete control over the model and what to define, applications can take the
fullest advantage of it.
Applications which could employ such a model are numerous. Applications that revolve
around movement and rotations are the obvious beneficiaries. Applications which employ 3D
renders can benefit from the ability of users to truly walk around a model and see a rotated
display, while moving forward and back would zoom the item in focus. Again, these are
natural reactions to get a specific result, which can easily be mapped to motion.
Simple navigation through menus also remains a powerful use of the model. The ability to
move through options using only the single hand that holds the device is advantageous in
many situations, something that is not possible with a touch-screen interface.
Following the same path, applications that generally try to avoid direct input from the user can
also benefit from the model. GPS navigation systems for motorised vehicles use signals to
determine direction and are usually situated on the dashboard at eye level. At this level the
device's camera would have much the same view as the user. This can produce much finer
information than what is available via GPS. Minor turns and direction information can be used
to augment the input information given by the GPS satellites.
Location based gaming would also benefit from the model. Many devices' interfaces are not
built towards gaming at all, so a way to gather information from the user quickly is sadly
lacking. Augmenting a motion based input scheme on top of these games could make them
much more intuitive. Walking forward in the world could very well translate to the walking
forward of your avatar in a virtual world.
7.5. Future Work
Once these hurdles are overcome, there are many different directions for this project. The
obvious one is further work on the algorithm that collects motion information. However, with
DirectShow a far more interesting approach can be taken. Since DirectShow works by
plugging in and pulling out components, there is little reason why a series of detection filters
could not be used. Scanning the image information should be able to tell which filters will
work better than others (a lack of black or a low resolution would indicate that the filter
developed for this research would not work optimally).
This scanning of information could itself be a filter placed before our detection filter in the
graph. If a detection filter is failing to find much information then it could be swapped out for
another that should be more successful. All these filters would need to communicate in the
same way with other components, so a base framework for detection filters would have to be
developed. All the filters would report back the same data (be it location information for the
object being tracked, or, at a higher level, actual motion vectors themselves), but the inner
workings would be totally different.
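A minimal sketch of such a base framework, with illustrative class names of my own invention (none of this is implemented in the thesis), might look like this: every filter returns the same report structure, and a manager swaps the active filter out after repeated low-confidence results.

```cpp
#include <cassert>
#include <memory>
#include <vector>

// Sketch of the proposed framework: a common reporting contract plus a
// manager that hot-swaps a failing filter. Names are illustrative only.
struct MotionReport { double dx, dy, confidence; };

class DetectionFilter {
public:
    virtual ~DetectionFilter() = default;
    virtual MotionReport analyse(const std::vector<unsigned char>& frame) = 0;
};

// Two toy filters with the same contract but different "inner workings".
struct EdgeFilter : DetectionFilter {    // pretend edge tracking copes poorly here
    MotionReport analyse(const std::vector<unsigned char>&) override { return {0, 0, 0.1}; }
};
struct MarkerFilter : DetectionFilter {  // pretend marker tracking copes well
    MotionReport analyse(const std::vector<unsigned char>&) override { return {1, 0, 0.9}; }
};

class FilterManager {
public:
    FilterManager(std::unique_ptr<DetectionFilter> primary,
                  std::unique_ptr<DetectionFilter> fallback)
        : active_(std::move(primary)), fallback_(std::move(fallback)) {}

    MotionReport process(const std::vector<unsigned char>& frame) {
        MotionReport r = active_->analyse(frame);
        failures_ = (r.confidence < 0.3) ? failures_ + 1 : 0;
        if (failures_ >= 3 && fallback_) {   // filter keeps failing: swap it out
            active_ = std::move(fallback_);
            failures_ = 0;
        }
        return r;
    }

private:
    std::unique_ptr<DetectionFilter> active_, fallback_;
    int failures_ = 0;
};
```

The uniform `MotionReport` is the essential design point: because every filter speaks the same language to the rest of the graph, the swap is invisible to downstream components.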
Such a framework would also become useful in domain specific situations. If a device is more
likely to be used near an assembly line then the filters can be fine-tuned for that information,
and the more appropriate filters take priority.
Such development, along with development of the domain specific inputs required by the
applications would create the best combination for a true motion based input system for mobile
devices.
Appendix A – Sample Inputs
These inputs are designed as a test bed of sample inputs to be used throughout the research
procedure.
Common (Global) Motions ( A )
(AA) Scroll Up
(AB) Scroll Down
(AC) Scroll Left
(AD) Scroll Right
(AE) Select Option
(AF) Number Input
(AG) Left Soft Key
(AH) Right Soft Key
(AI) Power
Web Browsing ( B )
(BA) Go To
(BB) Refresh
(BC) Previous Link
(BD) Next Link
(BE) Follow Link
(BF) Zoom
(BG) Favourite Menu
(BH) Options Menu
(BI) Home
Photo Album ( C )
(CA) Change View (Details/Thumbnails)
Picture Viewing (from Photo Album) ( D )
(DA) Pan Up
(DB) Pan Down
(DC) Next Image
(DD) Pan Left
(DE) Pan Right
(DF) Previous Image
(DG) Zoom In
(DH) Zoom Out
In Call ( E )
(EA) End Call
(EB) Mute Call
(EC) On Hold
(ED) Increase Volume
(EE) Decrease Volume
Outside Call ( F )
(FA) Answer Call
(FB) Key Lock
Phone Book/ Contacts ( G )
(GA) Go to Details
(GB) Call Number
(GC) New Information
(GD) Store Edited Info
Media Player ( H )
(HA) Open File/Playlist
(HB) Play
(HC) Stop
(HD) Mute
(HE) Next File
(HF) Previous File
(HG) Volume Up
(HH) Volume Down
(HI) Randomise Playlist
(HJ) Clear Playlist
Voice Notes ( I )
(IA) Start Recording
(IB) End Recording
(IC) New Note
(ID) Replay Note
(IE) Next Note
(IF) Previous Note
Calendar/Task Scheduler ( J )
(JA) Change Time View
(JB) Next
(Day/Week/Month)
(JC) Mark as Completed
(JD) Previous
(Day/Week/Month)
(JE) Add Task
(JF) Remove Task
File Manager ( K )
(KA) View Type
(KB) New Directory
(KC) Switch Storage
(KD) Cut File
(KE) Copy File
(KF) Paste File
(KG) Change File Properties
(KH) Device
Text Input ( L )
(LA) New Line
(LB) Symbol List
(LC) Change Input Type
(T9/abc)
(LD) Change Input Language
(LE)
Upper/Lowercase/Caselock
Camera ( M )
(MA) Change Filter
(MB) Zoom In
(MC) Zoom Out
General Interactions ( N )
(NA) Exit Application
(NB) Landscape/Portrait Mode
Appendix B – Input Type Breakdown
This table breaks down the inputs listed in Appendix A and classifies them into their
appropriate input types.
Table 4: Input Breakdown
T Choosing Selection Confirm Adjust Movement Function Menu Modify
AA ●
AB ●
AC ●
AD ●
AE ●
AF ●
AG ●
AH ●
AI ●
BA ●
BB ●
BC ●
BD ●
BE ●
BF ●
BG ●
BH ●
BI ●
CA ●
DA ●
DB ●
DC ●
DD ●
DE ●
DF ●
DG ●
DH ●
EA ●
EB ●
EC ●
ED ●
EE ●
FA ●
FB ●
GA ●
GB ●
GC ●
GD ●
HA ●
HB ●
HC ●
HD ●
HE ●
HF ●
HG ●
HH ●
HI ●
HJ ●
IA ●
IB ●
IC ●
ID ●
IE ●
IF ●
JA ●
JB ●
JC ●
JD ●
JE ●
JF ●
KA ●
KB ●
KC ●
KD ●
KE ●
KF ●
KG ●
KH ●
LA ●
LB ●
LC ●
LD ●
LE ●
MA ●
MB ●
MC ●
NA ●
NB ●
Appendix C – Survey One Handout
Would you like to take a survey?
Part 1 - General
The following questions should be answered to the best of the participant's knowledge.
How long have you been using mobile phones?
Please supply a list of mobile phones you have used in the past.
What functionality does your current phone contain (if known) and which of these do/would
you use?
Functionality Has Use
Make calls □ □
Send/Receive SMSes □ □
Send/Receive MMSes □ □
Use Voice Notes □ □
Play games □ □
Store Contact Information □ □
Plan Appointments/Schedule, Track Calendar □ □
View Pictures □ □
Play Music □ □
Watch Movies □ □
Transfer data to PC □ □
Take Photos/Movies □ □
Send/Receive Emails □ □
Browse Web □ □
Do you typically operate your device one-handed, or two-handed?
Do you believe you are confident with the day-to-day usage of your phone? Yes No
Part 2 - Motion (Hand Focus)
The following information is all motion based and therefore will have to be captured via video.
The questions are designed to be slightly abstract, and the answers are expected to be equally so.
Participants will be given a small box (as a mobile phone mock-up) to manipulate in an
attempt to give answers to the following questions. There is not expected to be a right or
wrong answer, answers are simply what the participants believe would be the most appropriate
physical input for the question.
If the participant does not understand a question then it is to be skipped. It is also assumed that
the participant has prior knowledge of the following questions and has had time to formulate
answers that seem the most appropriate to them.
Please perform a natural motion that you believe would best represent choosing an object in a
vertical list (demonstrate movement down and up the list and the choosing of the object).
A motion to confirm (say yes) to an action.
A motion to deny (say no) to an action.
Rotate an object on the screen to the left.
Example: there is the number 18 in a box; how would you increase this by 2 (to 20)?
How would you increase the device's volume while talking on the phone?
Reload a web page that is currently being viewed.
Answer an incoming phone call.
Pan right while viewing an image.
If you have any concern over ethical issues in regards to this survey, please contact the
Queensland University of Technology Research Ethics Officer on (07) 3864 2340 or via email:
Appendix D – Survey One Participant Breakdown
Age: 15-18: 3, 19-22: 5, 23-26: 11, 27-30: 4, 31-34: 3, 35-37: 4
Figure D-1: Age of Participants, Survey One
Australian Born Anglo-Saxon: 8
Australian Born Chinese: 4
Mainland Chinese (incl. HK): 3
Singaporean: 3
United Kingdom: 2
Vietnamese: 2
Brazilian: 2
Samoan: 2
Nigerian: 1
New Zealander: 1
Korean: 1
Figure D-2: Nationality of Participants, Survey One
Completed High School: 11
TAFE/Community College: 6
Undertaking Degree: 5
Completed Degree: 4
Undertaking/Completed Postgraduate: 2
Year 10: 2
Figure D-3: Education of Participants, Survey One
Study Only: 4
Building/Construction: 4
Maintenance/Cleaning: 3
Catering: 3
Unemployed: 3
Combination: 3
IT: 3
Legal/Accounting: 2
Office: 2
Managerial: 1
Engineering: 1
Tourism/Travel: 1
Figure D-4: Employment of Participants, Survey One
Appendix E – Survey Two Audio
1: In this test please hold the device at a 45° angle and control it so that the pointer in the
centre of the screen moves to and selects the asterisks around the screen.
2: A picture will be displayed. In this test please hold the device at a 45° angle, view the
picture and react naturally.
3: Please hold the device in a comfortable position away from objects. Video will be taken
from the camera and displayed to you much like a viewfinder. Look at and interact with this
viewfinder.
4: A selection of faces will appear on the screen. While holding the device at a 45° angle,
please react by agreeing with the happy faces and disagreeing with the sad. Try to
incorporate device motion into this.
5: Hold the device at a 45° angle; text will be displayed on the screen; attempt to speed read
it.
6: Place this device on the table; it will ring, so react naturally to this. Hang up the phone when
you get the engaged signal. This will be repeated twice, so please repeat your actions.
Appendix F – Sample Survey Two Results
Results on the following pages show a visualisation of the recorded video collected for the six
different tests performed in the second survey. They are classified by the survey in which they
were recorded and by the input to which the motion is assumed to be attributed.
Classifications and visualisations are included. The occurrence column is a value up to 10 that
denotes how common that motion was for that command; therefore all motions for one input
should add up to 10. This sample data shows the information collected for users trying to
select the first asterisk in the first test.
Failed attempts are not included in this data; the reasons for these failed inputs are
summarised in Section 4.4.6.
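As a sketch of how such a tally could be computed (an assumption about the process, not the actual survey tooling), each of the ten participants' recordings for a command is given a motion label, and the per-label counts fill the occurrence column:

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical sketch: count how often each motion label occurs for one
// command; the counts across labels sum to the number of participants (10).
std::map<std::string, int> tallyOccurrences(const std::vector<std::string>& labels) {
    std::map<std::string, int> counts;
    for (const std::string& motion : labels) ++counts[motion];
    return counts;
}
```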
Table 5: Sample 1, Survey Two Motion Breakdown
Assumed Input: Traverse to Up-Left Asterisk (Choosing)
Move Direction Left, Move Direction Up (Direction): 4
Move Direction Up, Move Direction Left: 3
Move Direction Up-Left: 3
Assumed Input: Selecting Asterisk (Choosing)
Push Forward (Direction): 2
Push Forward, Return to Neutral (Push Forward): 3
Rotate Down (Y-axis) (Rotation): 1
Nothing: 2
Assumed Input: Traverse to Left Asterisk (Choosing)
Move Direction Left (Direction): 8
Rotate Left (X-axis) (Rotation)
Appendix G – Base Input Compression
Table 6: Input Compression Part 1, Survey Two
Input Type Possible Motions Adjuster
Confirmation (Yes) Rotation Left then Right (X
Axis),
Direction Left,
Rotation Left (Z Axis)
Confidence of Answer
Speed of Rotation, Direction
Angle of Rotation.
Confirmation (No) Rotation Up then Down (Y Axis)
Direction Right,
Rotation Right (Z Axis)
Confidence of Answer
Speed of Rotation, Direction
Angle of Rotation.
Movement (Move Left) Direction Left,
Rotation Left (X Axis)
Amount of Movement
Length of Direction,
Acceleration of Direction,
Angle of Rotation.
Movement (Move Right) Direction Right,
Rotation Right (X Axis)
Amount of Movement
Length of Direction,
Acceleration of Direction,
Angle of Rotation.
Movement (Move Down) Direction Down,
Rotation Down (Y Axis)
Amount of Movement
Length of Direction,
Acceleration of Direction,
Angle of Rotation.
Movement (Move Up) Direction Up,
Rotation Up (Y Axis)
Amount of Movement
Length of Direction,
Acceleration of Direction,
Angle of Rotation.
Movement (Rotate Up (Y)) Rotation Up (Y Axis),
Rotation Down (Y Axis)
Amount of Rotation
Angle of Rotation
Movement (Rotate Down (Y)) Rotation Down (Y Axis),
Rotation Up (Y Axis)
Amount of Rotation
Angle of Rotation
Movement (Rotate Left (X)) Rotation Left (X Axis) Amount of Rotation
Angle of Rotation
Movement (Rotate Right (X)) Rotation Right (X Axis) Amount of Rotation
Angle of Rotation
Movement (Rotate Left (Z)) Rotation Left (Z Axis) Amount of Rotation
Angle of Rotation
Movement (Rotate Right (Z)) Rotation Right (Z Axis) Amount of Rotation
Angle of Rotation
Table 7: Input Compression Part 2, Survey Two
Choosing (First Item
Selected)
Direction Forward and Back,
Direction Forward and Back * 2,
Rotation Down (Y Axis)
None
Choosing (Item Up One) Direction Up,
Rotation Up (Y Axis) Then
Direction Forward and Back,
Direction Forward and Back * 2,
Rotation Down (Y Axis)
None
Choosing (Item Up
Multiple)
Direction Up,
Direction Up - Direction Down * n
Rotation Up (Y Axis)
Rotation Up – Rotation Up (Y Axis) Then
Direction Forward and Back,
Direction Forward and Back * 2
Amount of Items Up
Length of Direction,
Angle of Rotation.
Choosing (Item Down
One)
Direction Down,
Rotation Down (Y Axis) Then
Direction Forward and Back,
Direction Forward and Back * 2,
Rotation Down (Y Axis)
None
Choosing (Item Down
Multiple)
Direction Down,
Direction Down - Direction Down * n
Rotation Down (Y Axis)
Rotation Down – Rotation Up (Y Axis) Then
Direction Forward and Back,
Direction Forward and Back * 2
Amount of Items Down
Length of Direction,
Angle of Rotation.
Selection (Start of
Selection)
Start at Centre
Direction towards Point of Interest Then
Direction Forward and Back,
Direction Forward and Back * 2
Distance Travelled in
Selection
Length of Movement
Selection (Deselect) Direction Left, Direction Right None
Appendix H – Base Situations
Table 8: Context Relationships
ID Situation Related To
1 Base OS (Home Screen) 6, 9, 14
2 Music Player 12
3 Web Browser 8, 12, 14
4 Calendar (Week View) 5, 6, 14
5 Calendar (Month View) 4, 6, 14
6 Calendar (Daily View) 4, 5, 1, 14
7 Photo Album 8, 12, 14
8 Image Viewing 3, 7, 12
9 Clock Application 1
10 Phone Call 11
11 Contact List 10, 14
12 File Manager 2, 3, 7, 8, 14
13 Confirmation Window
14 Item is selected 1, 3, 4, 5, 6, 7, 11, 12
15 In hierarchical menu
16 Selecting an item (currently unselected)
Appendix I – Sample Motion Model
Figure I-1: Model Map, Direction Down
Figure I-2: Model Map, Direction Up
Figure I-3: Model Map, Direction Left
Figure I-4: Model Map, Direction Right