
SmartWheeler: A Robotic Wheelchair Test-Bed for Investigating New Models of Human-Robot Interaction

Joelle Pineau and Amin Atrash, for the SmartWheeler team¹

School of Computer Science
McGill University

Montreal, QC H3A 2A7
[email protected]

Abstract

The goal of the SmartWheeler project is to increase the autonomy and safety of individuals with severe mobility impairments by developing a robotic wheelchair that is adapted to their needs. The project tackles a range of challenging issues, focusing in particular on tasks pertaining to human-robot interaction, and on robust control of the intelligent wheelchair. The platform we have built also serves as a test-bed for validating novel concepts and algorithms for automated decision-making onboard socially assistive robots. This paper introduces the wheelchair platform, and outlines technical contributions in four ongoing research areas: adaptive planning in large-scale environments, learning and control under model uncertainty, large-scale dialogue management, and communication protocols for the tactile interface.

Introduction

Many people who suffer from chronic mobility impairments, such as spinal cord injuries or multiple sclerosis, use a powered wheelchair to move around their environment. However, factors such as fatigue, degeneration of their condition, and sensory impairments often limit their ability to use standard electric wheelchairs.

The SmartWheeler project aims at developing, in collaboration with engineers and rehabilitation clinicians, a prototype of a multi-functional intelligent wheelchair to assist individuals with mobility impairments in their daily locomotion, while minimizing physical and cognitive loads.

Many challenging issues arise in this type of application. First, there are a number of technical issues pertaining to the physical design of the wheelchair; these are only briefly mentioned below. Second, there are substantial computational issues pertaining to the control of the wheelchair, which require close attention. This paper outlines ongoing work targeting a number of these aspects, ranging from new approaches to path planning, to technical innovations for model learning, to the design of the human-robot control interface.

Beyond its technological components, an essential aspect of this project is a strong collaboration with clinicians, to

¹ The SmartWheeler team at McGill University includes: Amin Atrash, Jeremy Cooperstock, Robin Jaulmes, Robert Kaplow, Nan Lin, Andrew Phan, Chris Prahacs, Doina Precup and Shane Saunderson.

ensure the definition of goals for the mobility functions, for the patient/wheelchair and environment/wheelchair interactions, as well as for the experimental validation of the smart wheelchair.

Our aim is to show that the robotic wheelchair reduces the physical and cognitive load required to operate the vehicle. We are therefore focusing on high-load situations, such as navigating in confined spaces (e.g. entering/exiting an elevator or a public washroom), stressful situations (e.g. exiting a building during a fire alarm), or unknown environments (e.g. transferring flights through a new airport).

Most of these tasks require basic robot navigation capabilities (mapping, localization, point-to-point motion). We rely substantially on previous technology to implement these functionalities. Some challenges remain pertaining to robust navigation in large-scale environments. To address this, we discuss a novel approach for variable-resolution planning under motion and sensory uncertainty.

We are also concerned with the design of the physical interface between the robotic wheelchair and its user. A key aspect of the patient/wheelchair interface involves creating communication protocols that can ensure high-quality information exchanges, minimizing ambiguity, and progressively improving the effectiveness of the interactions over time. We investigate two such protocols: a voice-based dialogue system, and a tactile/visual interface system. Both are discussed below.

Target population

The goal of this project is to increase the autonomy and safety of individuals with severe mobility impairments by developing a robotic wheelchair that is adapted to their needs.

Through discussions with clinical collaborators at the Centre de réadaptation Constance-Lethbridge and Centre de réadaptation Lucie-Bruneau, two rehabilitation clinics in the Montreal area, we have selected a target population for this work, along with a set of challenging tasks. We have chosen to define the target population based on their abilities, rather than their pathologies. The motivation for doing so is that we can use uniform measures of performance across the target population, thereby allowing us to gauge the usefulness of the deployed robotic system.


Individuals of interest will be those who meet the reduced mobility criteria necessary to qualify for a powered wheelchair under the Régie de l'assurance maladie du Québec (the provincial public health board). There are well-established guidelines for applying these criteria, and our clinical collaborators have long expertise in evaluating them.

Once an individual is approved for use of a powered wheelchair, s/he may require substantial configuration of the vehicle to achieve maximum usability. In particular, many patients require custom interfaces that go beyond the standard joystick, for example sip-and-puff devices, or pressure sensors that can be activated with minimum head, chin or hand control. Yet despite clinicians' significant customization efforts, control of a powered wheelchair (even with a joystick) remains a significant challenge. According to a recent survey, 40% of patients found daily steering and maneuvering tasks to be difficult or impossible, and clinicians believe that nearly half of patients unable to control a powered wheelchair by conventional methods would benefit from an automated navigation system (Fehr et al., 2000).

Robot platform

SmartWheeler, shown in Figure 1, is built on top of a commercially available Sunrise Quickie Freestyle, to which we have added front and back laser range-finders, wheel odometers, a touch-sensitive graphical display, a voice interface, and an onboard computer. The laser range-finders and odometers are used for navigation and obstacle avoidance. The display, voice interface, and joystick are the main modes of communication with the user. The onboard computer interfaces with the wheelchair's motor control board to provide autonomous navigational commands. Additional devices will be integrated in the future, including stereo vision, IR sensors, and a modified joystick. All hardware and electronic design was performed in-house by staff members at McGill's Centre for Intelligent Machines.

Figure 1: SmartWheeler robot platform.

The robot's basic mapping and navigation functions are provided by the Carmen robot navigation toolkit (Montemerlo et al., 2003). This toolkit can be adapted to a variety

of robot platforms and has been used in the robotics community for the control of indoor mobile robots in a variety of challenging environments. The toolkit can be used to build a high-resolution 2-D grid-based representation of an indoor environment. When used for online navigation, it provides robust laser-based robot localization, obstacle detection, and path planning.

Carmen is particularly useful for validating new algorithms because its simulator is known to be highly reliable, and policies with good simulation performance can typically be ported without modification to the corresponding robot platform.

However, there are some limitations to the current software, and we are actively developing a number of new capabilities, for example:

• Detection and avoidance of negative obstacles (e.g. downward staircase).

• Robust point-to-point planning and navigation.

• Shared control between autonomous controller and human user.

• Adaptive (user-specific) control strategy.

• Adapted interface for low-bandwidth communication.

The remainder of the paper discusses four ongoing areasof research pertaining to this project.

Adaptive planning in large-scale environments

As part of its task domain, the robot will be called upon to navigate robustly in very large environments. There exist a number of well-known approaches to robot path planning; however, they tend to fall roughly into two classes.

The first group assumes deterministic effects on the part of both the robot and the environment; these methods can therefore scale to high-dimensional domains, but are not robust to uncertainty in the motion or sensor model. An example of such an algorithm is the variable resolution cell decomposition technique.

The second group considers probabilistic motion effects and sensor readings; these methods are therefore robust to uncertainty, but generally scale poorly and can only handle small environments. An example of such an algorithm is the Partially Observable Markov Decision Process (POMDP) framework.

We aim to combine these two types of techniques, in an attempt to devise a planning approach that affords the computational flexibility of variable resolution techniques with the robustness of POMDPs. Before describing our approach, we briefly review the POMDP framework.

A POMDP is defined by the tuple {S, A, Z, T, O, R}, where S defines the state space, A defines the action space, Z defines the set of observations, T = Pr(s'|s, a) defines the state-to-state transition probabilities (e.g. motion model), O = Pr(z|s, a) defines the probability of seeing each observation (e.g. sensor model), and R(s, a) defines a real-valued reward function. Unlike many traditional planning paradigms, in POMDPs the state of the system is not necessarily known, but can be inferred probabilistically from


the observation sequence. To do this, we track a belief state, b(s) = Pr(s_t = s | z_t, a_{t-1}, z_{t-1}, ..., a_0, z_0, b_0), which defines the probability of each state at a given time step t. The goal of the POMDP is to select a sequence of actions so as to maximize the sum of rewards over time. This is defined formally as: V(b) = R(b, a) + Σ_{b'} T(b, a, b') V(b').

Further details on POMDP solution techniques can be found in the literature (Kaelbling et al., 1998; Hauskrecht, 2000; Pineau et al., 2006). For the purposes of this paper, it is sufficient to say that planning complexity increases quadratically with the size of the state space. Thus it is crucial to find a good compact state representation to tackle planning in large-scale environments.
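For concreteness, the belief tracking step described above can be sketched in a few lines. The fragment below is an illustrative sketch only, not the SmartWheeler code; the function name `belief_update` and the array layout of `T` and `O` are our own assumptions:

```python
import numpy as np

def belief_update(b, a, z, T, O):
    """One step of POMDP belief tracking.

    b : (|S|,) current belief over states
    T : (|A|, |S|, |S|) transition model, T[a, s, s'] = Pr(s'|s, a)
    O : (|A|, |S|, |Z|) observation model, O[a, s', z] = Pr(z|s', a)
    """
    predicted = b @ T[a]              # Pr(s') after taking action a
    updated = predicted * O[a, :, z]  # weight by likelihood of observing z
    return updated / updated.sum()    # renormalize to a distribution

# Tiny 2-state example: a noisy sensor sharpens a uniform belief.
T = np.array([[[0.9, 0.1], [0.1, 0.9]]])   # one action, near-deterministic motion
O = np.array([[[0.8, 0.2], [0.2, 0.8]]])   # sensor is correct 80% of the time
b = belief_update(np.array([0.5, 0.5]), a=0, z=0, T=T, O=O)
```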

With this goal in mind, we have developed a new approach to planning in metric environments which unifies the variable resolution cell decomposition and POMDP approaches. The variable resolution technique allows us to select the appropriate state representation for the environment. This automatically discretizes the robot's navigation space using fine grid resolution near obstacles or goal areas, and coarser discretization in large open spaces. Once we have computed the discretization, we must infer a probabilistic motion model to capture the robot's motion uncertainty. This is done using sampling techniques: the robot runs through a number of trajectories, and the motion model is estimated from statistics computed over these trajectories. A probabilistic sensor model is also needed; this is generally crafted by hand based on expert knowledge. In the next section we discuss how to incorporate automated learning in this phase of the approach. Given a variable resolution state representation and corresponding parameterized models, we are able to compute an optimal path planning strategy over the entire state space. This is done using recent POMDP solution techniques (Pineau and Gordon, 2005).
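The cell decomposition step can be illustrated with a simple quadtree-style recursion over an occupancy grid: cells that mix free and occupied space are split, while uniform open areas are kept coarse. This is a minimal sketch under assumed conventions (square power-of-two maps, a hypothetical `decompose` helper), not the actual planner:

```python
import numpy as np

def decompose(grid, x0, y0, size, min_size, cells):
    """Recursively split a square cell while it contains both free and
    occupied space; uniform areas stay as single coarse cells."""
    block = grid[y0:y0 + size, x0:x0 + size]
    if size <= min_size or block.min() == block.max():
        cells.append((x0, y0, size))   # uniform (or minimal) cell: keep as-is
        return
    half = size // 2
    for dx, dy in [(0, 0), (half, 0), (0, half), (half, half)]:
        decompose(grid, x0 + dx, y0 + dy, half, min_size, cells)

# 8x8 occupancy map with a single 2x2 obstacle in one corner: the three
# open quadrants stay coarse, and only the obstacle corner is refined.
grid = np.zeros((8, 8), dtype=int)
grid[0:2, 0:2] = 1
cells = []
decompose(grid, 0, 0, 8, 1, cells)
```

Here the decomposition produces 7 cells instead of the 64 of a full-resolution grid, mirroring the kind of state-space reduction reported for the real maps.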

Figure 2 shows a sample map taken from (Carmen, 2006), along with the state representation that was extracted by our approach. As expected, large rooms receive coarse resolution, whereas hallways and narrow areas show finer resolution. For this map, the use of variable cell decomposition yielded a 400-fold reduction in computation, compared to using the full map resolution. Further reduction could be achieved at the expense of some loss in performance. Thus the variable resolution approach allows us to trade off planning time and plan quality in a flexible manner.

Learning and control under model uncertainty

The POMDP framework requires a known parametric model defining the dynamics of the problem domain. The parametric model contains two parts: the motion model (or transition probabilities) and the sensor model (or observation probabilities). In the section above, we assumed that the motion model was learned by acquiring motion samples and building a parametric representation. We also assumed that the observation model was given by an expert. Both of these assumptions can be problematic in some realistic applications. Learning a model from samples is feasible in cases where data is plentiful and inexpensive, for example when a simulator is available; however, the quality of the produced model

Figure 2: Variable resolution map.

is only as good as the simulator's fidelity. Alternately, asking a domain expert to specify a parametric model is feasible in some domains, for example in the physical world where there are natural constraints. However, in other cases, for example human-robot interaction, acquiring an accurate model can be challenging because of our poor understanding of the dynamics that occur in that domain.

Ideally, we would like to combine both expert knowledge and data-driven learning to produce a more flexible approach to model acquisition. To achieve this, the key is to develop new ways of representing the POMDP paradigm, such that model uncertainty is taken into account. The approach we propose, called MEDUSA, relies on a Bayesian formulation of uncertainty, which is particularly appropriate to offer a flexible trade-off between a priori knowledge engineering and data-driven parameter inference.

The approach is relatively simple. First, we assume an expert specifies a prior on the model parameters, P(M). We then observe data Y from standard trajectories. Assuming we can specify a simple generative process P(Y|M), it is straightforward to apply Bayes' rule and obtain a posterior model P(M|Y) which combines both the expert knowledge and the data acquired. Since POMDP model parameters are generally represented using multinomial distributions, it is convenient to represent the model prior (and posterior) using Dirichlet distributions, which are the conjugate prior for the multinomial. There is one more obstacle: to apply this method we need to know the state corresponding to each data point acquired, since this will tell us which parameter to update. But the state is not usually given in POMDPs. To overcome this, we assume access to an oracle which can identify the state of the system when queried. This is a relatively strong assumption in the POMDP context; however, it is standard in most other planning frameworks. We comment below on ways to relax this assumption.

Given these preliminaries, we formulate an algorithm which uses Dirichlet distributions to capture model uncertainty. The algorithm relies on sampling POMDP models from these distributions to plan and select actions. As


learning progresses, the set of sampled models will gradually converge to the correct model. Here is the full algorithm:

1. Define Dirichlet priors over the model.

2. Sample a set of POMDPs from the distribution.

3. Solve each model using standard POMDP techniques.

4. Use policy to select good actions.

5. Query the oracle and observe the answer (s, a, z, s′).

6. Increment the Dirichlet parameters Dir(α_{s,a,s′}) and Dir(α_{s,a,z}).

7. Continue until convergence.

Throughout, we maintain a weight indicating the probability of a sampled model. Models with low weights are dropped periodically, and replaced by re-sampling the Dirichlet distributions.
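Steps 1, 2, 5 and 6 of the algorithm above can be sketched as follows; the POMDP solving and action selection (steps 3-4) are omitted, and all names and dimensions are illustrative choices, not MEDUSA's actual implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

S, A, Z = 3, 2, 2
# Step 1: Dirichlet priors over each row of the transition and observation
# models (one pseudo-count per outcome, i.e. an uninformed prior).
alpha_T = np.ones((A, S, S))
alpha_O = np.ones((A, S, Z))

def sample_model():
    """Step 2: draw one candidate POMDP model from the current Dirichlets."""
    T = np.array([[rng.dirichlet(alpha_T[a, s]) for s in range(S)] for a in range(A)])
    O = np.array([[rng.dirichlet(alpha_O[a, s]) for s in range(S)] for a in range(A)])
    return T, O

def record_query(s, a, z, s_next):
    """Steps 5-6: an oracle answer (s, a, z, s') increments the matching counts."""
    alpha_T[a, s, s_next] += 1
    alpha_O[a, s_next, z] += 1

# Simulate oracle answers that keep reporting the same transition; the
# sampled models concentrate on it as the counts accumulate.
for _ in range(200):
    record_query(s=0, a=0, z=1, s_next=2)
T, O = sample_model()
```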

In practice, we also try to minimize the number of queries to the oracle. For example, if the robot happens to be in a part of the state space that has already been well explored, then it is not useful to query the oracle, since no new information will be provided. In such cases, the robot should simply behave optimally until it moves towards less-explored regions. To accomplish this as part of MEDUSA, we considered a number of heuristics designed to decide when the oracle should be queried, and when the robot should instead follow its policy. The method we settled on combines information about the variance over the value computed by each model, the expected information gain that a query could yield, the entropy in the belief, and the number of recent queries. This aspect of MEDUSA is somewhat ad hoc. We have considered methods to formalize it, such as including the decision of whether to query within the POMDP decision-making (Jaulmes et al., 2005a). However, solving this optimally proves intractable for all but the smallest problems (e.g. 2 states, 2 actions), therefore we continue to use the heuristics mentioned above.

Experimental validation of the MEDUSA technique has focused on a scenario where the SmartWheeler must navigate in an environment, with the goal of autonomously finding a caregiver who is also mobile. Similar versions of this problem have been studied before in the POMDP literature under various names (Hide, Tag, Find-the-patient). Previous work always assumed a fully modeled version of this problem, where the person's location is unknown, but the person's motion model is precisely modeled, as are the robot's sensor and motion models. We now consider the case where, in addition to not knowing the person's position, we are also uncertain about the person's motion model and the robot's sensor model. In total, MEDUSA is trying to learn 52 distinct parameters. We consider the environment shown in Figure 3. Planning and learning are done over a discretized version; the associated POMDP has 362 states, 24 observations and 5 actions. We assume a fixed-resolution grid (future plans include integration of the method described in the previous section). Execution assumes the continuous state representation, and in that case belief tracking is done onboard Carmen using the full particle filter.

During learning, MEDUSA makes several queries aboutthe state. Since there is no model uncertainty about the

Figure 3: Map of the environment used for the robot simu-lation experiment.

robot's motion, this is equivalent to asking the caregiver to reveal his/her position so that MEDUSA can infer his/her motion model. The answer to the queries can be provided by a human operator, though for convenience of carrying out multiple evaluations, in our experiments they are produced using a generative model of the caregiver.

As we can see from Figure 4, MEDUSA converges within roughly 12,000 time steps, after having received answers to approximately 9,000 queries. While this may seem large, it is worth pointing out that MEDUSA's oracle can in fact take the form of a high-precision sensor. It is realistic to assume, for example, that the caregiver will carry a GPS sensor that can answer queries automatically during the learning phase, and that this will play the role of the oracle. In such a setup, 9,000 queries seems a small price to pay to obtain a full probabilistic model of the person's motion.

(a) Discounted reward as a function of the number of time steps.

(b) Number of queries as a function of the number of time steps.

Figure 4: Results for the robotic task domain.


The other reason why MEDUSA requires so many queries for this problem is that the experiment assumed completely uninformed initial priors on the robot's sensor model and the caregiver's motion model. Using a more informed prior would lead to faster learning, but would require more knowledge engineering. Finally, to further reduce the number of queries, we could also build a simpler model with fewer Dirichlet parameters, in effect assuming stronger correlations between model parameters.

Further information on this component is available in (Jaulmes et al., 2005a; Jaulmes et al., 2005b; Jaulmes et al., 2007).

Large-scale dialogue management

A natural medium for communication between a user and an intelligent system is through voice commands. While many commercial software solutions are available for speech recognition and synthesis, there is no commercial equivalent for handling the actual dialogue (i.e. production of responses by the robot). In fact, in a spoken dialogue system, determining which action the robot (or computer) should take in a given situation is a difficult problem, due to the uncertainty that characterizes human communication.

Earlier work pioneered the idea of POMDP-based dialogue managers, but these systems were limited to small domains (e.g. 2-3 topics in a question-answer format). This prompted an investigation of new techniques for tracking the dialogue state and efficiently selecting dialogue actions in domains with large observation spaces. In particular, we study the applicability of two well-known classes of data summarization techniques to this problem.

We first investigated the use of the K-means algorithm to find a small set of summary observations. The idea is to cluster the natural observations Z into clusters Z′, such that observations with similar emission probabilities over all states are clustered together. It is crucial to use the normalized observation probabilities Pr(z|s, a)/Pr(z) (rather than the unnormalized Pr(z|s, a)) to ensure that observations that are clustered together provide (near-)equivalent inference information over the set of states.
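A minimal version of this clustering step might look as follows; the helper name `cluster_observations` and the plain Lloyd's-iteration K-means are our own illustrative choices, not the implementation used in the paper:

```python
import numpy as np

def cluster_observations(O_flat, pz, k, iters=50, seed=0):
    """Cluster observations by their normalized likelihood profiles.

    O_flat : (|Z|, |S|) matrix with O_flat[z, s] = Pr(z|s) for a fixed action
    pz     : (|Z|,) marginal Pr(z); rows are normalized by it before clustering
    """
    rng = np.random.default_rng(seed)
    X = O_flat / pz[:, None]                 # normalized: Pr(z|s) / Pr(z)
    centers = X[rng.choice(len(X), k, replace=False)]
    for _ in range(iters):                   # plain Lloyd's iterations
        labels = np.argmin(((X[:, None] - centers) ** 2).sum(-1), axis=1)
        for j in range(k):
            if np.any(labels == j):          # skip empty clusters
                centers[j] = X[labels == j].mean(axis=0)
    return labels

# Four observations, two states; z0/z1 and z2/z3 carry the same inference
# information once normalized, so each pair should share a cluster.
O_flat = np.array([[0.6, 0.1], [0.3, 0.05], [0.05, 0.45], [0.05, 0.4]])
pz = O_flat.mean(axis=1)
labels = cluster_observations(O_flat, pz, k=2)
```

Note that z0 and z1 have quite different raw probabilities but identical normalized profiles, which is exactly why the normalization matters.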

We also investigated the use of a dimensionality reduction algorithm along the lines of Principal Component Analysis (with a few added constraints), which finds a good low-dimensional representation of the observation probability model. The idea is to use the lower-dimensional projection during planning, which should reduce computation time.
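As an illustration of the underlying idea (ignoring the paper's added constraints), a plain SVD-based projection of the observation matrix could be sketched as:

```python
import numpy as np

def project_observation_model(O_flat, d):
    """Rank-d approximation of the observation matrix via SVD.

    O_flat : (|Z|, |S|) observation probabilities for a fixed action
    d      : number of principal components to keep
    Returns the low-dimensional codes and the reconstruction.
    """
    mean = O_flat.mean(axis=0)
    U, Svals, Vt = np.linalg.svd(O_flat - mean, full_matrices=False)
    codes = U[:, :d] * Svals[:d]     # d-dimensional representation of each z
    recon = codes @ Vt[:d] + mean    # back-projection usable during planning
    return codes, recon

# A rank-2 structure embedded in a 6-observation, 4-state model is
# recovered from just two components.
rng = np.random.default_rng(1)
base = rng.random((2, 4))
O_flat = rng.random((6, 2)) @ base
codes, recon = project_observation_model(O_flat, d=2)
```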

Both methods have been used in the past to summarize high-dimensional data. In general, common wisdom might suggest that Principal Component Analysis yields a higher-quality solution, since it projects the entire observation space. K-means has the advantage that it is usually much faster to compute for very large observation sets.

Before implementing either method onboard the robot, we tested both with the SACTI dialogue corpus (Williams and Young, 2004). The results we obtained suggest that K-means clustering works well, even in this complex dialogue domain. This is encouraging because the clustering algorithm is simple to implement, fast to compute, and generates

intuitive compressed representations. We also found that constraint-based PCA performed on par with K-means on this 450-word domain; however, computation was significantly slower. Further technical details and results are provided in (Atrash and Pineau, 2006). We are now extending the results to topics relevant to the robot domain.

Figure 5: Results for Dialogue POMDP: (a) Expected Reward and (b) Planning time (sec), as a function of the number of observations, comparing K-means, PCA (constraints), PCA (normalized), and All Observations.

High-level goal specification

The last component aims at designing and validating a new communication protocol for allowing users to provide high-level navigation commands to the wheelchair. Conventional control of a motorized wheelchair is typically done through a joystick device. For those unable to operate a standard joystick, alternatives include sip-and-puff devices, pressure sensors, etc. Regardless of the device used, the user input set is restricted to displacement and velocity commands. Operation of a wheelchair in this manner can result in fatigue over time, and it is often difficult to manoeuvre the wheelchair in constrained spaces (e.g. elevators, crowded rooms, etc.).


The prototype robotic wheelchair we are developing seeks to alleviate these challenges by allowing users to specify high-level navigation goals (e.g. "Go to room 103"). This requires a new communication protocol which will allow the user to input such commands.

The input protocol we proposed initially for specifying high-level navigation goals was based on EdgeWrite (Wobbrock and Myers, 2006), a unistroke text entry method for handheld devices, designed to provide high-accuracy text entry for people with motor impairments. We have adapted this method for the control of a motorized wheelchair by customizing the set of strokes, input constraints, and feedback display to the task of wheelchair control.

User experiments currently under way are comparing entry of robot navigation goals using: direct map selection, menu selection, and EdgeWrite gesture entry. For each input modality, the user is shown a floor map of a building on the screen, and guided through a list of locations that must be selected quickly and accurately using the different input selection methods. We measure error rate, input time, and motion time needed to reach the target location. Early results with a control population indicate that menu selection (from a static vertical list) was twice as fast as selecting the targets directly on the map, and three times as fast as entering the corresponding EdgeWrite symbol, which is as expected for this population. We must now replicate the experiment with disabled users. Since this population has significant motor constraints, we may obtain significantly different results regarding the preferred mode of input.

Discussion

This paper highlights some of the key components of the SmartWheeler project. We are currently working on their integration, and planning out a sequence of experiments with the target population. Through close collaborations with engineers and rehabilitation researchers, we hope to one day have a positive impact on the quality of life of individuals with severe mobility impairments.

It is worth noting that many of the techniques developed in this project are not specific to the mobility-impaired population, but are relevant to building service robots for a large number of applications. The SmartWheeler platform is proving to be an exciting new test-bed for exploring novel concepts and approaches in automated decision-making, human-robot interaction, and assistive robotics.

Acknowledgments

The SmartWheeler team acknowledges the financial support of the Natural Sciences and Engineering Research Council of Canada and the Canada Foundation for Innovation. We also thank Sunrise Medical Canada Inc. for their generous equipment donation.

References

Atrash, A. and Pineau, J. "Efficient Planning and Tracking in POMDPs with Large Observation Spaces". AAAI-06 Workshop on Empirical and Statistical Approaches for Spoken Dialogue Systems. 2006.

Carmen-Team. Robot Navigation Toolkit. http://carmen.sourceforge.net. Retrieved 2006.

Fehr, L., Langbein, E. and Skaar, S. B. "Adequacy of power wheelchair control interfaces for persons with severe disabilities: A clinical survey". Journal of Rehabilitation Research and Development, 37(3). pp. 353-360. 2000.

Hauskrecht, M. "Value-Function Approximations for Partially Observable Markov Decision Processes". Journal of Artificial Intelligence Research. 13. pp. 33-94. 2000.

Jaulmes, R., Pineau, J. and Precup, D. "Active Learning in Partially Observable Markov Decision Processes". European Conference on Machine Learning (ECML). LNAI 3729. Springer-Verlag. pp. 601-608. 2005.

Jaulmes, R., Pineau, J. and Precup, D. "Probabilistic Robot Planning Under Model Uncertainty: An Active Learning Approach". NIPS Workshop on Machine Learning Based Robotics in Unstructured Environments. 2005.

Jaulmes, R., Pineau, J. and Precup, D. "A formal framework for robot learning and control under model uncertainty". IEEE International Conference on Robotics and Automation (ICRA). 2007.

Kaelbling, L. P., Littman, M. L. and Cassandra, A. R. "Planning and Acting in Partially Observable Stochastic Domains". Artificial Intelligence. 101. pp. 99-134. 1998.

Montemerlo, M., Roy, N. and Thrun, S. "Perspectives on Standardization in Mobile Robot Programming: The Carnegie Mellon Navigation (CARMEN) Toolkit". IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). 2003.

Pineau, J., Gordon, G. and Thrun, S. "Anytime Point-Based Approximations for Large POMDPs". Journal of Artificial Intelligence Research (JAIR). 27. pp. 335-380. 2006.

Pineau, J. and Gordon, G. "POMDP Planning for Robust Robot Control". International Symposium on Robotics Research (ISRR). 2005.

Williams, J. D. and Young, S. "Characterizing task-oriented dialog using a simulated ASR channel". International Conference on Speech and Language Processing (ICSLP). 2004.

Wobbrock, J. O. and Myers, B. A. "Trackball text entry for people with motor impairments". ACM Conference on Human Factors in Computing Systems (CHI). 2006.