
Skinput: “Turning the Body into a Touchscreen Interface”

Purushottam Swami
Student, BCA (V Semester)
School of Computer and System Sciences
Jaipur National University
Jaipur, India
[email protected]

Abstract—Touch has become a standard input method for smartphones over the last few years. Additionally, capacitive touch screens have found their way into many other everyday devices, such as tablets, laptops, gaming devices and even glass-ceramic stoves. However, all of these application scenarios are limited to flat screens. In this report, we investigate an approach that goes beyond the scope of traditional touch screens and creates entirely new application scenarios. We present a technology that appropriates the human body for acoustic transmission, allowing the skin to be used as an input surface. In particular, we resolve the location of finger taps on the arm and hand by analyzing mechanical vibrations that propagate through the body. We collect these signals using a novel array of sensors worn as an armband. This approach provides an always-available, naturally portable, on-body finger input system.

Keywords: On-demand interfaces, finger tracking, on-body computing, appropriated surfaces, object classification, biosensing, bio-acoustics, multitouch, stereoscopic display, 3D user interfaces.

I. INTRODUCTION

A touch screen is an electronic visual display that enables a user to interact directly with information by touching areas on the display. The touching can occur with a finger or hand, or with an object such as a stylus pen. Many of the first touch screens required a more active object, such as a light pen, to work properly; contemporary ones, however, need only the touch of a finger. Although touch screens have been around for almost 40 years, their recent explosion in popularity and development is often attributed to the 2007 launch of Apple's iPhone. The success of the iPhone proved that touch screens could be developed elegantly and affordably, and be received with massive public excitement. This success was a catalyst for growth and investment in the application and deployment of touch screen technologies. Global shipments of touch screen display modules are expected to more than double from 2008 to 2012, and investments in the development of new touch screen technologies are also expected to increase significantly. The touch screen has evolved in reaction to computing applications that require this medium to be present.

The form factor of digital devices has drastically changed in the last few years. We now carry with us multi-tasking microcomputers that have evolved from cell phones. The new use protocols that have turned phones into mobile computers, rather than just voice communication tools, presented a need to evolve the hardware. Constraints imposed by the shrinking size of these devices have forced device applications to become context aware. It simply isn't efficient or sustainable to have full-sized keyboards or other intermediary devices as the primary input for these microcomputers. Touch screens solve this problem by allowing the display to act as the primary method of input, overcoming input limitations by digitally creating context-based interfaces. The most important aspect of touch screens is their contribution to the evolution of the user interface. In an age dominated by information, improved methods of interacting with information are both coveted and necessary. One of the greatest advancements in information interaction in the digital age was the advent of hypertext. Hypertext makes it possible to cut through linear structure, which allows the reader to participate more actively with information and connect with it in more meaningful ways. It demands an active reader by blurring the distinction between author and reader. We now expect the same non-linear approach of hypertext to be available in any device that we use to interact with information. The touch screen adds this non-linear navigation to the user interface through a physical method of direct control.

II. RELATED WORK

Always-Available Input: The primary goal of Skinput is to provide an always-available mobile input system, that is, an input system that does not require a user to carry or pick up a device. A number of alternative approaches have been proposed that operate in this space. Techniques based on computer vision are popular; these, however, are computationally expensive and error prone in mobile scenarios. Speech input is a logical choice for always-available input, but is limited in its precision in unpredictable acoustic environments, and suffers from privacy and scalability issues in shared environments. Other approaches have taken the form of wearable computing. This typically involves a physical input device built in a form considered to be part of one's clothing. For example, glove-based input systems allow users to retain most of their natural hand movements, but are cumbersome, uncomfortable, and disruptive to tactile sensation. Post and Orth present a "smart fabric" system that embeds sensors and conductors into fabric, but taking this approach to always-available input necessitates embedding technology in all clothing, which would be prohibitively complex and expensive.

The SixthSense project proposes a mobile, always-available input/output capability by combining projected information with a color-marker-based vision tracking system. This approach is feasible, but suffers from serious occlusion and accuracy limitations. For example, determining whether a finger has tapped a button, or is merely hovering above it, is extraordinarily difficult. In the present work, we briefly explore the combination of on-body sensing with on-body projection.

Bio-Sensing: Skinput leverages the natural acoustic conduction properties of the human body to provide an input system, and is thus related to previous work on the use of biological signals for computer input. Signals traditionally used for diagnostic medicine, such as heart rate and skin resistance, have been appropriated for assessing a user's emotional state. These features are generally subconsciously driven and cannot be controlled with sufficient precision for direct input. Similarly, brain sensing technologies such as electroencephalography (EEG) and functional near-infrared spectroscopy (fNIR) have been used by HCI researchers to assess cognitive and emotional state; this work also primarily looked at involuntary signals. In contrast, brain signals have been harnessed as a direct input for use by paralyzed patients, but direct brain-computer interfaces (BCIs) still lack the bandwidth required for everyday computing tasks, and require levels of focus, training, and concentration that are incompatible with typical computer interaction.

There has been less work relating to the intersection of finger input and biological signals. Researchers have harnessed the electrical signals generated by muscle activation during normal hand movement through electromyography (EMG). At present, however, this approach typically requires expensive amplification systems and the application of conductive gel for effective signal acquisition, which would limit its acceptability for most users.

The input technology most related to our own is that of Amento et al.,1 who placed contact microphones on a user's wrist to assess finger movement. However, this work was never formally evaluated and is constrained to finger motions in one hand. The Hambone system4 employs a similar approach using piezoelectric sensors, yielding classification accuracies around 90% for four gestures (e.g., raise heels, snap fingers). Performance of false-positive rejection remains untested in both systems.

Finally, bone conduction microphones and headphones, now common consumer technologies, represent an additional bio-sensing technology relevant to the present work. These leverage the fact that sound frequencies relevant to human speech propagate well through bone. Bone conduction microphones are typically worn near the ear, where they can sense vibrations propagating from the mouth and larynx during speech. Bone conduction headphones send sound through the bones of the skull and jaw directly to the inner ear, bypassing the lossy transmission of sound through the air and outer ear. The mechanically conductive properties of human bones are also employed by Zhong et al.27 for transmitting information through the body, such as from an implanted device to an external receiver.

Acoustic Input: Our approach is also inspired by systems that leverage acoustic transmission through (non-body) input surfaces. Paradiso et al.18 measured the arrival time of a sound at multiple sensors to locate hand taps on a glass window. Ishii et al.8 use a similar approach to localize a ball hitting a table, for computer augmentation of a real-world game. Both of these systems use acoustic time-of-flight for localization, which we explored but found to be insufficiently robust on the human body, leading to the fingerprinting approach described in this paper.

III. SKINPUT

To expand the range of sensing modalities for always-available input systems, we developed Skinput, a novel input technique that allows the skin to be used as a finger input surface, much like a touchscreen. In our prototype system, we chose to focus on the arm, although the technique could be applied elsewhere. This is an attractive area to appropriate as it provides considerable surface area for interaction, including a contiguous and flat area for projection (discussed subsequently). Furthermore, the forearm and hands contain a complex assemblage of bones that increases the acoustic distinctiveness of different locations. To capture this acoustic information, we developed a wearable armband that is non-invasive and easily removable (Figure 2).

Bio-acoustics: When a finger taps the skin, several distinct forms of acoustic energy are produced. Some energy is radiated into the air as sound waves; this energy is not captured by the Skinput system. Among the acoustic energy transmitted through the arm, the most readily visible are transverse waves, created by the displacement of the skin from a finger impact (Figure 3). When shot with a high-speed camera, these appear as ripples, which propagate outward from the point of contact (like a pebble dropped into a pond). The amplitude of these ripples is correlated with the tapping force and with the volume and compliance of the soft tissues under the impact area. In general, tapping on soft regions of the arm creates higher-amplitude transverse waves than tapping on bony areas (e.g., wrist, palm, fingers), which have negligible compliance.

In addition to the energy that propagates on the surface of the arm, some energy is transmitted inward, toward the skeleton (Figure 4). These longitudinal (compressive) waves travel through the soft tissues of the arm, exciting the bone, which is much less deformable than the soft tissue but can respond to mechanical excitation by rotating and translating as a rigid body. This excitation vibrates the soft tissues surrounding the entire length of the bone, resulting in new longitudinal waves that propagate outward to the skin.

We highlight these two separate forms of conduction, transverse waves moving directly along the arm surface and longitudinal waves moving into and out of the bone through soft tissues, because these mechanisms carry energy at different frequencies and over different distances. Roughly speaking, higher frequencies propagate more readily through bone than through soft tissue, and bone conduction carries energy over larger distances than soft tissue conduction. While we do not explicitly model the specific mechanisms of conduction, or depend on these mechanisms for our analysis, we do believe the success of our technique depends on the complex acoustic patterns that result from mixtures of these modalities.

Similarly, we also hypothesize that joints play an important role in making tapped locations acoustically distinct. Bones are held together by ligaments, and joints often include additional biological structures such as fluid cavities. This makes joints behave as acoustic filters. In some cases, these may simply dampen acoustics; in other cases, they will selectively attenuate specific frequencies, creating location-specific acoustic signatures. Finally, muscle contraction may also contribute to the vibration patterns recorded by our sensors,14 including both contraction related to posture maintenance and reflexive muscle movements in response to input taps.

Armband Prototype: Our initial hardware prototype employed an array of tuned mechanical vibration sensors, specifically small, cantilevered piezoelectric films (MiniSense 100, Measurement Specialties, Inc.). By adding small weights to the end of each cantilever, we were able to alter its resonant frequency, allowing each sensing element to be responsive to a unique, narrow, low-frequency band of the acoustic spectrum. Each element was aligned with a particular frequency that a pilot study had shown to be useful in characterizing bio-acoustic input. These sensing elements were packaged into two groups of five, for 10 sensors in total.
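
As a rough idealization (our own simplification, not the authors' calibration procedure), each weighted cantilever can be treated as a mass-spring resonator, which makes clear why adding tip mass lowers the frequency band an element responds to:

    % Idealized mass-spring approximation of one weighted cantilever
    % k = effective stiffness of the piezoelectric film
    % m = effective mass, increased by the added tip weight
    f_0 = \frac{1}{2\pi}\sqrt{\frac{k}{m}}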

A Mackie Onyx 1200F audio interface was used to digitally capture data from the 10 sensors. Each channel was sampled at 5.5 kHz, a sampling rate that would be considered too low for speech or environmental audio, but was able to represent the relevant spectrum of frequencies transmitted through the arm. This reduced sample rate (and consequently low processing bandwidth) makes our technique readily portable to embedded processors. For example, the ATmega168 processor employed by the Arduino platform can sample analog readings at 77 kHz with no loss of precision, and could therefore provide the full sampling power required for Skinput (55 kHz in total).

Processing: The audio stream was segmented into individual taps using an absolute exponential average of all sensor channels (Figure 5, red waveform). When an intensity threshold was exceeded (Figure 5, upper blue line), the program recorded the timestamp as a potential start of a tap. If the intensity did not fall below a second, independent "closing" threshold (Figure 5, lower purple line) between 100 and 700 ms after the onset crossing (a duration we found to be common for finger impacts), the event was discarded. If start and end crossings were detected that satisfied these criteria, the acoustic data in that period (plus a 60 ms buffer on either end) was considered an input event (Figure 5, vertical green regions). Although simple, this heuristic proved to be robust.
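
To make the heuristic concrete, the sketch below reimplements it in Python under stated assumptions: the 5.5 kHz sampling rate, the 100-700 ms duration window, and the 60 ms buffer come from the text, while the threshold values, smoothing factor, and function names are illustrative placeholders rather than the authors' code.

    import numpy as np

    SAMPLE_RATE = 5500                 # Hz per channel, as stated above
    OPEN_THRESH = 0.05                 # onset intensity threshold (illustrative value)
    CLOSE_THRESH = 0.02                # independent "closing" threshold (illustrative value)
    MIN_TAP, MAX_TAP = 0.100, 0.700    # seconds; typical finger-impact duration
    BUFFER = 0.060                     # 60 ms of context kept on either side

    def intensity_envelope(channels, alpha=0.05):
        """Absolute exponential average over all sensor channels.

        channels: array of shape (n_channels, n_samples).
        """
        rectified = np.abs(channels).mean(axis=0)
        env = np.empty_like(rectified)
        acc = 0.0
        for i, x in enumerate(rectified):
            acc = alpha * x + (1.0 - alpha) * acc
            env[i] = acc
        return env

    def segment_taps(channels):
        """Yield (start, end) sample indices of candidate tap events."""
        env = intensity_envelope(channels)
        n = env.size
        buf = int(BUFFER * SAMPLE_RATE)
        i = 0
        while i < n:
            if env[i] > OPEN_THRESH:
                start = i
                j = i + 1
                while j < n and env[j] > CLOSE_THRESH:
                    j += 1                      # wait for the closing-threshold crossing
                duration = (j - start) / SAMPLE_RATE
                if MIN_TAP <= duration <= MAX_TAP:
                    # keep a 60 ms buffer on either end of the accepted event
                    yield max(0, start - buf), min(n, j + buf)
                i = j
            else:
                i += 1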

After an input has been segmented, the waveforms are analyzed. We employ a brute-force machine learning approach, computing 186 features in total, many of which are derived combinatorially. For gross information, we include the average amplitude, standard deviation and total (absolute) energy of the waveforms in each channel (30 features). From these, we calculate all average amplitude ratios between channel pairs (45 features). We also include an average of these ratios (1 feature). We calculate a 256-point FFT for all 10 channels, although only the lower 10 values are used (representing the acoustic power from 0 to 193 Hz), yielding 100 features. These are normalized by the highest-amplitude FFT value found on any channel. We also include the center of mass of the power spectrum within the same 0–193 Hz range for each channel, a rough estimation of the fundamental frequency of the signal displacing each sensor (10 features). Subsequent feature selection established the all-pairs amplitude ratios and certain bands of the FFT to be the most predictive features.
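
The following sketch spells out that feature computation, assuming a segmented tap stored as a (10, n_samples) NumPy array. The counts follow the text (256-point FFT, lowest 10 bins, 5.5 kHz sampling), while the function and variable names are our own and details such as normalization order are assumptions.

    import itertools
    import numpy as np

    def tap_features(window):
        """Compute the 186-dimensional feature vector described above.

        window: array of shape (10, n_samples) holding one segmented tap.
        """
        feats = []

        # Gross per-channel statistics: average amplitude, standard deviation,
        # and total absolute energy (3 x 10 = 30 features).
        amp = np.abs(window).mean(axis=1)
        feats.extend(amp)
        feats.extend(window.std(axis=1))
        feats.extend(np.abs(window).sum(axis=1))

        # All pairwise average-amplitude ratios (C(10, 2) = 45 features),
        # plus the mean of those ratios (1 feature).
        ratios = [amp[i] / amp[j] for i, j in itertools.combinations(range(10), 2)]
        feats.extend(ratios)
        feats.append(np.mean(ratios))

        # 256-point FFT per channel; keep the lowest 10 bins (roughly 0-193 Hz
        # at 5.5 kHz sampling), normalized by the largest value on any channel
        # (10 x 10 = 100 features).
        spectra = np.abs(np.fft.rfft(window, n=256, axis=1))[:, :10]
        feats.extend((spectra / spectra.max()).ravel())

        # Spectral center of mass (in bin-index units) within the same band,
        # per channel: a rough estimate of the dominant frequency (10 features).
        bins = np.arange(10)
        centroid = (spectra * bins).sum(axis=1) / spectra.sum(axis=1)
        feats.extend(centroid)

        return np.array(feats)   # 30 + 45 + 1 + 100 + 10 = 186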

These 186 features are passed to a support vector machine (SVM) classifier. A full description of SVMs is beyond the scope of this paper. Our software uses the implementation provided in the Weka machine learning toolkit.26 It should be noted, however, that other, more sophisticated classification techniques and features could be employed. Thus, the results presented in this paper should be considered a baseline.
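
The paper uses Weka's SVM implementation; purely as an illustrative stand-in, the sketch below trains a per-user classifier with scikit-learn's SVC (the library and kernel choice are our assumptions, not the authors' toolchain), reusing the tap_features routine sketched above.

    import numpy as np
    from sklearn.svm import SVC   # stand-in for the Weka SVM used in the paper

    def train_classifier(example_windows, labels):
        """Fit a per-user, per-sensor-position SVM on segmented tap examples.

        example_windows: list of (10, n_samples) arrays from the training phase.
        labels: the tapped input location for each example.
        """
        X = np.stack([tap_features(w) for w in example_windows])  # 186-D vectors
        clf = SVC(kernel="linear")   # kernel choice is an assumption
        clf.fit(X, labels)
        return clf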

Before the SVM can classify input instances, it must first be trained to the user and the sensor position. This stage requires the collection of several examples for each input location of interest. When using Skinput to recognize live input, the same 186 acoustic features are computed on the fly for each segmented input and fed into the trained SVM for classification. We use an event model in our software: once an input is classified, an event associated with that location is instantiated, and any interactive features bound to that event are fired.
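
A minimal sketch of that event model follows, again with names of our own choosing (bind, on_tap): each classified location simply fires whichever callback the interface has registered for it.

    EVENT_BINDINGS = {}   # maps an input location label to an interface callback

    def bind(location, callback):
        """Register an interactive feature for a body location."""
        EVENT_BINDINGS[location] = callback

    def on_tap(clf, window):
        """Classify one segmented tap and fire the event bound to its location."""
        features = tap_features(window).reshape(1, -1)
        location = clf.predict(features)[0]
        handler = EVENT_BINDINGS.get(location)
        if handler is not None:
            handler()

    # Hypothetical bindings for a projected menu interface:
    # bind("wrist", scroll_down)
    # bind("palm", select_item)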

IV. SUPPLEMENTAL EXPERIMENTS

We conducted a series of smaller, targeted experiments to explore the feasibility of our approach for other applications. In the first additional experiment, which tested performance of the system while users walked and jogged, we recruited one male (age 23) and one female (age 26) for a single-purpose experiment. For the rest of the experiments, we recruited seven new participants (three female, mean age 26.9) from within our institution. In all cases, the sensor armband was placed just below the elbow. Similar to the previous experiment, each additional experiment consisted of a training phase, where participants provided between 10 and 20 examples for each input type, and a testing phase, in which participants were prompted to provide a particular input (10 times per input type). As before, input order was randomized; segmentation and classification were performed in real time.

Walking and Jogging: With sensors coupled to the body, noise created during other motions is particularly troublesome, and walking and jogging represent perhaps the most common types of whole-body motion. This experiment explored the accuracy of our system in these scenarios.

Each participant trained and tested the system while walking and jogging on a treadmill. Three input locations were used to evaluate accuracy: arm, wrist, and palm. Additionally, the rate of false positives (i.e., the system believed there was input when in fact there was not) and true positives (i.e., the system was able to correctly segment an intended input) was captured. The testing phase took roughly three minutes to complete (four trials in total: two participants, two conditions). The male walked at 2.3 mph and jogged at 4.3 mph; the female at 1.9 and 3.1 mph, respectively.

In both walking trials, the system never produced a false-positive input. Meanwhile, true-positive accuracy was 100%. Classification accuracy for the inputs (e.g., a wrist tap was recognized as a wrist tap) was 100% for the male and 86.7% for the female.

In the jogging trials, the system had four false-positive input events (two per participant) over six minutes of continuous jogging. True-positive accuracy, as with walking, was 100%. Considering that jogging is perhaps the hardest input filtering and segmentation test, we view this result as extremely positive. Classification accuracy, however, decreased to 83.3% and 60.0% for the male and female participants, respectively.

Although the noise generated from jogging almost certainly degraded the signal (and in turn lowered classification accuracy), we believe the chief cause of this decrease was the quality of the training data. Participants provided only 10 examples for each of the three tested input locations. Furthermore, the training examples were collected while participants were jogging. Thus, the resulting training data was not only highly variable but also sparse, neither of which is conducive to accurate machine learning classification. We believe that more rigorous collection of training data could yield even stronger results.

Single-Handed Gestures: In the experiments discussed thus far, we considered only bimanual gestures, where the sensor-free arm, and in particular its fingers, is used to provide input. However, there is a range of gestures that can be performed with just the fingers of one hand. This was the focus of Amento et al.,1 although that work did not evaluate classification accuracy. We conducted three independent tests to explore one-handed gestures. The first had participants tap their index, middle, ring and pinky fingers against their thumb (akin to a pinching gesture) 10 times each. Our system was able to identify the four input types with an overall accuracy of 89.6% (SD = 5.1%). We ran an identical experiment using flicks instead of taps (i.e., using the thumb as a catch, then rapidly flicking the fingers forward). This yielded an impressive 96.8% (SD = 3.1%) accuracy in the testing phase.

This motivated us to run a third, independent experiment that combined taps and flicks into a single gesture set. Participants retrained the system and completed an independent testing round. Even with eight input classes in very close spatial proximity, the system was able to achieve 87.3% (SD = 4.8%) accuracy. This result is comparable to the aforementioned 10-location forearm experiment (which achieved 81.5% accuracy), lending credence to the possibility of having 10 or more functions on the hand alone. Furthermore, proprioception of our fingers on a single hand is quite accurate, suggesting a mechanism for high-accuracy, eyes-free input.

Segmenting Finger Input: A pragmatic concern regarding the appropriation of fingertips for input was that other routine tasks would generate false positives. For example, typing on a keyboard strikes the fingertips in a manner very similar to the finger-tip input we proposed previously. Thus, we set out to explore whether finger-to-finger input sounded sufficiently distinct that other actions could be disregarded.

As an initial assessment, we asked participants to tap their index finger 20 times with a finger on their other hand, and 20 times on the surface of a table in front of them. This data was used to train our classifier. The training phase was followed by a testing phase, which yielded a participant-wide average accuracy of 94.3% (SD = 4.5%, chance = 50%).

V. EXAMPLE INTERFACES AND INTERACTIONS

We conceived and built several prototype interfaces that demonstrate our ability to appropriate the human body, in this case the arm, and use it as an interactive surface.

While the bio-acoustic input modality is not strictly tethered to a particular output modality, we believe the sensor form factors we explored could be readily coupled with visual output provided by an integrated pico-projector. There are two nice properties of wearing such a projection device on the arm that permit us to sidestep many calibration issues. First, the arm is a relatively rigid structure; the projector, when attached appropriately, will naturally track with the arm. Second, since we have fine-grained control of the arm, making minute adjustments to align the projected image with the arm is trivial (e.g., projected horizontal stripes for alignment with the wrist and elbow).

To illustrate the utility of coupling projection and finger input on the body (as researchers have proposed to do with projection and computer vision-based techniques16), we developed four proof-of-concept projected interfaces built on our system's live input classification. In the first interface, we project a series of buttons onto the forearm, on which a user can tap to navigate a hierarchical menu (Figure 1). In the second interface, we project a scrolling menu (Figure 2), which a user can navigate by tapping at the top or bottom to scroll up and down one item; tapping on the selected item activates it. In a third interface, we project a numeric keypad on a user's palm and allow them to, for example, dial a phone number (Figure 1). Finally, as a true test of real-time control, we ported Tetris to the hand, with controls bound to different fingertips.

VI. FUTURE WORK

In order to assess the real-world practicality of Skinput, we are currently building a successor to our prototype that will incorporate several additional sensors, particularly electrical sensors (allowing us to sense the muscle activity associated with finger movement, as per21) and inertial sensors (accelerometers and gyroscopes). In addition to expanding the gesture vocabulary beyond taps, we expect this sensor fusion to allow considerably more accuracy, and more robustness to false positives, than each sensor alone. This revision of our prototype will also allow us to benefit from anecdotal lessons learned since building our first prototype: in particular, early experiments with subsequent prototypes suggest that the hardware filtering we describe above (weighting our cantilevered sensors to create a mechanical band-pass filter) can be effectively replicated in software, allowing us to replace our relatively large piezoelectric sensors with micro-machined accelerometers. This considerably reduces the size and electrical complexity of our armband. Furthermore, anecdotal evidence has also suggested that vibration frequency ranges as high as several kilohertz may contribute to tap classification, further motivating the use of broadband accelerometers. Finally, our multi-sensor armband will be wireless, allowing us to explore a wide variety of usage scenarios, as well as our general assertion that always-available input will inspire radically new computing paradigms.

VII. CONCLUSION

In this paper, we presented our approach to appropriating the human body as an interactive surface. We have described a novel, wearable, bio-acoustic sensing approach that can detect and localize finger taps on the forearm and hand. Results from our experiments have shown that our system performs well for a series of gestures, even when the body is in motion. We conclude with descriptions of several prototype applications that graze the tip of the rich design space we believe Skinput enables.

REFERENCES

1. Chris Harrison, Desney Tan, Dan Morris. "Skinput: Appropriating the Body as an Input Surface." CHI 2010, April 10-15, 2010, Atlanta, Georgia, USA.

2. Thomas Hahn. University of Reykjavik, [email protected], April 26, 201.

3. Chris Harrison, Desney Tan, Dan Morris. Human-Computer Interaction Institute, Carnegie Mellon University, 5000 Forbes Avenue, Pittsburgh, PA, [email protected].

4. Sara Kilcher. MSc Computer Science Student, Distributed Systems Group, ETH Zürich, Raemistrasse 101, 8092 Zurich, [email protected].

5. Bertrand David, Yun Zhou, Tao Xu, René Chalon. Université de Lyon, CNRS, Ecole Centrale de Lyon, LIRIS, UMR 5205, 36 avenue Guy de Collongue, F-69134 Ecully Cedex, France. Contact: [email protected].

6. R. Bhavana ([email protected]) (7416180210).

7. Chris Harrison, Desney Tan, Dan Morris. Human-Computer Interaction Institute, Carnegie Mellon University, 5000 Forbes Avenue, Pittsburgh, PA, [email protected].