noisy vibrational pairing of iot devices

Noisy Vibrational Pairing of IoT DevicesS Abhishek Anand and Nitesh Saxena

Abstract—Internet of Things (IoT) is embodied by smart network-enabled devices that utilize computing power, networking,

and miniaturization to enable richer and improved user experience. Due to their interconnectedness, ubiquitous nature and low

computational power, trustworthy and secure communication between IoT devices has become a security concern. To authenticate

the devices, “pairing” may be secured by the use of an auxiliary channel such as audio, visual and vibrations for sharing the key

or keying material between the IoT devices. In this paper, we evaluate the security of vibration channel, susceptible to an acoustic

eavesdropper, that can capture audio leakage from the vibrations of the transmitting IoT device. We propose a noisy vibration

scheme for cloaking vibration sounds during pairing against such attacks. The scheme only requires a speaker for emitting the masking

sound during key transmission. We evaluate the scheme in proximity, co-located and remote settings with an eavesdropping attacker.

We also study motion sensor exploits against this scheme and compliment it with additional measures to mask vibration effects on

motion sensors. Our scheme is user transparent and requires only a speaker (that may already be present on the device), so it can be

readily implemented in the IoTsetting, smart wearables, and other commodity gadgets.

Index Terms—Pairing, side channel, IoT, signal masking, VoIP

Ç

1 INTRODUCTION

INTERNET of Things as defined by Global StandardsInitiative on Internet of Things is “a global infrastructure

for the information society, enabling advanced servicesby interconnecting (physical and virtual) things based onexisting and evolving interoperable information and com-munication technologies”[3]. Examples of household IoTdevices include the Nest smoke detector and thermostats,smart thermometers from Kinsa, smart bluetooth or wifibulbs for indoor lightening, smart security sensors includ-ing motion sensors etc.1 A forecast by Gartner, Inc. statesthat 8.4 billion connected things will be in use worldwide in2017 and this number will rise up towards 20.4 billion by2020 [4]. In addition, consumer based IoT devices constitute63 percent of total IoT applications in 2017. Such dynamicgrowth points towards the significance of these class ofdevices in the future end-user computing infrastructure andservices.

In an IoT network, the devices usually communicate witha central hub, usually a smartphone, or among each otherthrough WiFi, Bluetooth, RFID or long range cloud-basedinteractions. These communication channels are inherentlyinsecure i.e., they can be easily manipulated or eaves-dropped upon and therefore present a fundamental chal-lenge of securing such transmission pathways. “Pairing” iscommonly referred to the operation of bootstrapping secure

communication between two such devices such that thecommunication between them is resistant to eavesdrop-ping and manipulation (man-in-the-middle) attacks. Pair-ing in the context of IoT devices is a hard challenge due tolack of a global infrastructure that can enable devices toshare an on- or off-line trusted third party, a certificationauthority, a public key infrastructure (PKI) or any pre-configured secrets.

An existing approach to pairing is to leverage an auxiliarychannel, also referred to as an out-of-bound (OOB) channel,that can be controlled by the users operating the devices. OOBchannels, unlike wifi or bluetooth, are human-perceptible thatindicates perception by one or more of the human senses.Some examples of such OOB channels are audio, vibrationand visual medium. In these channels, the user is able to con-firm the origin of the transmitted message and can detect anymanipulation attempts by an adversary. It may not be ableto prevent the adversary from eavesdropping on the OOBchannel. Pairing using OOB channels has been referred to asauthenticated-OOB (A-OOB) [5]. It has been the basis of anumber of protocols proposed in literature as surveyed in [6].

Pairing becomes a challenging problem when one of thedevices involved in the process falls under the category ofconstrained devices. A device is considered to be a constraineddevice when it lacks a good quality output interface (such asa full screen display) or an input interface (e.g., a keyboard)or receivers (e.g., cameras, microphone). Many IoT devices(like smart home appliances) fall into this category. A-OOBpairing for such devices is difficult due to the fact that estab-lishing bidirectional and automated A-OOB channels onsuch devices is challenging, in general. In addition, manualmechanisms for pairing constrained IoT devices can beprone to human errors [6] that could potentially lead toman-in-the-middle attack.

An alternative approach for pairing geared especiallytowards constrained devices, involves using a secret as well

1. A list of IoT devices currently on market is available at http://iotlist.co

� The authors are with the Department of Computer and Information Sciences,University of Alabama at Birmingham, Birmingham, Alabama 35294.E-mail: {anandab, saxena}@uab.edu.

Manuscript received 31 July 2017; revised 11 Mar. 2018; accepted 27 Mar.2018. Date of publication 1 Oct. 2018; date of current version 10 May 2019.(Corresponding author: S Abhishek Anand.)For information on obtaining reprints of this article, please send e-mail to:[email protected], and reference the Digital Object Identifier below.Digital Object Identifier no. 10.1109/TDSC.2018.2873372

530 IEEE TRANSACTIONS ON DEPENDABLE AND SECURE COMPUTING, VOL. 16, NO. 3, MAY/JUNE 2019

1545-5971� 2018 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission.See ht _tp://www.ieee.org/publications_standards/publications/rights/index.html for more information.

Authorized licensed use limited to: UNIV OF ALABAMA-BIRMINGHAM. Downloaded on October 22,2020 at 18:41:44 UTC from IEEE Xplore. Restrictions apply.

https://orcid.org/0000-0002-6168-7649

https://orcid.org/0000-0002-6168-7649

https://orcid.org/0000-0002-6168-7649

https://orcid.org/0000-0002-6168-7649

https://orcid.org/0000-0002-6168-7649

mailto:

as authenticated OOB channel (AS-OOB in [5]). Since thechannel is supposed to be authenticated as well as secret,it is assumed that an adversary would be unable to detector manipulate the transmission over this channel betweenthe two devices. In an AS-OOB channel, pairing can beachieved by simply transmitting the key or the keyingmaterial over the channel and it is also devoid of any humanerrors. In case of a low bandwidth channel, a short PIN orpassword can be transmitted and a password-based authen-ticated key agreement [7] protocol can be used for pairing.

Some examples of pairing protocols that use AS-OOBchannel are [8] and [9]. The IMD pairing scheme in [8] usesa low frequency audio channel to pair an RFID tag (attachedto an implanted medical device) with an authorized RFIDtag reader. The PIN-Vibra method [9] uses vibration channelto transmit the key/keyingmaterial from the transmitter (e.g.,the IoT hub smartphone device) to the receiving device (e.g.,an IoT appliance). In this scheme, the phone generates a shortPIN, encodes it as anON-OFF vibration scheme and transmitsit as vibrations (generated by a miniature vibration motor)to the receiving device. The receiving device can read thevibrations using an MEMS accelerometer followed by adecoding process based on the ON-OFF scheme.

While both the pairing schemes detailed above, offer analternative approach to pairing, they have been shown to besusceptible to eavesdropping attack. In the case of using alow frequency audio channel for transmitting PIN or shortpassword, it was discovered that the audio channel can beeavesdropped upon by an adversary that can learn thetransmitted communication [5]. In this work, we show thatsimilar eavesdropping attack can also be detrimental to avibration based AS-OOB channel pairing protocol that canlead to leakage of the transmitted PIN or short password.Furthermore, we also demonstrate that it is still possible toeavesdrop on the vibrations during the pairing processeven when the vibration sounds are cloaked within a bandlimited white noise based masking signal. We provide a via-ble countermeasure leading to a secure “noisy” vibrationalpairing scheme, well-suited for securely connecting the IoThub smartphone device with other IoT devices.

1.1 Our Contributions

In our work, we perform an investigation of pairing protocolsfor IoT scenarios based on vibration channel.

� We recreate the Pin-Vibra [9] protocol to implementpairing between an IoT hub (a smartphone) and anIoT device. We show that this protocol is vulnerable against an eavesdropping adversary that canexploit the acoustic leakage resulting from vibrations(“standard attackmodel–proximity attacker”). To coun-teract this attack, we introduce noisy vibration basedpairing where the acoustic leakage from vibrations arehidden by amasking signal (noise signal).

� We further show thatwhile noisy vibration basedpair-ingmay be able towithstand “standard attackmodel”,an “advanced attack model–colocated adversary” canrelatively easily defeat noisy vibration pairing. Toprevent such an attack, we suggest the addition of lowfrequency tones to the masking signal that would hidethe acoustic leakage at low frequencies preventing the

co-located adversary from learning any informationabout the vibrations.

� We enhance the proposed defense against a co-locatedmotion sensor exploiting adversary by injecting fakereadings into the actual sensor readings. We extendthe notion used in [10], [11] in our proposed defenseby using it to mask the vibration effect on accelerome-ter on the transmitting device thereby mitigating anadversary that may exploit the accelerometer on thetransmitting device.

� We propose a novel eavesdropping scenario where astandard voice call or voice over internet protocol(VoIP) applications are exploited to spy upon theacoustic leakage from vibrations remotely. We showthat it is possible for such an attack to decode thetransferred keying information (during pairing) byexploiting the vibrations sounds, recorded overthe call, in a similar manner to previous acousticeavesdropping exploits against vibration pairing.

1.2 Outline of the Paper

In Section 2, we detail the existing work on pairingconstrained devices using an A-OOB channel. Section 3demonstrates the protocol used in vibration based pairingmechanism in the context of constrained devices. Section 4showcases the vulnerability of vibration based pairingprotocol against a proximity attacker and Section 5 proposesnoisy vibration pairing as a possible solution. Section 6analyzes white noise defense against a proximity attack andSection 7 analyzes it against a co-located adversary. Theremote eavesdropping attack is introduced in Section 8 andcontains evalaution of the attack against noisy defensemechanism. Section 9 summarizes and discusses the resultsand Section 10 has the take-home message.

2 RELATED WORK: PAIRING FOR CONSTRAINED

DEVICES USING OOB CHANNEL

The usage of OOB channel for pairing and bootstrappingsecurity has been proposed in existing literature. Balfanz et al.[12] proposed the use of a location limited channel (based onphysical contact between the devices) in a pre-authenticationstep where the pairing devices can exchange pairing informa-tion. This data can later be used for subsequent authenticationof the devices on wireless or Bluetooth channels. The locationlimited channel could be audio, infrared, visual or contactchannel. Goodrich et al. [13] developed an audio based OOBchannel for secure device association called “Loud andClear”. It used a text-to-speech (TTS) engine for convertingan English sentence (derived from device’s public key) tospeech and display/render the same sentence on the receiv-ing device.

Another audio based pairing approach was taken byHalperin et al. [8] for wireless implantable medical devices(IMD), such as pacemakers and implantable cardiac defibril-lator (ICD). They showed that prior communication protocols,used by IMDs to communicate wirelessly with an externalprogramming entity, are susceptible to various radio-basedattacks. They proposed zero-power defenses involvingan RFID tag attached to the IMD. A secure communicationchannel is established between the IMD and an external

ANANDAND SAXENA: NOISY VIBRATIONAL PAIRING OF IOT DEVICES 531


reader by having a small piezo element attached to the RFIDtag on IMD. The piezo element transmits a random key over alow frequency audio channel that can be recorded anddecoded by an external reader.

Seeing-is-Believing is a pairing protocol proposed byMcCune et al. [14] that is based on a visual channel. In thisprotocol, a camera is used to take a snapshot of a barcodethat encodes the cryptographic material. The barcodes canbe pre-configured and attached as labels on devices or theycan be produced on demand and displayed on the device’sdisplay unit. Saxena et al. [15] extended this work to thecontext of constrained devices especially devices with lim-ited form of display, such as a single light emitting diode(LED). Since most of the constrained devices may alreadyhave a single light sensor and the visual channel for thelight sensor can be verified by a human user, this scheme isboth cost effective and secured by user perception.

Kim et al. [16] proposed a vibration based side channelfor securing communication between an external device(a reader or a smartphone) and a medical device. The pro-posed technique is based on an OFF-ON keying demodula-tion scheme for exchanging a shared cryptographic key forimplantable and wearable medical device (IWMD). They alsoperformed a security analysis of the proposed scheme againstan acoustic eavesdropping adversary at a distance of30meters. They concluded that such an adversarywas unableto demodulate the recorded waveform into the shared key inthe presence of a strongmasking signal.

Several other pairing protocols based on bidirectionaldevice-to-device (dtd) A-OOB (e.g., [6]) require both devicesto have transmitters and corresponding receivers (e.g., IRtransceivers), which may not exist on constrained devices. Insettings, where dtd channel(s) do not exist (i.e., when at leastone device does not have a receiver), pairing methods can bebased upon device-to-human (dth) and human-to-device(htd) channel(s) instead (e.g., based on transfer of numbers[17]). However, establishing such channels on constraineddevicesmay also not be feasible.

3 VIBRATION BASED PAIRING FOR IOT DEVICES

Pairing between devices often involves a shared secret thatallows the communicating devices to authenticate themselvesand establish a trusted and secure communication channel.For example, pairing between an IoT device and an IoT hubcould be achieved by the IoT hub generating a short key orkeying material and sending it to the receiving IoT deviceover an OOB channel. The receiving IoT device can decodethe key/keying material depending upon the nature of theOOB channel and utilize the receiveddata for further securingthe main communication channel. The security of producinga shared secret as such, rests upon the premise that the associ-ated OOB channel is inherently secure against eavesdroppingandmanipulating attacks.

Vibration based Pairing Protocol. In a vibration based pairingscheme, the transmitting IoT device encodes the keyingmaterial (a short passphrase or PIN) into vibrations through avibrationmotor.

The transmitting device is placed in contact with thereceiving device prior to the pairing process. The receivingdevice (also a smart device) contains MEMS accelerometerthat is capable of recording the vibrations conducted through

the transmitting device. The accelerometer readings afterrecording the vibrations can be decoded to obtain the trans-mitted data.

A common encoding technique for encoding a bit stringinto vibrations that can be transmitted to another device is atime based ON-OFF encoding mechanism. Under thisscheme, each bit is encoded as a vibration for a fixed timeinterval (t) if it is “1” and every “0” bit is encoded as sametime period (t) of stillness or no vibration. In Vibrate-to-Unlock scheme [9], the time interval t is fixed as 200 ms. Inorder to detect a valid transmission, a header is attached tothe transmitted data which is fixed as “110”. As per thescheme, a 4-bit PIN is generated by the transmitting deviceand is converted to its 14 bit binary equivalent string. Prior totransmission, the preamble (as described before) is attachedto the string increasing its length to 17 bits. Since time dura-tion for every bit is 200 ms, the entire transmission takes17� 200 ms ¼ 3:4 seconds.

4 PROXIMITY ATTACK ON VIBRATION PAIRING

The pairing scheme described in Section 3 has been shownsusceptible to an acoustic eavesdropping attack in [18]. Inthis attack, the attacker exploits acoustic emanations thatare generated during the process of pairing as a result of thevibrations of the transmitting device. In this attack model,we assume that the pairing protocol being used is known tothe attacker. This knowledge includes the bit length of thetransmitted string, the scheme used for encoding vibrationsand the preamble attached to the bit string that denotes thebeginning of the transmission.

The proximity attack model assumes an attacker eaves-dropping on vibration based pairing from a nearby location.The distance between the attacker and the pairing devicesdepends on the loudness of vibrations generated by thetransmitting device and on the ability of the eavesdroppingattacker to record those acoustic emanations. The attackerdoes not have access to the microphone (if present) on anyof the devices involved in the pairing effort. The attacker orthe listening device, being at a distance, can be assumed tounnoticeable by the victim/s.

In order to highlight the threat of the attack, the attackeruses off the shelf recording devices such as PC microphonesor microphone/s in his own smartphone. These devices arelow cost and ubiquitous in nature and hence make theattack simple to launch. A depiction of proximity attackmodel can be found in Fig. 1a. We also assume that the envi-ronment is devoid of any intentional background noise thatmay interfere with the recording capability of the attacker.

4.1 Attack Experiment under Proximity AttackModel

Equipment. We used Motorola Droid X2 phones as pairingdevices where one of the phones acted as a transmitter whilethe other phone acted as a receiver. Both phones wereequipped with vibration motor and accelerometer sensor.To record the acoustic leakage from vibrations, we usedDynex USBMIC13 PC microphone [19] with a frequencyresponse of 150 Hz-10 kHz and Audacity application. Toprocess the captured audio,we usedMatlab signal processingtoolbox.



Experiment. On examining the spectrum in Fig. 1b, theacoustic leakage from the vibrations shows a dominantresponse in the frequency band 3.5 kHz to 8.3 kHz. Theintensity of the signal in the spectrum (an indication ofenergy in the signal) seems to concentrated from 6.8 kHz to7.8 kHz. To decode the transmitted data from the recordedsignal, we convert the recorded signal to frequency domainand detect the beginning of the transmission (using thepreamble “110”) using a suitable threshold for the sum ofFFT coefficients of the signal. A window size of 441 sampleswith 50 percent overlap and the frequency band 6.8 kHz –7.8 kHz is chosen for best results. The plot of sum of FFTcoefficients against time can be observed in Fig. 1b.

We tested our eavesdropping attack on ten random PINsusing the above setup with the recordings done at a distanceof 15cm. The attack was successful with 100 percentaccuracy, demonstrating that communication using vibra-tions is susceptible to an acoustic eavesdropping attackerwith a high degree of accuracy. This result is in-line withthe results of the acoustic eavesdropping attack schemeproposed by Halevi et al. [5].

5 OUR DEFENSE: NOISY VIBRATION PAIRING

Aproximity eavesdropping attacker exploits acoustic leakageemanating from the vibrations during pairing for decodingthe transmitted PIN or key. To prevent such an attack, acous-tic leakage should be minimized or hidden in such mannerthat it becomes very hard for the proximity attacker to extractany meaningful information about the transmitted PIN orthe key. To achieve this objective, either signal cancellationor signalmasking can be deployed.

5.1 Audio Leakage Cancellation

Roy et al. [20] proposed canceling of the sounds of vibration(termed SoV) by generating an “anti-noise” signal on thetransmitting device (source of vibration). To estimate theeffect of surface on which the device has been placed, ashort preamble is transmitted and the resulting SoV isrecorded. The FFT of SoV is examined for the strongestovertones that are then combined to create the “anti-noise”signal. For phase alignment, the transmitter increases thesampling frequency of “anti-noise” signal keeping track ofthe phase difference of the “anti-noise” and SoV, switchingit back to its original value when the phase difference isminimum.

This approach may not be suitable for IoT devices thatare computationally restricted in their ability to performexhaustive signal processing tasks such as FFT calculationand real time phase synchronization. It may also take moretime to generate “anti-noise” signal than the entirety of thepairing process (our setup in Section 3 takes 3.4s). Anotherpossible flaw in signal cancellation as a defense lies in the factthat it is not possible to cancel out SoV completely andpromptly. It takes some time before we can determine thephase difference and then perform the matching, duringwhich SoV would be constantly leaking confidential informa-tion. It may also be possible that a more sophisticated attacker(e.g., using machine learning) may be able to recover someremnant information about SoV from the resultant signal.

If we do not consider the duration of the communicationas a limiting factor by artificially padding it via an additionof a preamble to the actual PIN, cancellation of audio signalmay yet prove to be capable of mitigating the acoustic sidechannel attack. However, in this work, we restrict ourselvesto the examination of easy to generate and computationallylight signal masking technique.

5.2 Audio Leakage Masking

Signal masking mechanism is commonly featured in radiocommunications where the presence of noise in the environ-ment corrupts the signal. If the signal to noise ratio (SNR) islow, it becomes hard to differentiate the signal from thebackground noise. This phenomenon can be utilized againstan acoustic eavesdropping attacker by intentionally intro-ducing noise (referred as masking signal) during the vibra-tional pairing so that it cloaks the audio leakage from thevibration, making it indistinguishable from the masking sig-nal. As the effectiveness of the defense mechanism dependson the difficulty of the adversary’s task in filtering outmasking signal from eavesdropped signal, we test out var-iants of masking signal that can be deployed to mask theacoustic leakage from vibrations effectively.

To defend the pairing process against a proximityattacker (Section 4), we propose Vibreaker, a defense mecha-nism that generates a masking signal obscuring the audioleakage, making it hard for the adversary to extract anyinformation about the data transmitted during the pairingof devices. In order to measure the effectiveness of ourmechanism, we experiment with different types of soundsthat could potentially be the masking signal and evaluatetheir security against an attacker as defined in Section 4.

Fig. 1. Proximity attack model and acoustic characteristics of the vibration leakage for PIN “4562” under the model. Color intensity in the spectrum isproportional to energy in the frequency band with blue color indicating lowest energy per frequency and red being the highest energy contained perfrequency. Sum of the FFTcoefficients indicates the estimated energy at the time instant.



6 VIBREAKER AGAINST PROXIMITY ATTACK

We will now evaluate the efficiency of various maskingsignals against an eavesdropping attacker as described inSection 4. We will also investigate the possibility of filteringout the masking signals by the attacker and extracting anyrelevant information from the resultant signal.

6.1 White Noise as Masking Signal

White noise is defined as a random signal having a constantpower spectral density. White noise is constantly presentin our environment for example, the humming sound ema-nating from air conditioning units. It has also been used forsound masking in offices by suppressing other distractingsounds. Here, we use white noise as the simplest candidatemasking signal that can be generated easily. Filteringthe white noise signal is not a hard task, but the process offiltering also diminishes the quality of the recovered signal.Since white noise is evenly distributed over all of thefrequency spectrum, trying to filter it out also removessome of the signal in the frequency bands where the whitenoise overlaps with the spectrum of the original signal(audio leakage from the vibrations).

Experimental Setup. We use the wgn function of Matlab togenerate a 10 second sample ofwhite Gaussian noise at a sam-pling frequency of 44.1 kHz. White gaussian noise is a goodapproximation of real world white noise and hence sufficientfor our intentions. A frequency filter to the generated noisesample can be applied to limit the white noise spectrum to thesame frequency band as the audio leakage fromvibrations.

Once we have generated the white noise sample, we playit in the background during the pairing of the IoT deviceswith the transmitting device vibrating to deliver the PIN orkey to the receiving device. To make sure that the whitenoise suppresses all the audio leakage, we introduce a delayin the pairing process such that it begins only after the whitenoise has started playing in the background.

Observations. To study the effectiveness of white noise asa masking signal against a proximity attacker, we use thepairing protocol described in Section 3. Our observations forthe recording done at a distance of 15cm (Fig. 2a) showthat white noise is able to mask the audio leakage from the

vibrations. In addition, plotting sum of FFT coefficients overtime (an indication of energy in the signal) does not revealany vibration sounds in the intended frequency domain. Thebit error rate of the attacker (Fig. 2b) never reaches 0%, thebest effort being at approximately 35%. Apart from coveringthe spectrum in which the audio leakage from the vibrationslie, the sound level of the white noise can be kept more thanthat of vibration sounds thereby easily suppressing the leak-age. Since vibration sounds are not loud, it is easy to generatelouder white noise for a short duration that only lasts till pair-ing is successfully completed.

Filtering the Masking Signal. While our results indicatethat white noise is capable of hiding acoustic leakage, thereexist filtering methods that use spectrum subtraction toremove a static signal like white noise from the recordedaudio. For filtering the white noise, we use the noise reduc-tion effect from Audacity tool that selects a small sample ofnoise as the profile of the static signal to be removed andsubtracts it from the spectrum of the recorded audio. Thistechnique is known as spectral noise gating and it works bysuppressing all pure tones that fall below a pre-determinedthreshold (determined from the created profile of staticnoise) in each frequency band. We used a noise reductionlevel of 15dB and a sensitivity value of 6 to get the bestresults. Fig. 3 shows the bit error rate for the attacker afterfiltering the white noise. From the plot, it is clear that the

Fig. 2. Frequency features of the eavesdropped signal and accuracy of a proximity attacker against Vibreaker(white noise) for PIN “4562”. Colorintensity in the spectrum is proportional to energy in the frequency band with blue color indicating lowest energy per frequency and red being thehighest energy contained per frequency. Sum of the FFTcoefficients indicates the estimated energy at the time instant.

Fig. 3. Bit error rate against threshold values for PIN “4562” after filteringthe white noise. Since we begin from maximum threshold value, the xaxis is in descending order.



attacker never achieves a bit error rate of 0% (fully decodedPIN). The best bit error rate remains similar to the oneachieved prior to filtering. This effect is due to completecloaking of the audio leakage by the white noise and thesubsequent filtering that destroys any information aboutthe acoustic leakage itself (due to frequency overlap).

6.2 Pre-Recorded Vibration Sounds asMasking Signal

Our next choice of masking signal is a close representationof the audio leakage itself i.e., the sound generated duringthe vibration of the PIN transmitting IoT device. We pre-record the sound emanated during the vibration and try toconfuse the attacker by masking the audio leakage from thevibrations with pre-recorded vibration sounds (henceforthreferred as fake vibrations).

Experimental Setup. We generated a random sequenceof numbers and encoded them as vibrations using thesame protocol as PIN-Vibra [9]. However, in order tomake sure that the fake vibrations completely overlapwith the actual vibrations, we reduced the duration ofsilence between the vibrations from 200ms to 100ms. Theresulting vibration sequence is recorded offline andstored for use as the masking signal. When the user ini-tiates the protocol for sending the PIN via vibrations, thedevice in addition to vibrating also plays the stored mask-ing signal in the background. We adjusted the timings ofthe masking signal such that it always begins playing atapproximately the same time as the vibrations. The prox-imity attacker is again presumed to be eavesdropping ata distance of 15cm.

Observations. Our results (Fig. 4) show that fake vibra-tions are able to mask the audio leakage resulting from thedevice’s vibration. It is very hard to distinguish betweenfake vibration signals and the audio leakage based only onfrequencies as demonstrated in Fig. 4a. The sum of FFT coef-ficients also shows identical response from the fake vibra-tions and the acoustic leakage from vibrations indicatingthat audio leakage has completely been masked. The biterror rate for the proximity attacker for each threshold valueis shown in Fig. 4b. The best bit error rate achieved by theattacker is 30% which is similar to the accuracy achieved inthe proximity attack model in noisy vibration pairing. Thusfake vibrations provide similar defense capabilities as whitenoise in our experiments.

Filtering the Masking Signal. We applied the same filteringprocess that was used for filtering out white noise. Sincesounds of fake vibration differ from actual vibration sounddue to imperfect reproduction by the speakers, we (as anattacker) listened to the eavesdropped audio signal andselected the part that we believed to be the fake vibrations.The selected part of the audio was used as the noise profileand applied to the full length of the eavesdropped audiosignal for filtering the fake vibrations.

The results (Fig. 5) after the filtering process show thatthe bit error rate is around 20 percent indicating that theattacker fails to completely decode the recorded sound eventhough it is a slightly better error rate when comparedagainst white noise. Thus fake vibrations could also serve asa masking signal for obfuscating the vibration sounds.

7 VIBREAKER AGAINST CO-LOCATED ATTACK

In the attack model described in Section 4 (proximity attack),the eavesdropping attacker is at a distance from the pairingdevices. This threat model can be further strengthened bydecrementing the distance between the pairing devices andthe eavesdropping attacker to almost zero. This extensionof the attack model places the adversary in the same physi-cal location as either of the devices involved in pairing,henceforth called co-located attack.

The advantage of a co-located eavesdropper from anattack point of view is a more accurate recording of theacoustic leakage from vibrations as it places the recordingdevice close to the source of acoustic emanation from the

Fig. 4. Frequency features of the eavesdropped signal and the effectiveness of a proximity attacker in presence of fake vibrations for PIN “4562”.Color intensity in the spectrum is proportional to energy in the frequency band with blue color indicating lowest energy per frequency and red beingthe highest energy contained per frequency. Sum of the FFTcoefficients indicates the estimated energy at the time instant.

Fig. 5. Bit error rate against threshold values for PIN “4562” after filteringfake vibrations. Since we begin from maximum threshold value, the xaxis is in descending order.



vibration motor of the device transmitting the pairing data.The assumption that the eavesdropper is residing on one ofthe devices (co-resident) can be realized in real life if thevibration transmitting device is equipped with an on-boardmicrophone (for example, a smartphone) that can be manip-ulated into recording vibrations through a malicious appli-cation installed by an attacker. Another way for an attackerto implement co-located attack scenario would be by attach-ing a tiny listening bug [21] to either of the pairing devices.This effort would, however require one-time access to thecompromised device constituting a lunch-time attack.

A co-located attacker can exploit the microphone on thedevice to record the acoustic leakage resulting from vibra-tions. One point to note in the context of smartphones isthat access to microphone requires explicit permission fromthe user. However, studies done on android smartphonessuggest that the comprehension and attention level of userswhile granting these app permissions are very dismal [22].This may help an eavesdropping attacker slip past suchrestrictions. In addition, audio channel opens up other dif-ferent attack possibilities that include eavesdropping via avoice based call constituting a case of remote attacker–coresident eavesdropping scenario.

Towing the same line as proximity attackmodel, we assumethat the attacker is capable of recording the acoustic leakagefrom vibrations and process it offline using signal processingtools. The environment is considered to be noise-free exceptfor the participating devices in the scenario.

In addition to a co-located attacker that can exploit themicrophone of the transmitting device (IoT hub), anothertype of co-located attack exists that can exploit on-boardmotion sensors of the IoT hub (smartphone) and learn thetransmitted PIN/passphrase from the effect of IOT hub’svibrations on its own motion sensors. We study this type ofattack and propose a defense that seeks to transparentlyhide the vibration’s effect present in the motion sensor’sreadings from the co-located attacker.

7.1 Attack Experiment under Co-LocatedAttack Model

The attack principles under this advanced eavesdroppingattack model are similar to proximity attackmodel. We imple-mented the same ON-OFF encoding scheme for transforming

4-bit PIN into vibration sequence. We used the same setof Motorola Droid X2 smartphones as the communicatingdevices, with one acting as the transmitter and other as thereceiver. To record the audio generated from the vibrations,we used Dynex PC microphone and Matlab’s signal process-ing toolbox for processing the recorded audio. The micro-phone is placed at a distance (� 1 cm) from the vibratingdevice in order to record the vibration sounds at the closestpossible distance for emulating a co-located adversary. Theon-boardmicrophone can also be used for this purpose as perdetailed in the threatmodel.

To complete the noisy vibrational pairing setup, we alsoimplemented the defense measures as proposed in [23]. Thedefense measure in [23] utilized band-limited Gaussianwhite noise that lies in the same frequency range as theaudio leakage from the vibration, to hide the acoustic leak-age from the vibrations. The masking sound was generatedas the transmitting device starts the communication withthe receiving device for pairing. On the side of the attackerthat has the capability to process the eavesdropped audiosignal offline, we also used the “noise reduction” feature ofthe audio processing tool Audacity to filter out the noiseand reveal part or whole of the audio leakage. This featureallows the selection of a small portion of the audio signalconsisting of the noise only to build a noise profile that isthen filtered from the whole audio signal.

7.1.1 Effectiveness of the Attack

As per the threat model detailed in Section 7.1, we recordedvibration sounds superimposed by the masking sound ata distance of 0 cm. We also recorded vibration sounds at adistance of 10 cm for comparing it with the co-located adver-sary scenario. Figs. 6a and 6b represent the frequency spec-trum of the eavesdropped signal at distances 0 cm (as perour threat model) and 10 cm (similar to [23]). The frequencyspectrum revealed that masking sound may be able to hidethe audio leakage due to the vibrations from an adversaryeavesdropping at a distance. However, for a co-located adver-sary, themasking soundwas unable to hide the audio leakageresulting from the vibrations at the lower frequency rangeof 50 Hz-250 Hz. Since white noise has same intensity for allfrequency, while the vibration emanations have higher inten-sity at low frequencies (as evident in the frequency spectrum

Fig. 6. Difference in audio characteristics of recorded vibration sounds at close (0 cm) and far (10 cm) distances. Color intensity in the top graph isproportional to energy in the frequency band with blue color indicating lowest energy per frequency and red being the highest energy contained perfrequency. Sum of the FFTcoefficients indicates the estimated energy at the time instant.



of Fig. 6a, we believe white noise alone would be unableto hide the emanations due to uneven spectral power densityof the emanations.

In our proximity attack model (Section 4), we showed thatwhite noise was enough to hide acoustic leakage occurringfrom vibrations of the device for an attacker at a distance of10-15 cm. Yet, as we demonstrated in our experiments, whitenoise (same as pseudorandom noise) alone was inadequateto mask the vibration sounds for a co-located attacker. Hence,we needed to explore further options to bolster the whitenoise masking signal in order to mitigate an eavesdroppingattack from a co-located attacker.

7.2 Novel Defense based on Low-Frequency Noise

Since white noise was unable to mask the vibration sounds atlow frequencies, we tried to strengthen the white noise at thelower frequency band of 50 Hz-250 Hz against a co-locatedadversary. We also faced some challenges in the implementa-tion of the proposed masking signal with the attack setupdescribed in Section 7 that we will describe here. Lastly,we evaluated the efficiency of the proposed masking signalagainst sophisticated attacks and its effect on vibrationalsensors of the receiving device.

7.2.1 Masking with Vibrations and Low-Frequency

Tones

As observed in the previous section, white noise alone provedineffective at masking low-frequency vibration sounds. Inorder to overcome this shortcoming, we considered other sig-nals that could prove effective at masking audio leakage atlow frequencies. We tried to add sounds that are acousticallysimilar to vibration sounds to confuse the adversary betweenthe real vibration sounds and the pre-recorded vibrationsounds thatwere played alongwith thewhite noise.

We recorded vibrations of the Droid X2 phone from oursetup with the inbuilt microphone, with the phone placedon a glass surface. Ripple [20] indicated a glass plate as pro-ducing the strongest side channel leakage when the vibratingdevice is placed on it. This motivated us to record the vibra-tions on a glass surface (henceforth referred as fake vibrationsounds) as stronger the vibration sounds, stronger would betheir recording, producing similar effect as actual vibrationsoundswhen played back during the pairing process.

After recording fake vibration sounds, we played it onthe device during pairing process to gauge similarity of thefake vibration sounds with audio leakage of real vibrationon the frequency spectrum. Fig. 7 shows the resultingfrequency spectrum. As it is clear from the spectrum thatwhile fake vibration sounds matched the audio leakagefrom the real vibration to most extent, they lacked therequired low-frequency components contained in the audioleakage. Hence, they offered no better protection over thewhite noise masking signal and neither did the combinationof both the white noise and fake vibration sounds (due totheir inefficiency at lower frequency range).

The lack of low-frequency components in the fake vibra-tion signal implored us to explore software based solutionsfor improving the quality of the audio playback to recover thedesired low-frequency response. Since, our device ran onAndroid platform, we utilized audio effects and controlsoffered by the platform via AudioTrack API (Application

Programming Interface) but no improvements were observedand boosting the signal only resulted in clipping of the audiosignal, a phenomenon explained below:

Non Linearity. This phenomenon is widely encounteredin electrical circuits e.g., an amplifier, where the generatedoutput signal strength is not directly proportional to theinput signal strength.

Clipping. This phenomenon occurs due to distortion ofthe waveform when an amplifier is over-driven by trying toproduce an output signal, the strength of which is beyondthe specified limits of the amplifier. This causes the signal tobe clipped at the limits resulting in a distorted wave. A sideeffect of clipping is the introduction of harmonics of thesignal at higher frequencies.

The next choice in our experiments was to generate tonesin the desired frequency range and add them to the whitenoise to obfuscate the audio leakage from the vibrationsounds. For this purpose, we used the Tone Generator func-tion in Audacity along with the Noise Generator, and used“mix and render” functionality to produce the combinedsignal that is a mixture of white noise and a sinusoidal toneof 150 Hz. The resulting observations are shown in Fig. 8a.As Fig. 8a shows, there was no masking at the lower fre-quency band despite the introduction of a low-frequency(150 Hz) tone. In particular, there was no presence of thetone at the intended frequency level. This behavior was sim-ilar to that of fake vibration sounds which also lacked thelow-frequency components present in the audio leakage.We further investigated the issue by trying to reproducevarious low-frequency sounds on two devices: MotorolaDroid X2 and LG G4 smartphones. Droid X2 is an old smart-phone, first released in 2011 whereas G4 is one of the latestdevices announced in 2015.

During our attempts to reproduce low-frequency toneswhile testing the speakers of both old (Droid X2) and new(LG G4) devices, we re-encountered the non-linear behaviorof the speaker response. The output audio signal for low-frequency tones was very low, barely registering on themicrophone. Any attempts to increase the gain would inad-vertently result in clipping of the signal producing unwantedharmonics at higher frequency levels with no improvement atthe intended low frequency. We expected better performancefrom LG G4 smartphone featuring an improved speaker butthe results were only slightly better (Fig. 8b). The speaker wasbarely an improvement over the Droid X2 speaker suffering

Fig. 7. Frequency spectrogram of real vibration sounds mixed with fakevibrations sounds. Color intensity in the spectrum is proportional to energyin the frequency band with blue color indicating lowest energy perfrequency and red being the highest energy contained per frequency.



from the same drawbacks of non linearity and clipping. Sincethe inbuilt speakers of the smartphones did not fulfill ourpurpose, we turned to other setups where we could obtainbetter speaker quality for improved sound reproduction.

7.3 Enhancing Vibreaker with Low Frequency Tone

While smartphone speakers may be limited in their capacityto reproduce low frequency sounds (sub 300 Hz), we cantry to boost their capabilities by complementing them withbetter hardware. Such an approach has been used in theindustry by introducing a case like system with built-inspeakers and/or a separate audio engine to boost thequality of smartphone’s speaker [24], [25], [26]. While [24]and [25] are geared towards iPhones, [26] is offered as anaccessory for Moto Z family of phones. These accessoriescan be put on as a case on the phone (Fig. 10).

We simulated this concept by taping a small portablespeaker to our device and playing the sound through it.This setup also emulated the scenario where the receivingdevice could have an inbuilt powerful speaker like a paymentterminal or high end media devices e.g., a smart television.For our experiments, we used three different portable speak-ers Altec [27], Sony SRS-XB2 [28] and JBL [29]. The frequencyspecifications for the tested speakers are presented in Table 1.In order to test the effectiveness of speakers in producinglow frequency sounds, we played a 150 Hz sinusoidal tonethrough each speaker and observed the recorded signal in thefrequency domain.

Since a tone below 150 Hz distorted the response fromAltec speaker, we used 150 Hz tone in our next stage ofexperiment. We connected the speaker to the smartphonevia an audio cable (or bluetooth) while rest of the experi-mental setup was similar to our previous attack experiment(Section 7) under similar threat model (Section 4). The mask-ing sound used to obfuscate audio leakage was a mix ofwhite noise and a low-frequency tone (150 Hz). We gener-ated two separate tracks containing white noise (generatedusing noise generator functionality in Audacity) and 150 Hztone (generated using tone generator functionality in

Audacity) which were then mixed and rendered to form anew track. The low-frequency tone helped in masking thelow frequencies of the audio leakage while the white noisespread across the rest of the frequency spectrum maskedthe audio leakage at higher frequencies.

During our experiments, we observed the effectivenessof masking signal against co-located adversary. We alsoobserved the effect of sound level of the masking signal inthe event of clipping. This was of particular importance aswe were operating around the lowest frequency responsefor some of the tested speakers.

The results for the portable speaker are shown in Fig. 9a.The frequency spectrum did not show the presence of audioleakage resulting fromvibrations, particularly at low frequen-cies (50 Hz-250 Hz). The graph of the sum of FFT coefficientsversus time showed that the quality of audio leakagedegraded to an extent that it became very hard to choose asuitable threshold that could determine a constant period ofvibration.While the spikes in the graphmay indicate towardspresence of vibration, the resulting pattern could not bedecoded into a valid PINmaking the detection infeasible.

This observation showed that the external portablespeaker had the required sound reproduction quality thatwas found lacking on the inbuilt smartphone speakers. Wealso measured the sound level of the masking signals via asound level measurement application for Android phoneand recorded the sound level at a distance of 10cm. Weobserved that the optimal sound level for producinglow-frequency sound of an amplitude sufficient to maskthe audio leakage from vibration sounds was around58 decibels. This sound level is approximately equal that ofconversational speech and thus not considered harmful tothe human ear.

Fig. 8. Spectrum for acoustic leakage captured through different devices. Color intensity in the spectrum is proportional to energy in the frequencyband with blue color indicating lowest energy per frequency and red being the highest energy contained per frequency.

Fig. 9. JBL Soundboost case for Motorola Z series of smartphones.

TABLE 1Frequency Response for Tested Speakers

Speaker Frequency Response(in Hz)

Altec Lansing Mini H2O Speaker Not specifiedJBL Clip Portable Bluetooth Speaker 160-20,000Sony SRS-XB2 Speaker 20-20,000



7.3.1 Security under Sophisticated Attacks

In this section, we will evaluate the masking effectiveness ofthe white noise boosted with low-frequency tones againstsome attack vectors that may be used by the adversary for amore sophisticated analysis of the eavesdropped signal.The attack techniques that we will discuss here wouldinvolve noise filtering and source separation techniques.

Noise Filtering. The defense mechanisms that we studiedtill now, relied on deliberate injection of a masking signal inthe environment for obfuscating the audio leakage duringvibrational pairing. From the adversary’s point of view,the masking signal was the noise accompanying the audioleakage (that was to be acquired and decoded). Hence, theadversary could try to remove or suppress the noise usingnoise removal algorithms.

Since a co-located adversary has the ability to process theeavesdropped signal offline and recover the informationfrom the audio leakage, we repeated the attack experiment(Section 7) according to our threat model (co-locatedattacker) with the masking signal comprising of white noisewith a low-frequency tone of 150 Hz that was capableof masking the audio leakage from the vibrations at thelow-frequency bands as detailed previously.

To evaluate the efficiency of our masking signal againstnoise filtering, we applied the noise reduction techniquecalled “spectral noise gating” to the eavesdropped signal.This technique is used in most of the audio processing soft-ware tools like Audacity. We chose a short sample from theeavesdropped signal as the noise profile and applied it tothe signal to be removed as noise. This process could berepeated multiple times until satisfactory results areobtained. The results for the Sony SRS-XB2 speaker areshown in Fig. 9b. and they show no indication of the audioleakage from the vibrations in relevant part of the frequencyspectrum. This affirmed the effectiveness of masking signalat hiding the audio leakage from the vibrations.

7.3.2 Effect on Vibrational Sensing

In a pairing mechanism based on vibrations like PIN-Vibra[9], the receiving device uses its accelerometer to readthe vibrations and then decode it based on the protocol.The masking signal, proposed in this work, comprised of a

low-frequency tone alongwith thewhite noise. The bass effectof the low-frequency tone has a tendency to produce deeprumbling sounds that have the capability of producing faintvibrations in the speaker. This effect may interfere with theaccelerometer readings of the receiving device that couldhave an negative impact on the accuracy of the vibrationaldecoding and thereby the success of the pairing process.

In order to test the impact of the masking signal on theability of the receiving device to decode the vibrations cor-rectly, we collected accelerometer readings in the backgroundon the receiving device during the vibrational pairing in thepresence ofmasking signal (as proposed in Section 7.3).

We recreated the experiment setup as in Section 7.3. Weplayed the masking sound (white noise mixed with a 150 Hztone) at different loudness level and measured the effect ofthemasking sound on the receiver asmeasured by the acceler-ometer. We usedMotorola Droid X2 as the sender, Sony SRS-XB2 as the external speaker generating the masking sound,and SamsungGalaxy S6 as the receiver.

We plotted the bit error rate for the receiver against multi-ple thresholds for the tested loudness levels. In Section 7.3,we proposed 58 dB as an appropriate sound pressure levelfor the masking sound. A louder masking sound wouldalways be better at hiding the vibration sounds but may alsocause vibrations on the smartphone (especially low fre-quency tones) that may impact the readings on the receivingdevice. Fig. 11 shows the bit error rate at the receiving devicein presence of different loudness levels of the maskingsound. We observed that at all loudness levels ranging from53 dB till 82 dB, for at least one threshold value, bit error ratefalls to 0% i.e., the transmitted PIN is successfully decoded.

Thus, we believe that themasking signalmay not have anyeffect on the decoding accuracy of the receiving device whileenhancing the security by obfuscating the audio leakageresulting from the vibrations at the same time.

7.4 Security Against Motion Sensor Exploits

While our threatmodel consists of a co-located acoustic eaves-dropping adversary that exploits vibrations sounds, thereexists another exploit that eavesdrops on motion sensorsthereby compromising vibration based pairing protocols.This class of co-located adversary can trick the unsuspectingvictim into installing a malicious application on their IoT

Fig. 10. Characterstics of the audio signal in presence of masking sound from Sony SRS-XB2 speaker before and after noise filtering. Color intensityin the graph is proportional to energy in the frequency band with blue color indicating lowest energy per frequency and red being the highest energycontained per frequency. Sum of the FFTcoefficients indicates the estimated energy at the time instant.



device (smartphone) that eavesdrops on the motion sensorreadings and forwards them to the attacker. The attacker hasnow access to the same sensor readings as the receivingdevice thereby leading to leakage of the transferred PIN/pair-ing information.

This exploit works due to the underlying assumptionthat the vibration channel (AS-OOB channel in the protocol)can not be eavesdropped directly, which is not true for thisexploit. Android platform also does not impose any usagepermissions (zero-permission) on an application for readingthe motion sensor readings. This leads to an unrestrictedaccess for a malicious application that can monitor acceler-ometer readings during the vibrational pairing process andextract the transferred PIN/passphrase.

Our threat model only considers a co-located adversaryon the transmitting device (IoT hub) as the device alreadyknows the PIN/passphrase that it is going to transfer whilethe effect on its accelerometer during the vibrational pairingleaks this confidential information. The receiving device inour model is usually a dedicated IoT device (smart lock,smart bulb, smart thermostats etc.) that usually would notallow access to internal sensors as opposed to the IoT hubi.e., smartphone which is a multipurpose device and allowsunrestricted access to motion sensors.

We observed the effect of vibration pattern generatedduring the pairing protocol on the accelerometer sensor.The readings along z axis are shown in Fig. 12a and it canbe observed that the pattern is visible in the plot. Thus, thesereadings can be exploited by an adversary to extract infor-mation about the PIN/passphrase transferred during thevibrational pairing.

Mohamed et al. [10] proposed SMASheD frameworkfor stealthily eavesdropping and even modifying varioussensors on Android platform. SMASheD uses AndroidDebug Bridge (ADB) for installing a service on Androidalong with shell privileges that allow reading and writingto sensor files. For launching this attack, two scripts areused that push the service from the computer to the androiddevice via USB and then launch it. Mohamed et al. [10] used

the proposed framework as a potential security exploitagainst Android sensors.

Shrestha et al. [11] expanded the usage of SMASheD as apotential defense mechanism “Slogger” against sensor basedtouchstroke logging attacks. Touchstroke logging attackswork by inferring the start and end of a finger touch on asmartphone’s screen from eavesdropping on motion sensorreadings. To prevent such attacks, Slogger aims to inject fakesensor readings into the system file that records the motionsensor readings from the sensor hardware. The fake readingsserve as a noise cover to the real sensor readings that couldpotentially be exploited by touchstroke logging attacks. Weapply a similar idea as “Slogger” to defend against this classof motion sensor exploiting adversaries that could learn thevibration patterns from themotion sensors of the transmittingdevice (IoT hub).

We used the SMASheD framework to install a service onthe transmitting device (Droid X2 in our experiments) thatinjects fake accelerometer readings into the accelerometersystemfile. The accelerometer systemfile logs the accelerome-ter readings obtained from the accelerometer hardware and isopen to “read” by any application that wishes to learn theaccelerometer readings during an event. Since the injectedaccelerometer readings from SMASheD should be similarto accelerometer’s response during the transmitting device’svibration, we determined the minimum and maximum accel-erometer readings during vibration along x, y and z axis toimitate the actual accelerometer readings during vibrations.

Once we determined the minimum and maximum accel-erometer readings during vibrations, we generated randomnumbers in that range and injected them into the accelerom-eter system file at random intervals lasting less than timeduration between successive accelerometer events (inverseof sampling frequency and measured in milliseconds). Theminimum and maximum readings for accelerometer sensoras per accelerometer system file during a vibration eventwere < -70, 120> , < -80, 80> , and < 0, 1600> for x, y andz axis respectively. We plotted the accelerometer readings,along z axis, as recorded by a malicious application residing

Fig. 11. Bit error rate against threshold values for PIN “4562” at the receiving device with different loudness level of masking sound. Since we beginfrom maximum threshold value, the x axis is in descending order.



on the transmitting device and eavesdropping on acce-lerometer readings while our defense injected fake sensorreadings similar to vibrations that mask the real vibrationevents in Fig. 12b. When compared to Fig. 12a, we wereunable to determine which sensor readings correspond toactual vibrations and which readings correspond to fakeinjected vibrations. Thus, we were able to mask the effect ofvibrations affecting the accelerometer during the pairingprocess.

Observing the spectrum in presence of our enhanceddefense in Fig. 12d, we notice that the red bands, correspond-ing to the vibrations in Fig. 12c have completely disappeared.Since the injected fake accelerometer readings are similarto the vibration readings, there no longer exists a contrastbetween the background noise and vibrations in the acceler-ometer signal. Our defense has enhanced the backgroundnoise with fake readings so that it is now comparable tovibration patterns and so we are unable to spot any featuresthat may characterize vibrations in the spectrum. Thus wewere able tomask the effect of vibrations affecting the acceler-ometer during the pairing process.

8 VIBREAKER AGAINST REMOTE ATTACK

So far, we have discussed acoustic eavesdropping attacksinvolving a proximity attacker and a co-located attackereavesdropping on pairing of IoT devices. The advantage ofa proximity attacker lies in the fact that proximity scenariois more plausible and allows the attacker to eavesdrop onthe acoustic leakage covertly. On the other hand, when theattacker is co-located on the device itself, the attack itselfbecomes more powerful and just masking the acousticleakage emanating from vibrations during pairing usingbackground noise (white noise) is not sufficient.

In this section, we will discuss a new attack model thatinvolves a remote attacker eavesdropping on the vibrations

during the pairing process. In this attack model, we assumethat the user is trying to pair her device (for example, smart-phone) to her other smart devices while she is on call usingthe smartphone. As a real life example, a user may already beon call with another person while she attempts to pair herphone with one of the smart devices say, a smart thermostat.To pair the phone with the thermostat, she momentarily pla-ces the phone in contact with the thermostat without hangingup the call or placing the call on hold. This is a possible usecase scenario as the pairing process disrupts the call onlymomentarily taking only a few seconds.

We also assume that the entity on the other end of theuser’s call plays the role of a malicious adversary or attacker.Since the user has no way of knowing if the attacker is eaves-dropping on the conversation, the attacker can choose torecord the whole call or wait for the moment when the userbegins the pairing process. For best quality of eavesdropping,the attacker may choose to put the call on speaker so as toamplify the vibration sounds. The attack model is describedin Fig. 13. The transmitting device while on call withthe attacker is used for pairing with a smart device by bring-ing it in physical contact with the receiving device. During the

Fig. 12. Vibration patterns as recorded on accelerometer for PIN “4562” in absence and presence of fake injected readings

Fig. 13. Remote eavesdropping attack model for vibration based pairingof IoT devices



pairing, the transmitting device generates vibrations thatare picked by the receiving device and decoded for pairing.However, the sounds generated during vibrations are alsopicked up by the microphone of the transmitting deviceand thereby are delivered to the attacker by the voice callconnection.

8.1 Attack Experiment under Remote Attack Model

The experiment setup for the attack model constitutes a usercalling the attacker using her smartphone.We usedNexus 5Xphone at the user’s end to place the call and also to act as theIoT device generating the vibrations. We also used SamsungGalaxy S6 as the receiving device that reads the vibrations,decode it and authenticate the transmitting device. On theother end of the call, the attacker used IPhone 6s to receivethe call andMacbook Air (2013) inbuilt microphone to recordthe vibrations. While we used normal phone call during theexperiment, any voice call applications (Skype, Viber, What-sapp etc.) can be used for this purpose.

The spectrum of the captured vibration sounds on theattacker’s end are depicted in Fig. 14a. Comparing thisfrequency spectrum against frequency spectrum observedin Figs. 1a and 6b shows that the audio emanations fromvibrations are more distorted and diffused unlike the cap-tured audio in proximity attack model and low frequencyaudio captured under co-located attack model. To decode thecaptured audio, we used the same algorithm as in proximityand co-located attack models described iin previous sectionsbut introduce minor changes to some of the parameters. Werestricted the energy estimate in the frequency band of 3kHz to 4 kHz and increased the window size for each vibra-tion from 200 ms to 210 ms. The corresponding bit error rateplotted against threshold is shown in Fig. 14b.

The bit error rate starts at 47% because the binary represen-tation of PIN 4562 has 8 bits set and for maximum thresholdvalue, the bit string is all zero bits leading to a bit error rate of8=17 ¼ 0:47. For a threshold value of 18.4, the bit error ratedrops to 0% indicating successful recovery of the PINfrom acoustic leakage of vibrations as captured by the remoteattacker.

8.2 Vibreaker Against Remote Attack

As we discovered in previous section that vibration basedpairing for IoT devices runs the risk of being eavesdropped

upon if the IoT hub (i.e., smartphone) is on call with theattacker using normal phone service or one of the manyVoIP applications. To safeguards in such scenario, we testedour masking based defense, Vibreaker against a remoteattacker as described in Section 8.1.

The frequency spectrum for the audio signal recorded atthe remote attacker’s end during the pairing process isdepicted in Fig. 15. On the spectrum, the acoustic leakagefrom the vibration of the IoT hub can be clearly seen inthe lower frequency band (50 Hz-300 Hz). To quantify theacoustic leakage in the spectrum, we summed up the sumof FFT coefficients in a narrower frequency band of 200 Hz-300 Hz that can be observed in the lower subfigure ofFig. 15. Using similar attack principles as described previ-ously in this work, we were able to decode the correct PINvalue that was transferred from the IoT hub to the IoTdevice in our experiment. The graph depicting bit error ratefor different threshold values is seen in Fig. 16. As we canobserve in the graph, the bit error initially starts at 47% butdrops down to 0% at a threshold value of 11.7 that indicatesthat the correct PIN value has been decoded by the attacker.This results is similar to performance of Vibreaker against a

Fig. 14. Acoustic characteristics of the vibration leakage and acuracy of an attacker for PIN “4562” under remote attack model. Color intensity in thespectrum is proportional to energy in the frequency band with blue color indicating lowest energy per frequency and red being the highest energy con-tained per frequency. Sum of the FFTcoefficients indicates the estimated energy at the time instant.

Fig. 15. Frequency spectrum of the audio signal captured by a remoteattacker in presence of Vibreaker. Color intensity in the graph is propor-tional to energy in the frequency band with blue color indicating lowestenergy per frequency and red being the highest energy contained perfrequency. Sum of the FFTcoefficients indicates the estimated energy atthe time instant.



co-located attacker (Section 7) where white noise alone isunable to mask vibration sounds at low frequency bands.

8.3 Secure Pairing with White Noise and LowFrequency Tones

Since Vibreaker was unable to mask the acoustic leakage atsub 500 Hz frequencies, we turn to the same defense mecha-nism that we utilized against a co-resident attacker due tosimilarities in the capabilities of the attacker. For both co-located attacker and a remote attacker, the recording device isthe inbuilt microphone in the IoT hub which is the closestlocation to the source of vibrations. In both scenarios, whitenoise alone was unable to mask the acoustic leakage fromvibrations at low frequencies so we boost the defense mecha-nismby adding low frequency tone to thewhite noise signal.

We played the enhanced masking signal (white noiseadded with low frequency tone) similar to the defense setupin Section 7.3. the frequency spectrum of the recorded audiosignal from the remote attacker is depicted in Fig. 17. Thelower frequency band does not show any presence of acousticleakage from the vibrations as opposed to Fig. 15 which hadonly white noise to mask the vibration sounds. The graph ofbit error rate against various threshold values (in Fig. 18)shows that the minimum threshold value that could be

achieved was 35%. The bit strings obtained at both thresholdvalues however were not valid bit strings as they lacked thenecessary header (“110”) that is attached to the bit representa-tion of the PIN for indicating the beginning of a valid pairingtransmission.

9 SUMMARY AND DISCUSSION

The emerging field of IoT devices and their spreading useeveryday exposes this nascent technology to various securityrisks. Since these devices are low energy with limited compu-tational power, traditional security mechanism designeddo not work in the context of IoT devices as expected. Addi-tionally, their tendency to collaborate with each other byexchanging information makes a secure communicationchannel a necessity. Pairing of devices facilitates in establish-ing a trusted way of data transfer between devices andinvolves the exchange of a shared short key or passphrase. Inthis paper, we studied pairing among IoT devices (especiallyan IoT hub and other IoT devices) by using an auxiliarychannel (OOB channel). In particular, we focused on the useof vibration as a means of communicating shared key or PINamong the IoT devices and evaluated its security against anacoustic eavesdropping attack.

We implemented an ON-OFF vibration scheme [9] andused it for pairing two IoT devices by transferring a 4-bit PIN.To create an acoustic eavesdropping attack, we simulatedsuch an attacker as described in [18] that proposes an acousticeavesdropping attack against pairing of constrained devicessuch as an IMD and its reader.We studied the attack inmulti-ple scenarios by placing the attacker at different locations. Wealso proposed a signal masking based defense mechanism forthwarting the acoustic eavesdropper and evaluated it underthe studied scenarios.

In a proximity eavesdropping scenario, we showed thatan attacker can decode vibration sounds successfully from adistance of 15 cm. We also showed that if white noise wereto be injected in the environment surrounding the pairingdevices, it would make it very hard for an attacker to extractvibration sounds from the resulting audio that containsboth acoustic leakage from vibrations and the maskingsignal. We also tested fake vibration sounds as possiblemasking signal for the acoustic leakage from vibrations andfound it viable for securing the process. Enhancing theattacker’s capability by using noise filtering tools did notincrease the accuracy of the eavesdropping attack.

Fig. 16. Bit error rate plotted against threshold values in remote attackscenario under Vibreaker defense. Since we begin from maximumthreshold value, the x axis is in descending order.

Fig. 17. Frequency spectrum of the audio signal captured by a remoteattacker in presence of Vibreaker and low frequency tones. Color inten-sity in the graph is proportional to energy in the frequency band withblue color indicating lowest energy per frequency and red being the high-est energy contained per frequency. Sum of the FFT coefficients indi-cates the estimated energy at the time instant.

Fig. 18. Bit error rate plotted against threshold values in remote attackscenario under Vibreaker combined with low frequency tones. Since webegin from maximum threshold value, the x axis is in descending order.



Our next attack scenario placed the eavesdropping attac-ker on the pairing device (IoT hub) itself in the form of a mali-cious application residing upon one of the device that hadaccess tomicrophone of the device. Such a devicewould com-monly be a smartphone but could also be smart speakers suchasAmazon Echo or GoogleHome. For such an attacker, whitenoise defense is not effective at hiding the acoustic leakage. Inparticular, we found out that due to co-location of the attackerwith the source of vibration, a better quality of acoustic leak-age is picked up by the eavesdropper especially at low fre-quencies. As speakers found in low power constraineddevices are designed to produce just acceptable sound, lowerfrequencies of themasking signal that are sub 500 Hzmay notbe reproduced satisfactorily.

Since white noise reproduced from a low powered con-strained device (such as many IoT devices) does nothide vibrations sounds at sub 1 kHz frequencies, we lookedtowards introducing low frequency tones via an externalspeaker. This could remedy the shortcomings of on-devicespeakers responsible for masking acoustic emanations due tovibrations. We tested the enhanced defense mechanism withlow frequency tone of 150 Hz added to the white noise, bothbeing generated via low-cost external speaker co-locatedwiththe device. We tested the effect of our enhanced setup on thedecoding ability of the receiving IoT device and found it toabsent from its sensor readings.

To evaluate our defense against an advanced attacker,we applied noise filtering algorithm and found out thatnoise filtering did not help the attacker in removing themasking signal from the eavesdropped audio signal andrecovering the acoustic leakage of the vibrations. Such anattempt by an attacker resulted in the whole signal beingreduced with barely any audio present in the filtered signal.

While this defense setup required an external speaker, webelieve that for smartphones, better quality external speakersalready exist as phones cases or as external modules that canbe attached to the phonewithout affecting their usability.

To complete the security of our defense, we proposedan additional measure that prevents a co-located adversaryon the transmitting device to exploit the accelerometerreadings that determine the information transferred duringnoisy pairing protocol. The proposed method works on asimilar notion as noisy vibrations by injecting fake read-ings in accelerometer readings thereby masking the effectof vibrations of the transmitting device on its accelerome-ter. Since the injected readings are done programatically,the enhanced defense is transparent to the user. We alsoobserved the effect of noisy vibrational pairing on thereceiving device’s capability of decoding the transmittedPIN/passphrase in presence of masking sound at differentloudness levels. We showed that the masking sound at dif-ferent loudness levels does not affect the ability of thereceiving device to correctly decode the PIN/passphrase.

The advent of voice over IP (VoIP) applications has led tomultiple ways for people to communicate in addition tonormal voice calling facility. We investigated the prospectof a malicious entity eavesdropping over any VoIP callsincluding normal phone calls and potentially recording thevibrations sounds during pairing. We recorded vibrationssounds over a normal call and discovered that vibrationsounds were both audible after being recorded on the other

end of the call. Moreover, they could also be decoded usingsimilar technique as previous eavesdropping attacks lead-ing to full disclosure of the pairing secret being sharedbetween the two devices.

We tested the remote eavesdropping attacker againstVibreaker and showed that such an attack model is successfulin decoding the transmission into the exchanged informationin a similar manner to a co-located attack model. The acousticleakage from the vibrations was visible at low frequenciesthat demonstrated the inability of white noise in maskingalone.We then tested the remote eavesdropping attackmodelagainst enhanced defense that combined Vibreaker with lowfrequency tone. The enhanced defense was found out to besufficient at thwarting the remote attacker as the remotelycaptured audio signal could not be decoded correctly into thetransferred PIN for all values of the threshold in the relevantfrequency bands.

10 CONCLUSION

In this work, we showed that vibration based pairing pro-tocols for constrained devices like those comprising IoTnetwork, can be defended against an acoustic eavesdrop-ping attack by adding artificial noise to the environment.We showed that traditional acoustic eavesdropping mecha-nism that places the attacker in proximity of the pairingdevices, can be impeded by using white noise as a maskingsignal to cloak audio leakage resulting from vibrations. Wealso examined a novel scenario where the eavesdroppingmay be done through the device’s microphone itself or aspying bug attached on the device. This setup, termed as“co-resident or co-located” attack, proved to be more potentthan proximity attack as “co-located” attack is able to capturemore details of the audio leakage than the proximity attack.To defend against our new attack model, we demonstratedthat white noise alone was not sufficient and needed to beenhanced with the addition of low frequency tones throughan external speaker. This step was also necessitated by thefact that smartphone speakers were not able to reproducelow frequency tones at sufficient intensity to cloak the audioleakage from vibrations in our experiments.

The “co-located” adversarial model was also expandedto include motion sensor exploits that can compromise theon-board motion sensors like accelerometer on the IoT hubto learn the transferred secret via vibrations. We extendedour defense by complimenting it by injecting fake acceler-ometer readings that are able to mask the actual vibrationeffect on the accelerometer. We also examined the possibil-ity of a remote attack model that records the acoustic leak-age resulting from the vibration over a remote connectionsuch as phone call or a VoIP connection. Such a situation isplausible if one of the devices involved in the pairing pro-cess is a smartphone on call with a dishonest entity. Thisattack model was successful against Vibreaker but enhanc-ing Vibreaker with low frequency tones cloaked the acousticleakage to a sufficient degree so as to impede the attack. Ouranalysis showed that while vibration pairing may seem tobe an attractive mechanism for ensuring the security andtrust in an IoT network, it needs to be protected againstacoustic side channel attacks by defensive measures such asmasking signals that are low cost and easy to implement.



ACKNOWLEDGMENTS

This submission is an extension of work done in [1] and [2].

REFERENCES

[1] S. A.Anand andN. Saxena, “Vibreaker: Securing vibrational pairingwith deliberate acoustic noise,” in Proc. 9th ACM Conf. SecurityPrivacyWirelessMobile Netw., 2016, pp. 103–108.

[2] S. A. Anand and N. Saxena, “Coresident evil: Noisy vibrationalpairing in the face of co-located acoustic eavesdropping,” in Proc.10th ACM Conf. Security Privacy Wireless Mobile Netw., 2017,pp. 173–183.

[3] Global Standards Initiative on Internet of Things, Overview of theInternet of Things (ITU-T Y.4000/Y.2060). 2012. [Online]. Available:http://handle.itu.int/11.1002/1000/11559

[4] Gartner, Inc., Gartner Says 8.4 Billion Connected “Things” Will Bein Use in 2017, Up 31 Percent From 2016. 2017. [Online]. Available:http://www.gartner.com/newsroom/id/3598917

[5] T. Halevi and N. Saxena, “On pairing constrained Wireless devicesbased on secrecy of auxiliary channels: The case of acousticeavesdropping,” in Proc. ACM Conf. Comput. Commun. Security,Oct. 2010, pp. 97–108.

[6] R. Kainda, I. Flechais, and A. W. Roscoe, “Usability and securityof out-of-band channels in secure device pairing protocols,” inProc. Symp. Usable Privacy Security, 2009, Art. no. 11.

[7] V. Boyko, P. Mackenzie, and S. Patel, “Provably secure password-authenticated key exchange using diffie-Hellman,” in Proc. Int.Conf. Theory Appl. Cryptographic Techn. Bruges, 2000, pp. 156–171.

[8] D. Halperin, T. S. Heydt-Benjamin, B. Ransford, S. S. Clark,B. Defend, W. Morgan, K. Fu, T. Kohno, and W. H. Maisel,“Pacemakers and implantable cardiac defibrillators: Softwareradio attacks and zero-power defenses,” in Proc. IEEE Symp.Security Privacy, 2008, pp. 129–142.

[9] N. Saxena,M. B. Uddin, J. Voris, andN.Asokan, “Vibrate-to-unlock:Mobile phone assisted user authentication to multiple personalRFID tags,” in Proc. IEEE Int. Conf. Pervasive Comput. Commun., 2011,pp. 181–188.

[10] M. Mohamed, B. Shrestha, and N. Saxena, “Smashed: Sniffing andmanipulating android sensor data for offensive purposes,” IEEETrans. Inf. Forensics Security, vol. 12, no. 4, pp. 901–913, Apr. 2017.

[11] P. Shrestha, M. Mohamed, and N. Saxena, “Slogger: Smashingmotion-based touchstroke logging with transparent system noise,”in Proc. 9th ACM Conf. Security Privacy Wireless Mobile Netw., 2016,pp. 67–77.

[12] D. Balfanz, D. Smetters, P. Stewart, and H. C. Wong, “Talking tostrangers: Authentication in Ad-Hoc Wireless networks,” in Proc.Symp. Netw. Distrib. Syst. Security, Feb. 2002.

[13] M. T. Goodrich, M. Sirivianos, J. Solis, G. Tsudik, and E. Uzun,“Loud and clear: Human-verifiable authentication based on audio,”inProc. 26th IEEE Int. Conf. Distrib. Comput. Syst., 2006, pp. 10–10.

[14] J. M. McCune, A. Perrig, and M. K. Reiter, “Seeing-is-believing:Using camera phones for human-verifiable authentication,” inProc. IEEE Symp. Security Privacy, 2005, pp. 110–124.

[15] N. Saxena, J.-E. Ekberg, K. Kostiainen, and N. Asokan, “Securedevice pairing based on a visual channel,” in Proc. IEEE Symp.Security Privacy, 2006, Art. no. 313.

[16] Y. Kim, W. S. Lee, V. Raghunathan, N. K. Jha, and A. Raghunathan,“Vibration-based secure side channel for medical devices,” in Proc.52ndAnnu. Des. Autom. Conf., 2015, pp. 32:1–32:6.

[17] E. Uzun, K. Karvonen, and N. Asokan, Usability Analysis of SecurePairing Methods. Berlin, Germany: Springer, 2007, pp. 307–324.

[18] T. Halevi and N. Saxena, “Acoustic eavesdropping attacks onconstrained Wireless device pairing,” IEEE Trans. Inf. ForensicsSecurity, vol. 8, no. 3, pp. 563–577, Mar. 2013.

[19] Dynex–USB Microphone–Multi. 2015. [Online]. Available: https://www.dynexproducts.com/pdp/DX-USBMIC13/6084114

[20] N. Roy, M. Gowda, and R. R. Choudhury, “Ripple: Communicatingthrough physical vibration,” in Proc. 12th USENIX Conf. Netw. Syst.Des. Implementation, 2015, pp. 265–278.

[21] C. A. Inc., B6 Omnidirectional Lavalier. 2017. [Online]. Available:http://www.countryman.com/b6-omnidirectional-lavalier-microphone

[22] A. P. Felt, E. Ha, S. Egelman, A. Haney, E. Chin, and D. Wagner,“Android permissions: User attention, comprehension, and behav-ior,” in Proc. 8th Symp. Usable Privacy Security, 2012, pp. 3:1–3:14.

[23] Y. Kim, W. S. Lee, V. Raghunathan, N. K. Jha, and A. Raghunathan,“Vibration-based secure side channel for medical devices,” in Proc.52nd Annu.Des. Autom. Conf., 2015, Art. no. 32.

[24] AmpAudio, AmpAudio. Apr. 2016. [Online]. Available: https://www.ampaudio.com/

[25] Zagg, Zagg Speaker Case. Apr. 2016. [Online]. Available: http://www.zagg.com/us/en_us/cases/iphone-6-case/speaker-case

[26] Motorola, JBL Soundboost Speaker, Mar. 2017. [Online]. Available:https://www.motorola.com/us/products/moto-mods/jbl-soundboost-speaker

[27] A. Lansing, Altec Lansing - Mini H2O Bluetooth Speaker,Apr. 2016. [Online]. Available: http://www.alteclansing.com/en/al-products/mini-h20-speaker/

[28] Sony, Sony SRS-XB2. Mar. 2017. [Online]. Available: http://www.sony.com/electronics/wireless-speakers/srs-xb2

[29] JBL, JBL - Clip Portable Bluetooth Speaker. Apr. 2016. [Online].Available: http://www.bestbuy.com/site/jbl-clip-portable-bluetooth-speaker-purple/6050039.p?id=1219696711027&skuId=6050039

S Abhishek Anand is working toward the PhDdegree in the Department of Computer Science,University of Alabama at Birmingham. He servesas a graduate research assistant with the Dep-artment of Computer Sciences, University ofAlabama at Birmingham. His research interestsinclude security of privacy-sensitive channels andside channel attacks. He was the recipient of theOutstanding Masters student in the Departmentof Computer Science, University of Alabama atBirmingham in 2015.

Nitesh Saxena is an associate professor of Com-puter and Information Sciences, University ofAlabama at Birmingham (UAB), and the foundingdirector of the Security and Privacy in EmergingSystems (SPIES) group/lab. He works in thebroad areas of computer and network security,and applied cryptography, with a keen interest inwireless and mobile device security, and theemerging field of usable security. His currentresearch has been externally supported by multi-ple grants from NSF and NIJ, and by gifts/awards/

donations from the industry, including Google (2 Google FacultyResearch awards), Cisco, Comcast, Intel, Nokia and Research in Motion.He has published more than 110 journal, conference and workshoppapers, many at top tier venues in Computer Science, including: IEEETransactions, ISOC NDSS, ACM CCS, ACM WWW, ACM WiSec, ACMACSAC, ACM CHI, ACM Ubicomp, IEEE Percom, IEEE ICME and IEEES&P. On the educational/service front, Saxena currently serves as thedirector and principal investigator for the UAB’s Scholarship for Service(SFS) program and a co-director for UAB’s MS program in ComputerForensics and Security Management. He serves as an associate editorfor flagship security journals, the IEEE Transactions on InformationForensics and Security (TIFS), and Springer’s International Journalof Information Security (IJIS). His work has received extensive mediacoverage, for example, at NBC, MSN, Fox, Discovery, ABC, Bloomberg,MIT Tech Review, ZDNet, ACM TechNews, Yahoo! Finance, Commu-nications of ACM, Yahoo News, CNBC, Slashdot, Computer World,Science Daily andMotherboard.

" For more information on this or any other computing topic,please visit our Digital Library at www.computer.org/publications/dlib.



http://handle.itu.int/11.1002/1000/11559

http://www.gartner.com/newsroom/id/3598917

https://www.dynexproducts.com/pdp/DX-USBMIC13/6084114

https://www.dynexproducts.com/pdp/DX-USBMIC13/6084114

http://www.countryman.com/b6-omnidirectional-lavalier-microphone

http://www.countryman.com/b6-omnidirectional-lavalier-microphone

https://www.ampaudio.com/

https://www.ampaudio.com/

http://www.zagg.com/us/en_us/cases/iphone-6-case/speaker-case

http://www.zagg.com/us/en_us/cases/iphone-6-case/speaker-case

https://www.motorola.com/us/products/moto-mods/jbl-soundboost-speaker

https://www.motorola.com/us/products/moto-mods/jbl-soundboost-speaker

http://www.alteclansing.com/en/al-products/mini-h20-speaker/

http://www.alteclansing.com/en/al-products/mini-h20-speaker/

http://www.sony.com/electronics/wireless-speakers/srs-xb2

http://www.sony.com/electronics/wireless-speakers/srs-xb2

http://www.bestbuy.com/site/jbl-clip-portable-bluetooth-speaker-purple/6050039.p?id=1219696711027&skuId=6050039



noisy vibrational pairing of iot devices

Documents