detecting location fraud in indoor mobile crowdsensing · data at these tags (red numbered dots)....

6
Detecting Location Fraud in Indoor Mobile Crowdsensing Qiang Xu Department of Computing and Soware McMaster University 1280 Main Street West Hamilton, Ontario, Canada L8S 4L8 [email protected] Rong Zheng Department of Computing and Soware McMaster University 1280 Main Street West Hamilton, Ontario, Canada L8S 4L8 [email protected] Ezzeldin Tahoun Department of Computing and Soware McMaster University 1280 Main Street West Hamilton, Ontario, Canada L8S 4L8 [email protected] ABSTRACT Mobile crowdsensing allows a large number of mobile devices to measure phenomena of common interests and form a body of knowl- edge about natural and social environments. In order to get location annotations for indoor mobile crowdsensing, reference tags are usu- ally deployed which are susceptible to tampering and compromises by aackers. In this work, we consider three types of location- related aacks including tag forgery, tag misplacement and tag removal. Dierent detection algorithms are proposed to deal with these aacks. First, we introduce location-dependent ngerprints as supplementary information for beer location identication. A truth discovery algorithm is then proposed to detect falsied data. Moreover, visiting paerns are utilized for the detection of tag mis- placement and removal. Experiments on both crowdsensed and emulated dataset show that the proposed algorithms can detect all three types of aacks with high accuracy. 1 INTRODUCTION As a result of ubiquitous sensor peripherals and increasing com- putation capability of mobile devices, MCS (mobile crowdsensing), a special form of crowdsourcing where communities contribute sensing information and human intelligence using mobile devices, has shown great potential in a variety of environmental, commer- cial and social applications. Examples include measuring pollution levels in a city[5], providing indoor localization [17], detecting pot- holes on a road [12], and many others. Among a variety of MCS applications, most of them require location tagging. Obtaining locations is trivial for outdoor applications since the Global Posi- tioning System (GPS) complimented by cellular and WiFi access map based methods can provide accurate and robust location in- formation in most outdoor environments. However, this is not the case for indoor applications. Despite the fact that people spend majority of their time indoor, indoor positioning systems (IPS) only have limited success due to the lack of pervasive infrastructural support, and the desire to keep user devices as simple as possible. erefore, deploying a set of reference tags seems to be the best choice in order to provide accurate location tagging in indoor MCS applications. As an example, reference tags are instrumental in collecting crowdsensed data for ngerprint-based indoor localization. Gener- ally, ngerprint-based localization works in two stages: training and operational stages. In the training stage, comprehensive site survey is periodically conducted to record the ngerprints at tar- geted locations. In the operational stage, when a user submits a location query with his current ngerprints, a localization server 1 2 3 4 5 6 7 8 9 10 11 Figure 1: An example of using reference tags for indoor mo- bile crowdsensing. e users are required to upload sensing data at these tags (red numbered dots). computes and returns the estimated location. Comprehensive site survey is time-consuming, labor-intensive, and subjects to environ- mental changes. is makes it a perfect candidate application for mobile crowdsensing. Specically, we can deploy a certain number of reference tags at targeted locations and then recruit volunteers or workers to contribute ngerprints at these tags. However, the adoption of reference tag based method is not without cost. It faces the challenges of incurring additional costs in deploying the tags and malicious behaviors from users. Tags such as QR codes, though cost next to nothing, are susceptible to duplication, damage and removal. As a result, the crowd-sensed data can be of low quality. For example, a user can easily copy a QR code and scan it at somewhere else when QR codes are used as tags. In our previous experience on MobiBee, a game for mobile crowdsensing, we observed cheats during the data collection[17]. In machine learning, it is well known that “garbage in garbage out”. Using crowdsensing data with contaminated location tags will necessarily lead to poor trained models. us, it is important to safe guard against fraudulent behaviors and falsied data, especially when incentives are involved. In this work, we consider three types of aacks for location- tagged mobile crowdsensed data: (1) Tag Forgery: An aacker forges one or multiple tags and uses them to annotate the collected data. (2) Tag Misplacement: A tag is moved from its designated place to somewhere else. (3) Tag Removal: A tag is removed illegally. We believe that most of the location-related malicious behaviors can be represented by these three types of aacks. Dierent detec- tion algorithms are proposed to deal with these aacks. First, as supplementary information, location-dependent ngerprints are introduced for beer location identication. A truth discovery al- gorithm is then proposed to detect falsied data. Moreover, visiting arXiv:1708.06308v1 [cs.CR] 21 Aug 2017

Upload: others

Post on 18-Aug-2020

1 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Detecting Location Fraud in Indoor Mobile Crowdsensing · data at these tags (red numbered dots). computes and returns the estimated location. Comprehensive site survey is time-consuming,

Detecting Location Fraud in Indoor Mobile CrowdsensingQiang Xu

Department of Computing andSo�ware

McMaster University1280 Main Street West

Hamilton, Ontario, Canada L8S [email protected]

Rong ZhengDepartment of Computing and

So�wareMcMaster University1280 Main Street West

Hamilton, Ontario, Canada L8S [email protected]

Ezzeldin TahounDepartment of Computing and

So�wareMcMaster University1280 Main Street West

Hamilton, Ontario, Canada L8S [email protected]

ABSTRACTMobile crowdsensing allows a large number of mobile devices tomeasure phenomena of common interests and form a body of knowl-edge about natural and social environments. In order to get locationannotations for indoor mobile crowdsensing, reference tags are usu-ally deployed which are susceptible to tampering and compromisesby a�ackers. In this work, we consider three types of location-related a�acks including tag forgery, tag misplacement and tagremoval. Di�erent detection algorithms are proposed to deal withthese a�acks. First, we introduce location-dependent �ngerprintsas supplementary information for be�er location identi�cation. Atruth discovery algorithm is then proposed to detect falsi�ed data.Moreover, visiting pa�erns are utilized for the detection of tag mis-placement and removal. Experiments on both crowdsensed andemulated dataset show that the proposed algorithms can detect allthree types of a�acks with high accuracy.

1 INTRODUCTIONAs a result of ubiquitous sensor peripherals and increasing com-putation capability of mobile devices, MCS (mobile crowdsensing),a special form of crowdsourcing where communities contributesensing information and human intelligence using mobile devices,has shown great potential in a variety of environmental, commer-cial and social applications. Examples include measuring pollutionlevels in a city[5], providing indoor localization [17], detecting pot-holes on a road [12], and many others. Among a variety of MCSapplications, most of them require location tagging. Obtaininglocations is trivial for outdoor applications since the Global Posi-tioning System (GPS) complimented by cellular and WiFi accessmap based methods can provide accurate and robust location in-formation in most outdoor environments. However, this is not thecase for indoor applications. Despite the fact that people spendmajority of their time indoor, indoor positioning systems (IPS) onlyhave limited success due to the lack of pervasive infrastructuralsupport, and the desire to keep user devices as simple as possible.�erefore, deploying a set of reference tags seems to be the bestchoice in order to provide accurate location tagging in indoor MCSapplications.

As an example, reference tags are instrumental in collectingcrowdsensed data for �ngerprint-based indoor localization. Gener-ally, �ngerprint-based localization works in two stages: trainingand operational stages. In the training stage, comprehensive sitesurvey is periodically conducted to record the �ngerprints at tar-geted locations. In the operational stage, when a user submits alocation query with his current �ngerprints, a localization server

1

2

3

4 5

6

7

89

1011

Figure 1: An example of using reference tags for indoor mo-bile crowdsensing. �e users are required to upload sensingdata at these tags (red numbered dots).

computes and returns the estimated location. Comprehensive sitesurvey is time-consuming, labor-intensive, and subjects to environ-mental changes. �is makes it a perfect candidate application formobile crowdsensing. Speci�cally, we can deploy a certain numberof reference tags at targeted locations and then recruit volunteersor workers to contribute �ngerprints at these tags.

However, the adoption of reference tag based method is notwithout cost. It faces the challenges of incurring additional costsin deploying the tags and malicious behaviors from users. Tagssuch as QR codes, though cost next to nothing, are susceptible toduplication, damage and removal. As a result, the crowd-senseddata can be of low quality. For example, a user can easily copy aQR code and scan it at somewhere else when QR codes are used astags. In our previous experience on MobiBee, a game for mobilecrowdsensing, we observed cheats during the data collection[17].In machine learning, it is well known that “garbage in garbageout”. Using crowdsensing data with contaminated location tags willnecessarily lead to poor trained models. �us, it is important to safeguard against fraudulent behaviors and falsi�ed data, especiallywhen incentives are involved.

In this work, we consider three types of a�acks for location-tagged mobile crowdsensed data:

(1) Tag Forgery: An a�acker forges one or multiple tags anduses them to annotate the collected data.

(2) Tag Misplacement: A tag is moved from its designatedplace to somewhere else.

(3) Tag Removal: A tag is removed illegally.We believe that most of the location-related malicious behaviors

can be represented by these three types of a�acks. Di�erent detec-tion algorithms are proposed to deal with these a�acks. First, assupplementary information, location-dependent �ngerprints areintroduced for be�er location identi�cation. A truth discovery al-gorithm is then proposed to detect falsi�ed data. Moreover, visiting

arX

iv:1

708.

0630

8v1

[cs

.CR

] 2

1 A

ug 2

017

Page 2: Detecting Location Fraud in Indoor Mobile Crowdsensing · data at these tags (red numbered dots). computes and returns the estimated location. Comprehensive site survey is time-consuming,

pa�erns are utilized for the detection of tag misplacement and re-moval. Experiments on both crowdsensed and emulated datasetshow that the proposed algorithms can detect all three types ofa�acks with high accuracy.

�e rest of the paper is organized as follows. First, we providea review of the related work in Section 2. �e three types of at-tacks are elaborated in Section 3. A�ack detection algorithms arepresented in Section 4. Experimental results are given in Section 5.Finally, we conclude the paper with summary and future work.

2 RELATEDWORK�ere has been some work on leveraging mobile crowdsensing forindoor applications. With the booming research on indoor local-ization, many indoor positioning systems try to take advantageof MCS. A comprehensive review can be found in [15]. Gener-ally, these applications fall into two broad categories based on thepurpose: for �ngerprinting and for map construction. In the �rstcategory, schemes such as Zee and LiFS rely on an alternativeodometry based solutions for initial localization or displacementestimates[19, 24]. MobiBee is a crowdsensing solution for location�ngerprint collection that participants are uncontrolled and uti-lize their own devices [17]. Several schemes have been proposedin literatures to construct indoor maps by organically combiningusers’ trajectories through pedestrian dead reckoning [6, 21]. �eseschemes di�er in how the trajectories are combined. In [3], theauthors propose IndoorCrowd2D to reconstruct the indoor interiorview of a building from crowdsensed data.

Despite many work on indoor positioning system, deployingreference tags seems to be the simplest to implement. A variety oftechniques, including Radio Frequency Identi�cation (RFID), NearField Communication (NFC), �ick Response (QR) code, have beenexploited for location reference [2, 10, 13, 22, 23]. �ough each ofthese techniques has its own advantages, none of them are absolutesafe in terms of is nonforgeability. Security threats in NFC andRFID have been studied in [8, 9]. QR codes can be easily copied andspread because they not have any copy protection mechanism. QRcode forgery a�ack has been reported in our previous work[23]. Inorder to provide veri�cations for users’ location claims, locationproof mechanisms have been proposed [7, 20]. However, thesemethods require either additional infrastructures or lots of userswhich will complicate the design of a mobile crowdsensing system.

Truth discovery is a process to integrate multi-source noisyinformation by estimating the reliability of each source. A com-prehensive survey on truth discovery can be found at [11]. Gen-erally, a truth discovery task can be modeled in three ways: itera-tive methods[4], optimization based methods[1], and probabilisticgraphical model based methods[25]. In this work, we formulate thefalsi�ed data detection as a truth discovery problem and propose aprobabilistic graphical model.

3 ATTACKS ON LOCATION TAGS�ere are many security threats that can compromise the data qual-ity while conducting mobile crowdsensing. In this work, we onlytarget those that can a�ect the location tags. �e crowdsensed datawith location tags can be described as 〈rid,uid, lid, sensed data〉,where rid is the record index, uid is the user ID, lid is the location

tag ID, and sensed data is the sensed data. Take the applicationof surveying the temperature of a building as an example, we candeploy a set of reference tags and ask workers to contribute thetemperatures at these tags. Table 1 is a piece of the crowdsenseddata. We categorize these location-related a�acks into three types,i.e., tag forgery, tag misplacement, and tag removal. �e rationalebehind these a�acks will be discussed in the rest of this section.

3.1 Tag Forgery Attack�e �rst and primary type of a�ack is called tag forgery. During theoperation of MCS, an a�acker can forge one or multiple locationtags and use these illegitimate tags to annotate the data. As aresult, the location tags will be compromised. �is type of a�ackwill introduce falsi�ed data and the contributors of these falsi�eddata can be considered as a�ackers. Tag forgery can be motivatedby pro�ts or intentions to sabotage. O�en times it is easier togenerate fraudulent data than legitimate ones as the later requiresa participant to physically travel to target locations.

3.2 Tag Misplacement AttackAnother type of a�ack is tag misplacement. A�er the deploymentof reference tags, some tags may be moved from their designatedplaces to somewhere else. For example, a tag could be moved bythe custodian if it blocks important signages, or a tag might beintentionally moved to another place by malicious users. Similar totag forgery a�ack, tag misplacement will introduce falsi�ed data aswell. However, worse than tag forgery, all subsequent data will befalsi�ed if a tag has been moved. �erefore an additional detectionmechanism is required to enable the administrator to correct themisplaced tags in time.

3.3 Tag Removal AttackIn tag removal, a tag may be removed by accidence or maliciousbehaviors. In this case, there will be no falsi�ed data. However,this type of a�acks will lead to an incomplete dataset. In the afore-mentioned temperature survey example, we might never get thetemperatures at some target locations due to tag removal a�ack.�erefore, a detection method would be helpful to inform the ad-ministrator to redeploy the missing tags.

In summary, both tag forgery and tag misplacement can intro-duce falsi�ed data, and hence a falsi�ed data detection algorithmis needed. As for tag misplacement and tag removal, actions arerequired when the a�acks happen. �e details of the correspondingdetection algorithms will be elaborated in the following section.

4 ATTACK DETECTIONIf the data to be collected contains li�le location dependent infor-mation, it is not possible to detect di�erent forms of a�acks. Tomake our detection algorithms agnostic to individual applications,we take advantage of the location-dependent �ngerprints. We col-lect these �ngerprints in addition to the required data during theoperation of MCS. �ese �ngerprints serve as additional identi�ca-tion of location. Location-dependent �ngerprints have been wellstudied for indoor localization. Classical �ngerprints include WiFireceived signal strength (RSS), magnetic �eld magnitude, illumi-nation level etc. In this work, we use WiFi RSS and manetic �eld

2

Page 3: Detecting Location Fraud in Indoor Mobile Crowdsensing · data at these tags (red numbered dots). computes and returns the estimated location. Comprehensive site survey is time-consuming,

Table 1: A demonstration of the formulated attack detectionproblem

Raw Crowdsensed Data Outputrid lid uid sensed data FP Time-stamp location

validity1 loc1 u1 30◦C f p1 2017-1-09 11:20:20 12 loc2 u2 31◦C f p2 2017-1-09 12:10:20 0...

.

.

....

.

.

....

.

.

....

magnitude. Note that unlike �ngerprinting based localization, wedo not perform site survey to construct �ngerprint maps. �e prob-lem can therefore be described as: given a dataset in the form of〈rid,uid, lid, sensed data, f inдerprint〉, how to detect the threetypes of attacks? Table 1 is a demonstration of this a�ack detec-tion problem. It is not trivial due to the absence of ground-truth�ngerprint and sensing data at each location.

4.1 Detection of falsi�ed data4.1.1 Coarse-grained detection. Under the tag forgery and tag

misplacement a�acks, the required data along with the �ngerprintsare collected at illegitimate locations. Comparing with the referencetag, the �ngerprints are much more di�cult to be forged. �erefore,it is possible to �lter the contributed data by simply checking thevalidity of the associated �ngerprints. Speci�cally, we can checkthe service set identi�cations (SSIDs) in the �ngerprints to detectproblematic scans that do not contain the legitimate SSIDs.

In addition to checking the validity of SSIDs, we can further de-tect the anomalies based on the time stamps of the contributed dataand the topology information of all location tags. �e idea is thata user needs to travel from one location to another to collect data.�erefore, we can estimate the user’s walking speed based on thetime stamp and topology information of location tags. Under thetag forgery a�ack, the a�ackers do not have to “travel” physically.For example, an a�acker may forge all the location tags and collectdata from the same location but tags them with di�erent locations.A speed anomaly might be detected if this is the case. Formally,we use 〈Xu

i ,Xui+1〉 to denote two consecutive uploads contributed

by user u, where Xui = (loc

ui , ts

ui , · · · ), X

ui+1 = (loc

ui+1, ts

ui+1, · · · ),

where locui represents the location tag and tui is the time stamp. We

�ag both Xui and Xu

i+1 as falsi�ed if dist (locui+1,locui )tsui+1−tsui

> ρ where ρis the maximum of human walking/running speed. In our experi-ments, we set ρ = 10m/s .

�ough both SSIDs checking and speed anomaly detection canbene�t tag forgery detection, given the small collection of SSIDsand the ease of changing SSIDs on WiFi APs, such solutions arelikely to work only with unsophisticated perpetrators. �ere is aneed to develop more generic detection algorithms.

4.1.2 Truth-discovery based detection. �ough it is reasonable toassume majority of the contributors are honest, one cannot assumethe majority of the collected data is legitimate. Furthermore, it maybe insu�cient to identify the “bad guy” as the bad guy may stillcontribute “good” data. For instance, a perpetrator scans legitimatetag once before commencing on tag forgery a�acks since she hasto explore the legitimate tags anyway. Under these considerations,we formulate the a�ack detection problem as a truth discovery

𝛽

𝑡

𝑦

𝑥𝑇

𝜃𝐹 𝜃𝑇

𝑥𝐹

𝑧𝑇𝑧𝐹

Figure 2: �e probabilistic graphical model

problem. Truth discovery integratesmulti-source noisy informationby estimating the reliability of each source. Interested readers canrefer to [11] for a comprehensive survey on truth discovery.

We �rst describe the notations used and then give a formalde�nition of the truth discovery problem.

Input. Given a set of locations: K = {k}Kk=1 that we are inter-ested in, and a set of users (the community)U = {u}Uu=1 which pro-vide sensing data along with �ngerprints from all K locations. �eraw dataset is denoted asX = {(x1,u1, z1, ts1), · · · , (xN ,uN , zN , tsN )},where xi is the �ngerprints (the sensing data can be included if itis location dependent), zi ∈ K is the location index, ui ∈ U is theuser index, and tsi is the time stamp.

Output. �e location validity labels for the given dataset aredenoted as t = (t1, · · · , tN ), where

ti =

{1 if the location tag zi is truthful0 if the location tag of zi is falsi�ed.

(1)

Truth Discovery Task. �e truth discovery task is formallyde�ned as estimating µ and t given the raw dataset X.

In order to tackle this problem, we propose a probabilistic graph-ical model, as illustrated in Figure 2. Speci�cally, we use two Gauss-ian Mixture Models (GMM) p(x|θT ) and p(x|θ F ) to model the truth-ful and falsi�ed �ngerprints, respectively. For �ngerprints withtruthful location tags, each component corresponds to a uniquelocation. As for �ngerprints with falsi�ed location tags, we assumethat all these �ngerprints from the same user are collected at thesame illegal location. Hence, each component in the GMM of falsi-�ed data corresponds to a unique user. As such, the total number ofcomponents for p(x|θT ) is K , and the total number of componentsfor p(x|θ F ) isU . �e two models are denoted as

p(x|θT ) =K∑k=1

αTk p(x|θTk ) (2)

p(x|θ F ) =U∑u=1

αFu p(x|θ Fu ) (3)

with mixing weights that∑k α

Tk = 1 and

∑u α

Fu = 1, θTk =

(µTk , ΣTk ), and θ

Fu = (µFu , ΣFu ). Unlike the traditional EM algorithm

for GMM where the component indicator is an unknown hiddenvariable, both zi and ui are known in our case. In our model, thelocation validity variable ti is considered as hidden variable thatneeds to be estimated.

3

Page 4: Detecting Location Fraud in Indoor Mobile Crowdsensing · data at these tags (red numbered dots). computes and returns the estimated location. Comprehensive site survey is time-consuming,

�e associated log-likelihood for the data xi with i ∈ {1, · · · ,N }is given as L(Θ) = ∑

i log∑ti p(xi , ti ;Θ) where Θ = (θ

T ,θ F , β), βrepresents user reliability which will be elaborated later.

For each i , letQi be some distribution over the ti ’s(∑ti Qi (ti ) = 1,

Qi (ti ) ≥ 0), then we have

L(θ ) =∑i

log∑ti

Qi (ti )p(xi , ti ;Θ)Qi (ti )

≥∑i

∑ti

Qi (ti ) logp(xi , ti ;Θ)Qi (ti )

(4)

In such a se�ing, the EM algorithm gives us an e�cient methodfor maximum likelihood estimation. Instead of maximizing L(Θ)directly, we repeatedly construct a lower bound of L(Θ) (E-step)and optimize that lower bound (M-step). We can now write downthe sequence of our EM algorithm for our truth discovery task:

E − step : For each i , set (5)

Qi (ti ) = p(ti |xi ,θT ,θ F , β) =p(xi |ti ,θT ,θ F ) · p(ti |β)

C(6)

M − step : Set (7)

Θ = argmaxΘ

∑i

∑ti

Qi (ti ) logp(xi , ti ;θ )Qi (ti )

(8)

where βu is the reliability of user u and βu = 1 if all the the datafrom user u are completely truthful, p(ti = 1|β) = βui , and C is aconstant. Let us de�ne

J (Q,Θ) =∑i

∑ti

Qi (ti ) logp(xi , ti ;Θ)Qi (ti )

(9)

�en we can maximize J (Q,Θ) with respect to θT , θ F , and β re-spectively,

∂ J (Q, Θ)∂θTk

=∑

i :zi=kQi (ti = 1)

∂∂θTk

p(xi |θTk )

p(xi |θTk )= 0 (10)

∂ J (Q, Θ)∂θ Fu

=∑

i :ui=uQi (ti = 0)

∂∂θ Fu

p(xi |θ Fu )

p(xi |θ Fu )= 0 (11)

∂ J (Q, Θ)∂βu

= 0 (12)

A�er straightforward derivations, we can update θT , θ F and β inM-step as:

αTk =

∑i :zi=k Qi (ti = 1)∑

i Qi (ti = 1) (13)

µTk =

∑i :zi=k Qi (ti = 1)xi∑i :zi=k Qi (ti = 1) (14)

ΣTk =

∑i :zi=k Qi (ti = 1)(xi − µTk )(xi − µ

Tk )>∑

i :zi=k Qi (ti = 1) (15)

α Fu =

∑i :ui=u Qi (ti = 0)∑

i Qi (ti = 0) (16)

µFu =

∑i :ui=u Qi (ti = 0)xi∑i :ui=u Qi (ti = 0) (17)

ΣFu =

∑i :ui=u Qi (ti = 0)(xi − µFu )(xi − µFu )>∑

i :ui=u Qi (ti = 0) (18)

βu =

∑i :ui=u Qi (ti = 1)∑

i :ui=u 1(19)

Hallway1 2

3

(a) A normal size-3 trajectory (b) An abnormal size-3 trajectory

3

Hallway1 2

3

Been moved

Figure 3: An example of normal and abnormal length-3 tra-jectories. In (b), tag 3 has been moved from dotted circle tosolid circle.

With these, we can estimate the hidden variable ti repeatedlyuntil convergence. Similar to standard EM, the convergence ofour algorithm can be proved by showing that our choice of theQi ’s always monotonically improves the log-likelihood. In the �nalfalsi�ed data detection algorithm, the coarse-grained method isperformed in the beginning and the result is used as the initialvalue of t.

4.2 Detecting tag misplacement�e detection of tag misplacement is based on the visiting pa�ern oflocation tags. In indoor MCS, it is likely that a user will contributesensing data in a continuous manner. All sensing data uploadedfrom user u can be sorted in chronological order and represented astraju = 〈(locu1 , ts

u1 ), (loc

u2 , ts

u2 ), · · · 〉. We de�ne a length-m trajec-

tory of user u as trajum,i = 〈(locui , ts

ui ), · · · , (loc

ui+m−1, ts

ui+m−1)〉

where trajum,i is a subsequence of traju and tsj+1 − tsj ≤ TW for

any i ≤ j ≤ i +m − 2, TW is a time di�erence threshold.Given TW , the raw data set X = {X1, · · · ,XN } can be eas-

ily transformed into a set of length-3 trajectories which can beused for tag misplacement detection. We introduce the normaland abnormal length-3 trajectories. A length-3 trajectory traju,3i =

〈(locui , tsui ), (loc

ui+1, ts

ui+1), (loc

ui+2, ts

ui+2)〉 is normal ifdist(locui+2, loc

ui ) ≥

dist(locui+1, locui ), and abnormal otherwise. Figure 3 provides an

example of normal and abnormal length-3 trajectories. Although anoccasional abnormal length-3 trajectory does not necessarily meantag misplacement, frequent abnormal length-3 trajectories arounda location tag usually implies that this tag has been moved. Basedon this observation, a threshold based tag misplacement detectionalgorithm is described as Algorithm 1.

Algorithm 1: tag misplacement detectionInput: Raw dataset: XOutput: A set of misplaced location tagsInitialize each countlock to 0;traj3 = a set of length-3 trajectories generated from X;fortraj3,ui ≡ 〈(locui , tsui ), (locui+1, tsui+1), (locui+2, tsui+2)〉 ∈ traj3do

if traj3i is abnormal thencountloci+1 = countloci+1 + 1;

Sort 〈loc1, · · · , locK 〉 in descending order based on count ;Flag the top-ranking lock s as misplaced;

4

Page 5: Detecting Location Fraud in Indoor Mobile Crowdsensing · data at these tags (red numbered dots). computes and returns the estimated location. Comprehensive site survey is time-consuming,

Table 2: �e details of two datasetsMobiBee Dataset Emulated Dataset

# of location tag 24 50# of user 5 20

# of a�acker 2 5# of record 21722 17487

# of falsi�ed record 8338 3186# of misplaced tags 0 5# of removed tags 0 5

4.3 Detection of tag removalIn real world, a reference tag could be removed on purpose orby accidence. Once a tag has been removed, it will never showup in the rest of data collection except that it might be forgedby some a�ackers. �e detection of tag removal relies on the ob-servation that a tag could have been removed if it does show upbut its adjacent tags show up frequently. Based on this observa-tion, the tag removal a�ack can be detected by comparing thefrequency of a tag with the frequencies of its neighbors. Con-cretely, all location tags can be ranked in ascending order based on

f r eqlockmean({f r eqneiдhbork1

,f r eqneiдhbork2, · · · }) , wheredist(neiдhbor

kj , lock ) ≤

ϕ. �e distance threshold ϕ is set to 5m in our experiment. �enthe top-ranking location tags can be �agged as removed. In prac-tice, this detection method can be easily converted to an onlinealgorithm by se�ing a threshold.

5 EVALUATION AND RESULTS5.1 Two datasets�e experiments are based on both crowdsensed and emulateddataset. �e real world dataset is collected by a treasure hunt gameMobiBee which is designed for location dependent �ngerprint col-lection. QR codes are used as location tags. Various incentivestrategies, including monetary, entertainment and competition, areutilized for be�er user participation. More details can be foundin [17] about the game design. Over the one month operation, 26users have signed up the game but only 14 of them actually con-tribute data. In the experiments, we select 5 users who contributethe vast majority of the data. During the operation of MobiBee, twousers have been found cheating which can be categorized as tagforgery a�ack. However, both tag misplacement and tag removal didnot appear in the data collection of MobiBee. �erefore, anotherdataset is needed to evaluate the performances of proposed tagmisplacement and tag removal detection algorithms.

�e second dataset is constructed by emulation. Concretely,we develop a �ngerprint collection app. Several volunteers arerecruited to travel around inside the target building and uploadsensing data at a set of pre-selected locations. �e trajectoriesof these volunteers are carefully designed to imitate the actualscenario of MCS. 20 users have been emulated and each user hasbeen to every location at least once. 5 users have been designedas a�ackers who perform tag forgery a�ack. 5 location tags aremoved to illegitimate places during the data collection, as the resultof tag misplacement a�ack. Similarly, another 5 location tags areremoved at some points, as the result of tag removal. �e details ofthe two datasets are given in Table 2.

0.00600.00550.00500.00450.00400.00350.00300.0025

0.001

0.000

0.001

0.002

y

c = 1

0.005 0.004 0.003 0.002 0.001

0.0050

0.0055

0.0060

0.0065

0.0070

0.0075

0.0080

0.0085c = 2

0.006 0.005 0.004 0.003 0.002

0.0045

0.0050

0.0055

0.0060

0.0065

0.0070

0.0075

c = 3

0.0030.0040.0050.0060.0070.0080.0090.005

0.004

0.003

0.002

0.001

0.000

0.001

0.002c = 4

0.00800.00750.00700.00650.00600.00550.00500.0045x

0.006

0.005

0.004

0.003

0.002

0.001

y

c = 5

0.0040.0050.0060.0070.0080.0090.010x

0.003

0.002

0.001

0.000

0.001

0.002

c = 6

0.007 0.006 0.005 0.004 0.003x

0.0090

0.0085

0.0080

0.0075

0.0070

0.0065

0.0060

0.0055

c = 7

0.0040.0050.0060.0070.0080.0090.0100.011x

0.0030

0.0025

0.0020

0.0015

0.0010

0.0005

0.0000c = 8

Figure 4: Gaussian kernel density estimation of �ngerprintscollected at di�erent locations

Table 3: �e performance of falsi�ed data detection algo-rithm

MobiBeeDataset

Emulated Dataset(coarse-grained)

Emulated Dataset(Integrated)

Overall Accuracy 88.4% 83.4% 87.2%Precision for Falsi�edData

87.2% 100% 62%

Recall for Falsi�edData

81.7% 8.9% 77.6%

5.2 Performance of falsi�ed data detectionIn section 4.1, we model the �ngerprints collected from di�erentlocations as Gaussian Mixture. �e Gaussianality of �ngerprintscollected from the same location is demonstrated as Figure 4.

Two users have been found cheating in MobiBee. �e falsi�edinstances in MobiBee dataset were �rst detected by SSID checks andare further con�rmed by interviewing the respective perpetrators,and thus the veri�ed data is used as ground truth. �erefore, thecoarse-grained detection methods are not used for MobiBee dataset.As for the emulated dataset, it is a di�erent story since we knowthe ground truth in the �rst place. �erefore, both coarse-graineddetection algorithm and truth discovery based algorithm are appliedon the emulated dataset. �e experiment results show that theproposed algorithm can successfully detect falsi�ed data with highaccuracy. �e detailed results of the experiments on two di�erentdatasets are reported as Table 3. Note that, the recall of the coarse-grained detection algorithm is very low whereas the precision isvery high, which means that although coarse-grained detectionapproach is accurate it is not capable enough to capture all falsi�eddata. A�er integrating the truth discovery algorithm, we can seethat the recall has been signi�cantly improved despite that theprecision is sacri�ced. While detecting falsi�ed data, False Negative(FN) error is more harmful than False Positive error (FP), since it cancontaminate the dataset. In this sense, combining coarse-grainedand truth discovery algorithm is more desirable.

5.3 Estimation of user reliability βA side output of the truth discovery algorithm in Section 4.1.2 isuser reliability β . For MobiBee dataset, there are 5 users in totaland 2 of them were found cheating. A�er the convergence of thealgorithm, the estimated β1−5 are listed in the �rst row of Table 4.

5

Page 6: Detecting Location Fraud in Indoor Mobile Crowdsensing · data at these tags (red numbered dots). computes and returns the estimated location. Comprehensive site survey is time-consuming,

Table 4: �e estimated user reliabilityEstimation

MobiBee DSβ1−5 [0.97129253, 0.32158027, 1., 0.96290124, 0.522175211]

Simulated DSβ1−5 [0.5121315 , 0.67818931, 0.66130878, 0.61648436, 0.62845224]β6−10 [0.33275762, 0.46122064, 0.42303993, 0.46608656, 0.30396718]β11−15 [0.91738818, 0.95683221, 0.93786919, 0.97727463, 1.0]β16−20 [0.97310623, 0.93121491, 0.98167733, 0.99308929, 0.96049801]

�is estimation is consistent to the ground truth that user 2 and5 are a�ackers. While using the emulated dataset, the estimateduser reliabilities are provided in the rest of Table 4. We can easilyobserve that user 1-10 have relatively worse reliability than others.�is accurately re�ects the fact that user 1-5 upload some data frommisplaced tags, and user 6-10 are a�ackers who upload some datausing forged tags.

5.4 Performance of tag misplacement detectionAccording to Table 2, only the emulated dataset contains tag mis-placement a�ack and hence is used to evaluate the tag misplacementdetection performance. Tag {5, 15, 25, 35, 45} were actually movedduring the data collection. Based on Algorithm 1, the top-10 candi-dates of misplaced tags are {5, 15, 25, 45, 29, 12, 33, 35, 20, 14}, whichcapture all 5 misplaced location tags. We will capture 4 out of 5misplaced tags if we �ag top-5 candidates as misplaced.

5.5 Performance of tag removal detectionSimilar to the evaluation of tag misplacement detection, only theemulated dataset is utilized to evaluate the performance of tagremoval detection. As ground truth, tag 3, 10, 20, 30, 40 wereactually removed in the meantime of data collection. Using themethod described in section 4.3, the top-10 candidates of removedtags are {10, 40, 30, 21, 17, 3, 34, 23, 43, 13} which successfully cover4 out 5 removed tags. 3 removed tags will be successfully detectedif top-5 candidates are �agged as removed.

6 SUMMARY AND FUTUREWORKIn this work, we abstracted three types of location-related a�acksin indoor mobile crowdsensing. Additional location dependent�ngerprints are collected as supplementary information for falsi-�ed data detection. Moreover, another two detection methods areproposed to detect tag misplacement and tag removal. In order toevaluate the proposed detection methods, both crowdsensed andemulated datasets are employed. Experiment results have shownthat the proposed a�ack detection algorithms can indeed detecta�acks with high accuracy. As a part of the future work, we wouldlike to explore some more sophisticated detection algorithms fortag misplacement and tag removal and investigate other types ofsecurity threats in mobile crowdsourcing.

REFERENCES[1] B. I. Aydin, Y. S. Yilmaz, Y. Li, Q. Li, J. Gao, and M. Demirbas. Crowdsourcing for

multiple-choice question answering. In AAAI, pages 2946–2953, 2014.[2] M. Bouet and A. L. Dos Santos. R�d tags: Positioning principles and localization

techniques. InWireless Days, 2008. WD’08. 1st IFIP, pages 1–5. IEEE, 2008.

[3] S. Chen, M. Li, K. Ren, X. Fu, and C. Qiao. Rise of the indoor crowd: Reconstruc-tion of building interior view via mobile crowdsourcing. In Proceedings of the13th ACM Conference on Embedded Networked Sensor Systems, pages 59–71. ACM,2015.

[4] X. L. Dong, L. Berti-Equille, and D. Srivastava. Integrating con�icting data: therole of source dependence. Proceedings of the VLDB Endowment, 2(1):550–561,2009.

[5] P. Du�a, P. M. Aoki, N. Kumar, A. Mainwaring, C. Myers, W. Wille�, andA. Woodru�. Common sense: participatory urban sensing using a networkof handheld air quality monitors. In Proceedings of the 7th ACM conference onembedded networked sensor systems, pages 349–350. ACM, 2009.

[6] R. Gao, M. Zhao, T. Ye, F. Ye, Y. Wang, K. Bian, T. Wang, and X. Li. Jigsaw:Indoor �oor plan reconstruction via mobile crowdsensing. In Proceedings of the20th annual international conference on Mobile computing and networking, pages249–260. ACM, 2014.

[7] R. Khan, S. Zawoad, M. M. Haque, and R. Hasan. �who, when, andwhere?�location proof assertion for mobile devices. In IFIP Annual Confer-ence on Data and Applications Security and Privacy, pages 146–162. Springer,2014.

[8] E. Kim, Y.-S. Lee, S.-Y. Kim, J.-W. Choi, andM.-S. Jung. A study on the informationprotection modules for secure mobile payments. In IT Convergence and Security(ICITCS), 2013 International Conference on, pages 1–2. IEEE, 2013.

[9] M. Langheinrich. A survey of r�d privacy approaches. Personal and UbiquitousComputing, 13(6):413–421, 2009.

[10] M. Lee and D. Han. Qrloc: User-involved calibration using quick response codesfor wi-� based indoor localization. In Computing and Convergence Technology(ICCCT), 2012 7th International Conference on, pages 1460–1465. IEEE, 2012.

[11] Y. Li, J. Gao, C. Meng, Q. Li, L. Su, B. Zhao, W. Fan, and J. Han. A survey on truthdiscovery. Acm Sigkdd Explorations Newsle�er, 17(2):1–16, 2016.

[12] P. Mohan, V. N. Padmanabhan, and R. Ramjee. Nericell: rich monitoring of roadand tra�c conditions using mobile smartphones. In Proceedings of the 6th ACMconference on Embedded network sensor systems, pages 323–336. ACM, 2008.

[13] L. M. Ni, Y. Liu, Y. C. Lau, and A. P. Patil. Landmarc: indoor location sensingusing active r�d. Wireless networks, 10(6):701–710, 2004.

[14] B. Ozdenizci, K. Ok, V. Coskun, and M. N. Aydin. Development of an indoornavigation system using nfc technology. In Information and Computing (ICIC),2011 Fourth International Conference on, pages 11–14. IEEE, 2011.

[15] L. Pei, M. Zhang, D. Zou, R. Chen, and Y. Chen. A survey of crowd sensingopportunistic signals for indoor localization. Mobile Information Systems, 2016,2016.

[16] G. B. Prince and T. D. Li�le. A two phase hybrid RSS/AoA algorithm for indoordevice localization using visible light. In Global Communications Conference(GLOBECOM), pages 3347–3352. IEEE, 2012.

[17] R. Z. Qiang Xu. MobiBee: A Mobile Treasure Hunt Game for Location-dependentFingerprint Collection. In Proceedings of the 2016 ACM conference on Pervasiveand ubiquitous computing adjunct publication. ACM, 2016.

[18] Qiang Xu, Rong Zheng, Steve Hranilovic. IDyLL: Indoor localization usinginertial and light sensors on smartphones. In Proceedings of the 2015 ACMInternational Joint Conference on Pervasive and Ubiquitous Computing, UbiComp’15, 2015.

[19] A. Rai, K. K. Chintalapudi, V. N. Padmanabhan, and R. Sen. Zee: zero-e�ortcrowdsourcing for indoor localization. In Proceedings of the 18th annual inter-national conference on Mobile computing and networking, pages 293–304. ACM,2012.

[20] N. Sastry, U. Shankar, and D. Wagner. Secure veri�cation of location claims. InProceedings of the 2nd ACM workshop on Wireless security, pages 1–10. ACM,2003.

[21] G. Shen, Z. Chen, P. Zhang, T. Moscibroda, and Y. Zhang. Walkie-markie: Indoorpathway mapping made easy. In Proceedings of the 10th USENIX Conference onNetworked Systems Design and Implementation, nsdi’13, pages 85–98, Berkeley,CA, USA, 2013. USENIX Association.

[22] E. Siira, T. Tuikka, and V. Tormanen. Location-based mobile wiki using nfc taginfrastructure. In Near Field Communication, 2009. NFC’09. First InternationalWorkshop on, pages 56–60. IEEE, 2009.

[23] Q. Xu and R. Zheng. Mobibee: a mobile treasure hunt game for location-dependent �ngerprint collection. In Proceedings of the 2016 ACM InternationalJoint Conference on Pervasive and Ubiquitous Computing: Adjunct, pages 1472–1477. ACM, 2016.

[24] Z. Yang, C. Wu, and Y. Liu. Locating in �ngerprint space: wireless indoorlocalization with li�le human intervention. In Proceedings of the 18th annualinternational conference on Mobile computing and networking, pages 269–280.ACM, 2012.

[25] B. Zhao, B. I. Rubinstein, J. Gemmell, and J. Han. A bayesian approach todiscovering truth from con�icting sources for data integration. Proceedings ofthe VLDB Endowment, 5(6):550–561, 2012.

6