
IEEE TRANSACTIONS ON INFORMATION FORENSICS AND SECURITY, VOL. 9, NO. 10, OCTOBER 2014 1695

Friends or Foes: Distributed and Randomized Algorithms to Determine Dishonest Recommenders in Online Social Networks

Yongkun Li, Member, IEEE, and John C. S. Lui, Fellow, IEEE

Abstract— Viral marketing is becoming important due to the popularity of online social networks (OSNs). Companies may provide incentives (e.g., via free samples of a product) to a small group of users in an OSN, and these users provide recommendations to their friends, which eventually increases the overall sales of a given product. Nevertheless, this also opens a door for malicious behaviors: dishonest users may intentionally give misleading recommendations to their friends so as to distort the normal sales distribution. In this paper, we propose a detection framework to identify dishonest users in OSNs. In particular, we present a set of fully distributed and randomized algorithms, and also quantify the performance of the algorithms by deriving the probability of false positive, the probability of false negative, and the distribution of the number of detection rounds. Extensive simulations are also carried out to illustrate the impact of misleading recommendations and the effectiveness of our detection algorithms. The methodology we present here will enhance the security level of viral marketing in OSNs.

Index Terms— Dishonest recommenders, misbehavior detection, distributed algorithms, online social networks.

I. INTRODUCTION

IN THE past few years, we have witnessed an exponential growth of the user population in online social networks (OSNs). Popular OSNs such as Facebook, Twitter and Taobao [1] have attracted millions of active users. Moreover, due to the rapid development of intelligent cell phones and their integration with online social networking services [37], [38], many users have integrated these services into their daily activities, and they often share various forms of information with each other.

Manuscript received January 15, 2014; revised March 17, 2014, April 24, 2014, and July 23, 2014; accepted July 29, 2014. Date of publication August 7, 2014; date of current version September 12, 2014. The work of Y. Li was supported in part by the National Natural Science Foundation of China under Grant 61303048 and in part by the Fundamental Research Funds for the Central Universities under Grant WK0110000040. The work of J. C. S. Lui was supported by the General Research Fund under Grant 415211. The associate editor coordinating the review of this manuscript and approving it for publication was Dr. Athanasios Vasilakos.

Y. Li is with the School of Computer Science and Technology, University of Science and Technology of China, Hefei 215123, China (e-mail: [email protected]).

J. C. S. Lui is with the Department of Computer Science and Engineering, The Chinese University of Hong Kong, Hong Kong (e-mail: [email protected]).

Digital Object Identifier 10.1109/TIFS.2014.2346020

For example, users share their opinions on purchased products with their friends, and they may also receive or even seek recommendations from their friends before making any purchase. Therefore, when one buys a product, she may be able to influence her friends to make further purchases. This type of influence between users in OSNs is called the word-of-mouth effect, and it is also referred to as social influence.

Due to the large population and the strong social influence in OSNs, companies are also adopting a new way to reach their potential customers. In particular, instead of using conventional broadcast-oriented advertisement (e.g., through TV or newspaper), companies are now using target-oriented advertisement, which takes advantage of the social influence to attract users in OSNs to make purchases. This new form of advertisement can be described as follows: firms first attract a small fraction of initial users in OSNs by providing free or discounted samples, then rely on the word-of-mouth effect to eventually attract a large number of buyers. As the word-of-mouth effect spreads quickly in social networks, this form of advertisement is called viral marketing, which is a proven and effective way to increase the sales and revenue of companies [16], [19], [24], [33].

We would like to emphasize that viral marketing in OSNs does exist in the real world. One prime example is Taobao [1], which is one of the major operations under the Alibaba Group and the biggest e-commerce website in China. As of June 2013, Taobao had over 500 million registered users and 60 million regular visitors per day. It also hosts more than 800 million types of products and represents a turnover of 100 million dollars per year [5], [6]. Users can buy various types of products from Taobao, and they can also run their own shops to sell products. Moreover, Taobao also developed an application add-on called Friends Center on top of the website. With this application add-on, users in Taobao can follow other users just like the following relationship in Twitter; hence, an OSN is formed on top of this e-commerce system. In this OSN, users can share many types of information with their friends, including the products they purchased, the shops they visited, as well as their usage experiences or opinions on products or shops. In particular, users can also forward their friends' posts or even give comments.



In addition to using the Friends Center in Taobao, one can also associate her Taobao account with her account in Sina Weibo [2], [3], which is the biggest OSN in China. According to Weibo's prospectus, the monthly active users of Weibo reached 143.8 million in March 2014, and the daily active users also reached 66.6 million [7]. By associating a Taobao account with Sina Weibo, users can easily share their purchasing experience and ratings on products with their friends in Weibo. Based on Taobao and Weibo, many companies can easily perform target-oriented advertisement to promote their products. In fact, this type of advertisement can be easily launched in any OSN.

However, the possibility of doing target-oriented advertisement in OSNs also opens a door for malicious activities. Precisely, dishonest users in an OSN may intentionally give misleading recommendations to their neighbors, e.g., by giving a high (low) rating to a low-quality (high-quality) product. To take advantage of the word-of-mouth effect, firms may also hire some users in an OSN to promote their products. In fact, this type of advertisement has become very common in Taobao and Weibo. Worse yet, companies may even consider paying users to badmouth their competitors' products. Due to the misleading recommendations given by dishonest users, even if a product is of low quality, people may still be misled to purchase it. Furthermore, products of high quality may lose out since some potential buyers are diverted to other low-quality products.

As we will show in Section VII-B via simulation, misleading recommendations made by dishonest users indeed have a significant impact on the market share. In particular, a simple strategy of promoting one's own product while bad-mouthing competitors' products can greatly enhance the sales of one's product. Furthermore, even if a product is of low quality, hiring a small percentage of users to promote it by providing misleading recommendations can severely shift the market share of various products. Therefore, it is of great significance to identify dishonest users and remove them from the network so as to maintain the viability of viral marketing in OSNs. For normal users in OSNs, it is also of great interest to identify the dishonest users among their neighbors so as to obtain more accurate recommendations and make wiser purchasing decisions. Motivated by this, this paper addresses the problem of detecting dishonest recommenders in OSNs, in particular, how can a normal user discover and identify foes from a set of friends during a sequence of purchases?

However, it is not an easy task to accurately identify dishonest users in OSNs. First, an OSN usually contains millions of users, and the friendships among these users are also very complicated, as indicated by the high clustering coefficient of OSNs. Second, users in an OSN interact with their friends very frequently, which makes it difficult to identify dishonest users by tracing and analyzing the behaviors of all users in a centralized way. Last but not least, in the scenario of OSNs, honest users may also exhibit malicious behaviors unintentionally, e.g., they may simply forward the misleading recommendations received from their dishonest neighbors without awareness. Conversely, dishonest users may also act as honest ones sometimes so as to confuse their neighbors and try to evade the detection. Therefore, the distinction between dishonest users and honest ones in terms of their behaviors becomes obscure, which finally makes the detection more challenging.

To address the problem of identifying dishonest recommenders in OSNs, this work makes the following contributions:

• We propose a fully distributed and randomized algorithm to detect dishonest recommenders in OSNs. In particular, users in an OSN can independently execute the algorithm to distinguish their dishonest neighbors from honest ones. We further exploit the distributed nature of the algorithm by integrating the detection results of neighbors so as to speed up the detection, and also extend the detection algorithm to handle network dynamics, in particular, the "user churn" in OSNs.

• We provide theoretical analysis quantifying the performance of the detection algorithm, e.g., the probability of false positive, the probability of false negative, and the distribution of the number of rounds needed to detect dishonest users.

• We carry out extensive simulations to validate the accuracy of the performance analysis, and further validate the effectiveness of our detection algorithm using a real dataset.

The outline of this paper is as follows. In Section II, we review related work and illustrate how detecting dishonest users in OSNs differs from doing so in general recommender systems. In Section III, we formulate the types of recommendations and the behavior of users in OSNs. In Section IV, we present the detection algorithm in detail, and also provide theoretical analysis on the performance of the algorithm. In Section V, we develop a cooperative algorithm to speed up the detection, and in Section VI, we design a scheme to deal with the network dynamics of OSNs. We demonstrate the severe impact of misleading recommendations and validate the effectiveness of the detection algorithms via simulations in Section VII, and finally conclude the paper in Section VIII.

II. RELATED WORK

Many studies focus on the information spreading effect in OSNs, see [21], [29], and results show that OSNs are very beneficial for information spreading due to specific properties such as a high clustering coefficient. To take advantage of the easy-spreading nature and the large population of OSNs, viral marketing, which is based on the word-of-mouth effect, is becoming popular and has been widely studied, see [16], [19], [24], [33]. In particular, because of the strong social influence in OSNs, a small fraction of initial buyers can attract a large number of users to finally purchase the product [29], [39]. A major portion of viral marketing research treats viral marketing as an information diffusion process, and then studies the influence maximization problem, see [12], [21]. However, viral marketing in OSNs also opens a door for malicious behaviors, as dishonest recommenders can easily inject misleading recommendations into the system so as to misguide normal users' purchases.

In the aspect of maintaining system security, some work like [34] considers exploiting the framework of trust structure [11], [22], [36]. The rough idea is to compute a trust value for every pair of nodes in distributed systems. This framework is suitable for building delegation systems and reputation systems, but it still faces a lot of challenges in addressing the problem of identifying dishonest users in OSNs studied in this work. First, OSNs usually contain millions of users and billions of links, so the cost of computing the trust values for every pair of users is extremely high. Second, even if the trust values for every pair of users have been computed, it still requires a mapping from the trust value to a tag indicating whether a user is dishonest or not, which is also a challenging task, especially when users have no prior information about the number of dishonest users.

With respect to malicious behavior detection, it has been widely studied in wireless networks (see [20], [28], [35]), P2P networks (see [27], [30]), general recommender systems [8] (see [13], [14], [23]), and online rating systems (see [17], [31], [32]). Unlike previous works, in this paper we address the problem of malicious behavior detection in a different application scenario, online social networks. In particular, we focus on the identification of dishonest recommenders in OSNs, which is a very different problem and also brings different challenges, even compared to the two most closely related problems of shill attack detection in recommender systems and review spam detection in online rating systems. For example, every user in an OSN may give recommendations to her friends, which is totally different from the case of recommender systems where recommendations are made only by the system and are given to users in a centralized way. Besides, users' ratings or recommendations may propagate through the network in OSNs, while there is no propagation of recommendations or ratings in recommender systems and online rating systems. This type of forwarding behavior means that normal users in OSNs may also carry out malicious activities, e.g., forwarding neighbors' misleading recommendations without awareness. Furthermore, an OSN usually has an extremely large number of nodes and links and also evolves dynamically, so a distributed detection algorithm becomes necessary considering the computation cost. However, this increases the difficulty of the detection because the detector only has local information, including her own purchasing experience and the recommendations received from her neighbors, but not the global information of the whole network as in recommender systems. Lastly, in terms of the detection methodology, related work studying review spam detection usually uses machine learning techniques, while our detection framework is based on suspicious set shrinkage with distributed iterative algorithms.

III. PROBLEM FORMULATION

In this section, we first present the model of OSNs and give the formal definitions of different types of recommendations, then we formalize the behaviors of users in OSNs in terms of how they provide recommendations. In particular, considering that the objective of dishonest users is to promote their target products while decreasing the chance of being detected, we formalize the behaviors of dishonest users as a probabilistic strategy.

A. Modeling of Online Social Networks

We model an OSN as an undirected graph G = (V, E), where V is the set of nodes in the graph and E is the set of undirected edges. Each node i ∈ V represents one user in an OSN, and each link (i, j) ∈ E indicates the friendship between user i and user j, i.e., user i is a neighbor or friend of user j and vice versa. That is, user i and user j can interact with each other via the link (i, j), e.g., give recommendations to each other. Usually, OSNs are scale-free [29], [39] and the degrees of nodes follow a power law distribution [9]. Precisely, p(k) ∝ k^{−γ}, where p(k) is the probability of a randomly chosen node in G having degree k and γ is a constant with a typical value 2 < γ < 3. We denote N_i = { j | (i, j) ∈ E} as the neighboring set of user i and assume that |N_i| = N.
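To make the network model concrete, the following Python sketch (our own illustration, not code from the paper) builds a scale-free graph as a stand-in for an OSN and extracts a user's neighboring set N_i; the Barabási–Albert generator and the parameter values are illustrative assumptions only.

# Sketch (assumed setup, not from the paper): a scale-free stand-in for an OSN.
import networkx as nx

def build_osn(num_users=10_000, attach_edges=5, seed=42):
    # Preferential attachment yields a power-law degree distribution,
    # matching the scale-free assumption p(k) ~ k^(-gamma).
    return nx.barabasi_albert_graph(num_users, attach_edges, seed=seed)

def neighboring_set(G, i):
    # N_i = { j | (i, j) in E }
    return set(G.neighbors(i))

if __name__ == "__main__":
    G = build_osn()
    print(len(neighboring_set(G, 0)))  # degree of user 0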

B. Products and Recommendations

We first formalize products, and then give the definitions of different types of recommendations. We consider a set of "substitutable" products P_1, P_2, . . ., P_M that are produced by firms F_1, F_2, . . ., F_M, respectively, and these firms compete in the same market. Two products are substitutable if they are comparable, e.g., polo shirts from brand X and brand Y are substitutable goods from the customers' points of view. We characterize each product P_j with two properties: (1) its sale price and (2) users' valuations. We assume that each product P_j has a unique price, denoted as p_j. With respect to users' valuations, since different users may have different ratings on a product because of their subjectivity, we denote v_{ij} as the valuation of user i on product P_j.

We categorize a product into two types according to its sale price and users' valuations. In particular, if user i thinks that a product P_j is sold at a price that truly reveals its quality, then she considers this product as a trustworthy product. That is, product P_j is classified as a trustworthy product by user i only when p_j = v_{ij}. Here the equal sign means that the product is sold at a fair price from the point of view of user i. Conversely, if user i thinks that the price of product P_j does not reveal its quality, or formally, p_j ≠ v_{ij}, then she classifies it as an untrustworthy product. Similarly, here the inequality sign just means that user i thinks that P_j is priced unfairly, maybe much higher than its value. For example, maybe this product is of low quality or even bogus, but it is produced by speculative and dishonest companies who always seek to maximize their profit by cheating customers. Formally, we use T_i(P_j) to denote the type of product P_j classified by user i, and we have

T_i(P_j) = \begin{cases} 1, & \text{if user } i \text{ considers } P_j \text{ to be trustworthy}, \\ 0, & \text{if user } i \text{ considers } P_j \text{ to be untrustworthy}. \end{cases}

Since products are categorized into two types, we assume that there are two types of recommendations: positive recommendations and negative recommendations, which are denoted by R^P(P_j) and R^N(P_j), respectively.

Definition 1: A positive recommendation on product P_j (R^P(P_j)) always claims that P_j is a trustworthy product regardless of its type, while a negative recommendation on P_j (R^N(P_j)) always claims that P_j is an untrustworthy product regardless of its type. Formally, we have

R^P(P_j) ≜ "P_j is a trustworthy product",

R^N(P_j) ≜ "P_j is an untrustworthy product".

Note that a recommendation, either R^P(P_j) or R^N(P_j), does not reveal the type of product P_j classified by users, so one may make positive (or negative) recommendations even if she takes the product as an untrustworthy (or a trustworthy) product. To have a notion of correctness, we further classify recommendations into correct recommendations and wrong recommendations by integrating users' valuations.

Definition 2: A recommendation on product P_j is correct for user i, which is denoted as R^C_i(P_j), only when it reveals the type of P_j classified by user i, i.e., T_i(P_j), while a wrong recommendation on product P_j for user i (R^W_i(P_j)) reveals the opposite type of product P_j classified by user i. Formally, we have

R^C_i(P_j) ≜ \begin{cases} R^P(P_j), & \text{if } T_i(P_j) = 1, \\ R^N(P_j), & \text{if } T_i(P_j) = 0. \end{cases}

R^W_i(P_j) ≜ \begin{cases} R^P(P_j), & \text{if } T_i(P_j) = 0, \\ R^N(P_j), & \text{if } T_i(P_j) = 1. \end{cases}
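To make Definitions 1 and 2 concrete, here is a small Python sketch (our own illustration, not code from the paper); the 'P'/'N' encoding of positive and negative recommendations is an assumption of the sketch.

# Sketch: classify a received recommendation as correct or wrong for user i,
# following Definitions 1 and 2. The 'P'/'N' encoding is our own convention.

POSITIVE, NEGATIVE = "P", "N"   # R^P(P_j) and R^N(P_j)

def is_correct(recommendation: str, t_i: int) -> bool:
    """True iff the recommendation matches the type T_i(P_j) in {0, 1}
    that user i assigns to the product (1 = trustworthy)."""
    if t_i == 1:
        return recommendation == POSITIVE   # R^C_i(P_j) = R^P(P_j) when T_i = 1
    return recommendation == NEGATIVE       # R^C_i(P_j) = R^N(P_j) when T_i = 0

# Example: the detector values the product as trustworthy (T_i = 1); a negative
# recommendation from a neighbor is then classified as wrong.
assert is_correct(NEGATIVE, t_i=1) is False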

C. Behaviors of Users in OSNs

In this subsection, we formalize the behaviors of users in an OSN. We assume that any user who buys a product can valuate the product based on her usage experience, and then categorizes it as either a trustworthy product or an untrustworthy product from her point of view.

1) Behaviors of Honest Users: We define honest users as the ones who will not intentionally give wrong recommendations. That is, if an honest user buys a product, since she can valuate the product and determine its type, she always gives correct recommendations on the product to her neighbors. Precisely, she gives positive recommendations if the product is considered to be trustworthy and negative recommendations otherwise.

On the other hand, if an honest user did not buy a product, she may still give recommendations to her neighbors by simply forwarding the recommendations received from others. This type of forwarding behavior is quite common in OSNs. For example, in Taobao and Weibo, many users forward their friends' posts, including their purchasing experiences and ratings on products. In a study of online social networks [4], it was found that 25% of users had forwarded an advertisement to other users in the network. Moreover, due to the anonymity of users' identities (e.g., users usually use pseudonyms to register their Weibo accounts), it is extremely difficult to trace the information spreading process in OSNs, and so users may simply forward any form of information without confirming its truthfulness. In particular, a user may forward a positive (negative) recommendation given by her neighbors without validating the quality of the product. Because of this forwarding behavior, it is possible that honest users may give wrong recommendations to their neighbors. Thus, a user who gives wrong recommendations is not strictly dishonest, but only potentially dishonest. In other words, if the detector considers a product to be trustworthy and receives negative recommendations from a neighbor, she still cannot be certain that this neighbor is dishonest, mainly because it is possible that this neighbor does not intend to cheat, but is just misled by her own neighbors.

2) Behaviors of Dishonest Users: We define dishonest users as the ones who may give wrong recommendations intentionally, e.g., give positive recommendations on an untrustworthy product. Note that dishonest users may also behave differently as they may aim at promoting different products, e.g., users who are hired by firm F_i aim at promoting product P_i, while users who are hired by firm F_j aim at promoting P_j. Without loss of generality, we assume that there are m types of dishonest users who are hired by firms F_1, F_2, . . ., F_m, and they promote products P_1, P_2, . . ., P_m, respectively. Furthermore, we assume that the products promoted by dishonest users (i.e., products P_1, P_2, . . ., P_m) are untrustworthy for all users. The intuition is that these products are of low quality (or even bogus) so that they can be easily identified by people. The main reason for making this assumption is that in this case dishonest users have incentives to promote these products for a larger profit, and meanwhile, honest users also have incentives to detect such dishonest users so as to avoid purchasing untrustworthy products. To further illustrate this, note that when users in an OSN are attracted to buy a product promoted by dishonest users, if the product is a trustworthy one, then there is no difference for these buyers between purchasing other trustworthy products and purchasing the one promoted by dishonest users, and so honest users have no incentive to identify the dishonest users who promote trustworthy products. In other words, promoting trustworthy products can be regarded as normal behavior, so we only focus on the case where the promoted products are untrustworthy in this paper. However, we would like to point out that when we model the behaviors of dishonest users in the following, we allow dishonest users to behave as honest ones and give correct recommendations.

Recall that the goal of dishonest users is to attract as many users as possible to purchase the product they promote. One simple and intuitive strategy to achieve this goal is to give positive recommendations on the product they promote and negative recommendations on all other products. On the other hand, besides attracting as many users as possible to buy their promoted product, dishonest users also hope to avoid being detected so that they can perform malicious activities for a long time. Therefore, dishonest users may also adopt a more intelligent strategy so as to confuse the detector and decrease the chance of being detected. For instance, instead of always bad-mouthing other products by giving negative recommendations, they may probabilistically give correct recommendations and behave like honest users sometimes. The benefit of this probabilistic strategy is to make the detection more difficult so that dishonest users may hide for a longer time. In this paper, we allow dishonest users to adopt this intelligent strategy and use S^l_j to denote the one adopted by a type-l (1 ≤ l ≤ m) dishonest user j. Moreover, we allow dishonest users to be more powerful by assuming that they know honest users' valuation of each product so that they can mislead as many users as possible. The intelligent strategy S^l_j can be formally expressed as follows.

S^l_j ≜ R^P(P_l) ∧ \left[ \bigwedge_{n=1, n≠l}^{M} \left[ δ R^C_j(P_n) ∨ (1−δ) R^N(P_n) \right] \right],    (1)

where δ denotes the probability of giving correct recommendations. Recall that the goal of type-l dishonest users is to attract as many users as possible to purchase product P_l, while giving positive recommendations on other products (say P_n, n ≠ l) goes against their objective, so we assume that dishonest users only give correct recommendations on P_n with a small probability, i.e., δ is small. In particular, δ = 0 implies that dishonest users always bad-mouth other products.
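A minimal simulation of the probabilistic strategy S^l_j in Equation (1) is sketched below (our own rendering; function and variable names are assumptions): a type-l dishonest user always promotes P_l and, for any other product, gives the correct recommendation with probability δ and a negative one otherwise.

# Sketch of the intelligent strategy S^l_j in Eq. (1); names are ours.
import random

def dishonest_recommendation(l: int, product: int, honest_type: int, delta: float) -> str:
    """Recommendation of a type-l dishonest user on `product`.
    honest_type is T(P_n) as seen by honest users (1 = trustworthy)."""
    if product == l:
        return "P"                      # always promote the target product P_l
    if random.random() < delta:         # with small probability delta, behave honestly
        return "P" if honest_type == 1 else "N"
    return "N"                          # otherwise bad-mouth the competitor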

Note that there is a possibility that dishonest users do not adopt the probabilistic strategy in Equation (1), but instead choose to promote trustworthy products over a long time just to build a good reputation, and then behave maliciously by giving misleading recommendations on a product. However, as long as the dishonest users start performing malicious activities, our detection framework still provides the opportunity of detecting them, as we can keep executing the detection algorithm continuously. On the other hand, even if our framework may fail to detect dishonest users who only perform malicious activities in a very limited number of rounds, the effect of the misleading recommendations given by such dishonest users is also very limited, and so the corresponding miss detection error should be very small.

Another possibility we would like to point out is that multiple dishonest users may collude, and a single dishonest user may also create multiple Sybil accounts to consistently promote a low-quality product. This type of colluding malicious attack is still detectable under our framework, because our detection framework is fully distributed and, when the detector determines whether a neighbor is dishonest or not, she only relies on her own valuation of a product and the recommendation given by this neighbor. Therefore, the possibility of a dishonest user being detected only depends on the amount of malicious activities she performs, and it is irrelevant to other users' behaviors. However, for the cooperative detection algorithm that is developed for speeding up the detection, dishonest users may evade the detection if they collude, as the detector may determine the type of a neighbor by exploiting other neighbors' detection information; the possibility of evading the detection depends on the parameters controlled by the detector. Hence, there is a tradeoff between detection accuracy and detection efficiency, and we will further illustrate this in Section V.

D. Problem

In this paper, we develop distributed algorithms that can be run by any user in an OSN to identify her dishonest neighbors. Specifically, we first develop a randomized baseline algorithm which only exploits the information of the detector, see Section IV for details. We also quantify the performance of the algorithm via theoretical analysis. Then we propose a cooperative algorithm which further takes advantage of the detection results of the detector's neighbors so as to speed up the detection, see Section V for details. After that, we further extend the algorithm to deal with the network dynamics of OSNs, i.e., user churn, in Section VI.

IV. BASELINE DETECTION ALGORITHM

In this section, we first illustrate the general idea of the detection framework, and then present the detection algorithm in detail. We also quantify various performance measures of the algorithm.

A. General Detection Framework

Our detection algorithm is fully distributed, and so users can independently execute it to identify dishonest users among their neighbors. Without loss of generality, we only focus on one particular user, say user i, and call her the detector. That is, we present the algorithm from the perspective of user i and discuss how to detect her dishonest neighbors. For ease of presentation, we simply call a product a trustworthy (or untrustworthy) product if the detector considers it to be trustworthy (or untrustworthy).

Note that even if users' subjectivity creates different preferences for different products, we assume that the detector and her neighbors have a consistent valuation of most products. This assumption is reasonable, especially for cases where the quality of products can be easily identified, and its rationality can be further justified as follows. First, users in an OSN prefer to be friends with others who share similar interests and tastes. Hence users in an OSN are similar to their neighbors [15], and so they have a consistent valuation with their neighbors on many products. Second, the "wisdom of the crowd" is considered to be the basis of online rating systems like Amazon and Epinions, and it is also widely relied upon by people in their daily lives, so it is reasonable to assume that most products have an intrinsic quality such that the detector and her neighbors will have a consistent rating.

Note that the above assumption allows users who are not friends with each other to have very different valuations of the same product, and it also allows the detector and her neighbors to have different valuations of some products. In fact, if an honest neighbor has a different rating on a product, then from the detector's point of view, it is just equivalent to the case where this honest neighbor is misled by dishonest users and so gives a wrong recommendation. Another issue we would like to point out is that even if the above assumption does not hold, e.g., if the detector and her neighbors have different ratings on all products, our detection framework still provides a significant step toward identifying dishonest behavior in OSN advertisement. This is because if a neighbor's valuations on all products differ from the detector's, then our detection framework will take this neighbor as "misleading" no matter whether she intends to cheat or not. This is acceptable, as users always prefer neighbors having a similar taste (or preference), so that they can purchase a product they really like if they take their neighbors' recommendations.

We model the purchase experience of detector i as a discrete-time process. In particular, we take the duration between two consecutive purchases made by detector i as one round, and time proceeds in rounds t = 1, 2, . . .. That is, round t is defined as the duration from the time right before the t-th purchase instance to the time right before the (t+1)-th purchase instance. Based on this definition, detector i purchases only one product at each round, while she may receive various recommendations on the product from her neighbors, e.g., some neighbors may give her positive recommendations and others may give her negative recommendations.

Fig. 1. General detection process via suspicious set shrinkage.

The general idea of our detection framework is illustrated in Figure 1. Initially, detector i is conservative and considers all her neighbors as potentially dishonest users. We use S_i(t) to denote the set of potentially dishonest neighbors of detector i until round t, which is termed the suspicious set, and we have S_i(0) = N_i. As time proceeds, detector i differentiates her neighbors based on their behaviors in each round, and shrinks the suspicious set by removing the trusted neighbors who are classified as honest users. After a sufficient number of rounds, one can expect that all honest neighbors have been removed from the suspicious set and only dishonest neighbors are left. Therefore, after t rounds, detector i takes a neighbor as dishonest if and only if this neighbor belongs to the suspicious set S_i(t).

B. Operations in One Round

In this subsection, we describe the detailed operations of shrinking the suspicious set in only one round, say round t. Note that detector i buys a product at round t, which we denote as P_{j_t} (j_t ∈ {1, 2, . . ., M}), so she can valuate the product and determine its type T_i(P_{j_t}) from her point of view. Moreover, she can further categorize the received recommendations (that are either positive or negative) into correct recommendations and wrong recommendations based on her valuation of the product, and so she can differentiate her neighbors according to their recommendations. Specifically, we define N^C_i(t) as the set of neighbors whose recommendations given at round t are classified as correct by detector i. Accordingly, we denote N^W_i(t) and N^N_i(t) as the sets of neighbors who give detector i wrong recommendations and no recommendation at round t, respectively. We have N_i = N^C_i(t) ∪ N^W_i(t) ∪ N^N_i(t).

Recall that a product is either trustworthy or untrustworthy, so detector i faces two cases at round t: (1) the purchased product P_{j_t} is an untrustworthy product, i.e., T_i(P_{j_t}) = 0, and (2) the purchased product P_{j_t} is a trustworthy product, i.e., T_i(P_{j_t}) = 1. In the following, we illustrate how to shrink the suspicious set in the above two cases.

In the first case, a neighbor who gives correct recommendations cannot be identified as honest with certainty, mainly because a dishonest neighbor may also give correct recommendations. For example, a type-l (l ≠ j_t) dishonest user may give negative recommendations on product P_{j_t} based on the intelligent strategy, and this recommendation will be classified as correct since the detector valuates product P_{j_t} as an untrustworthy product. Therefore, detector i is not able to differentiate honest neighbors from dishonest ones if T_i(P_{j_t}) = 0. We adopt a conservative policy by keeping the suspicious set unchanged, i.e., S_i(t) = S_i(t − 1).

In the second case, since detector i valuates product P_{j_t} as a trustworthy product, P_{j_t} cannot be a product promoted by dishonest users, and we have P_{j_t} ∈ {P_{m+1}, . . ., P_M}. In this case, even if a dishonest user may give correct recommendations on product P_{j_t} based on the intelligent strategy, the corresponding probability δ is considered to be small, and so a dishonest user should belong to the set N^W_i(t) with high probability. Note that it is also possible that dishonest users do not make any recommendation at round t, so dishonest users can be in either N^W_i(t) or N^N_i(t). We use D(t) to denote the union of the two sets, i.e., D(t) = N^W_i(t) ∪ N^N_i(t), which denotes the set to which dishonest users belong with high probability at round t. To balance the tradeoff between detection accuracy and detection rate, we employ a randomized policy that only shrinks the suspicious set with probability p. Precisely, we let S_i(t) = S_i(t − 1) ∩ D(t) only with probability p. Here p is a tunable parameter chosen by detector i, and it reflects the degree of conservatism of the detector. The detailed algorithm, which is referred to as the randomized detection algorithm at round t, is stated in Algorithm 1.

Algorithm 1 Randomized Detection Algorithm at Round t for Detector i
1: Estimate the type of the purchased product P_{j_t};
2: Differentiate neighbors by determining N^C_i(t), N^W_i(t), and N^N_i(t);
3: Let D(t) ← N^W_i(t) ∪ N^N_i(t);
4: if T_i(P_{j_t}) = 1 then
5:   with probability p: S_i(t) ← S_i(t − 1) ∩ D(t);
6:   with probability 1 − p: S_i(t) ← S_i(t − 1);
7: else
8:   S_i(t) ← S_i(t − 1);
9: end if
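A minimal Python sketch of Algorithm 1 follows (our own transcription of the pseudocode above; the data representation, with neighbor ids as set elements and recommendations as a dictionary, is an assumption of the sketch).

# Sketch of Algorithm 1 (one detection round for detector i). Neighbors absent
# from the recommendations dict are taken as giving no recommendation.
import random

def detection_round(suspicious, neighbors, recommendations, t_i, p):
    """Return the updated suspicious set S_i(t).

    suspicious      -- S_i(t-1)
    neighbors       -- N_i
    recommendations -- {j: 'P' or 'N'} received at round t
    t_i             -- T_i(P_{j_t}), the detector's valuation of the purchased product
    p               -- probability of using a trustworthy-product round for shrinking
    """
    correct_label = "P" if t_i == 1 else "N"
    n_correct = {j for j, r in recommendations.items() if r == correct_label}
    d_t = neighbors - n_correct            # D(t) = N^W_i(t) ∪ N^N_i(t)
    if t_i == 1 and random.random() < p:   # detectable round: shrink
        return suspicious & d_t
    return set(suspicious)                 # otherwise keep S_i(t) unchanged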

To further illustrate the detection process in Algorithm 1, we consider Figure 2 as an example showing the operations at round t. User i has seven neighbors, labeled from a to g. Assume that two of them are dishonest (i.e., users a and b). Suppose that neighbors a, b, c and e are still in the suspicious set before round t, i.e., S_i(t − 1) = {a, b, c, e}. We use dashed circles to denote suspicious users in Figure 2. If user i buys a trustworthy product at round t, and only neighbors e and f give her correct recommendations, then user i can be certain that neighbor e is honest with a high probability, and it can be removed from the suspicious set. Therefore, according to Algorithm 1, the suspicious set shrinks with probability p, and if this probabilistic event happens, then we have S_i(t) = {a, b, c} as shown on the right hand side of Figure 2.

Fig. 2. An example illustrating Algorithm 1.

Note that Algorithm 1 is fully distributed in the sense that it can be executed by any user to identify her dishonest neighbors. The benefit of the distributed nature is twofold.

First, the size of an OSN is usually very large, e.g., it may contain millions of nodes and billions of links, so a distributed algorithm becomes necessary so as to make the computation feasible. Second, an OSN itself is fully distributed; in particular, a user in an OSN only receives information, e.g., recommendations on products, from her direct neighbors, and so she only needs to care about the honesty of her neighbors so as to make the received recommendations more accurate. Therefore, a fully distributed detection algorithm is indeed necessary for the application we consider.

In terms of implementation, Algorithm 1 can be deployed as a third-party application just like others that are deployed in OSNs. In particular, when this application has been deployed in an OSN, each user has a choice to install it or not. If a user chooses to install it, then she needs to continuously submit some necessary information to the social network provider, e.g., her ratings on products and the recommendations that she would like to make, and the provider will aggregate and store the information for each user. Finally, the computation can be done either by the server of the social network provider or by the client computer of each user.

C. Performance Evaluation

To characterize the performance of the detection algorithm, we define three performance measures: (1) the probability of false negative, denoted as P_{fn}(t), (2) the probability of false positive, denoted as P_{fp}(t), and (3) the number of rounds needed to shrink the suspicious set until it only contains dishonest users, which is denoted by a random variable R. Specifically, P_{fn}(t) characterizes the probability that a dishonest user is wrongly regarded as an honest one after t rounds, and P_{fp}(t) characterizes the error that an honest user is wrongly regarded as a dishonest one after t rounds. Recall that detector i takes a neighbor j ∈ N_i as dishonest if and only if this neighbor belongs to the suspicious set (i.e., j ∈ S_i(t)), so we define P_{fn}(t) as the probability that a dishonest neighbor of detector i is not in S_i(t) after t rounds. Formally, we have

P_{fn}(t) = \frac{\text{\# of dishonest neighbors of } i \text{ that are not in } S_i(t)}{\text{total \# of dishonest neighbors of detector } i}.    (2)

On the other hand, since all neighbors of detector i are initially included in the suspicious set (i.e., S_i(0) = N_i), an honest user is wrongly regarded as a dishonest one only if she still remains in the suspicious set after t rounds. Thus, we define P_{fp}(t) as the probability of an honest user not being removed from the suspicious set after t rounds. Formally, we have

P_{fp}(t) = \frac{\text{\# of honest neighbors of } i \text{ that are in } S_i(t)}{\text{total \# of honest neighbors of detector } i}.    (3)
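In a simulation where the ground truth is known, the two error measures in Equations (2) and (3) can be computed directly; the small helper below is our own sketch for that purpose.

# Sketch: empirical false-negative / false-positive rates of detector i,
# per Eqs. (2) and (3). Ground-truth labels are only available in simulation.

def error_rates(suspicious, dishonest, neighbors):
    """suspicious = S_i(t); dishonest = true dishonest neighbors; neighbors = N_i."""
    honest = neighbors - dishonest
    p_fn = len(dishonest - suspicious) / len(dishonest) if dishonest else 0.0
    p_fp = len(honest & suspicious) / len(honest) if honest else 0.0
    return p_fn, p_fp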

To derive the above three performance measures for Algorithm 1, note that the suspicious set shrinks at round t only when detector i valuates her purchased product as a trustworthy product and this round is further used for detection with probability p. We call such a round a detectable round and use a 0-1 random variable d(t) as an indicator, where d(t) = 1 means that round t is detectable and 0 otherwise. In addition to the indicator d(t), detector i also obtains the set D(t) to which dishonest users may belong at round t. Therefore, we use a tuple (d(t), D(t)) to denote the information that detector i obtains at round t, and the tuples of all rounds up to round t constitute the detection history, which we denote as H(t). Formally, we have

H(t) = {(d(1), D(1)), (d(2), D(2)), . . ., (d(t), D(t))}.

Based on the detection history H(t), the performance measures P_{fn}(t) and P_{fp}(t) and the distribution of R for Algorithm 1 can be derived as in Theorem 1.

Theorem 1: After running Algorithm 1 for t rounds, the probability of false negative and the probability of false positive are given in Equation (4) and Equation (5), respectively.

P_{fn}(t) = 1 − (1 − δ)^{\sum_{τ=1}^{t} d(τ)},    (4)

P_{fp}(t) ≈ \prod_{τ=1, d(τ)=1}^{t} \frac{|D(τ−1) ∩ D(τ)|}{|D(τ−1)|},    (5)

where D(0) = N_i and D(τ) is set to D(τ − 1) if d(τ) = 0. The number of rounds needed for detection until the suspicious set only contains dishonest users follows the distribution

P(R = r) = \sum_{d=1}^{r} \binom{r−1}{d−1} (p_d)^d (1 − p_d)^{r−d} × \left[ [1 − (1 − p_{hc})^d]^{N−k} − [1 − (1 − p_{hc})^{d−1}]^{N−k} \right],    (6)

where p_{hc} is the average probability of an honest user giving correct recommendations at each round and p_d is the probability of a round being detectable, which are estimated accordingly in our technical report.

Proof: Please refer to our technical report [26].

Since the probability of false positive P_{fp}(t) is critical for designing the complete detection algorithm (see Section IV-D), we use an example to further illustrate its derivation. Note that a user in an OSN usually has a large number of friends, so we let detector i have 100 neighbors labeled from 1 to 100. Among these 100 neighbors, we assume that the last two are dishonest, whose labels are 99 and 100. Before starting the detection algorithm, we initialize D(0) as N_i and let P_{fp}(0) = 1.

In the first detection round, suppose that user i buys a trustworthy product and further takes this round as detectable. Besides, suppose that only neighbor 1 and neighbor 2 give her correct recommendations, i.e., D(1) = {3, 4, . . ., 100}; then we have S_i(1) = {3, 4, . . ., 100}. Based on Equation (5), the probability of false positive can be derived as

P_{fp}(1) = P_{fp}(0) · \frac{|D(0) ∩ D(1)|}{|D(0)|} = 0.98.

Note that according to the definition in Equation (3), the accurate value of the probability of false positive is 96/98, which is a little bit smaller than the result derived by Theorem 1. In fact, Theorem 1 provides a good approximation when the number of neighbors is large and the number of dishonest users among them is small, which is the common case for OSNs, as users often tend to have a lot of friends and a company can only control a small number of users to promote its product.

Now let us consider the second detection round. Suppose that the event with probability p does not happen, that is, this round is not detectable. So we set D(2) = D(1), and the suspicious set remains the same, i.e., S_i(2) = S_i(1) = {3, 4, . . ., 100}. The probability of false positive is still

P_{fp}(2) = 0.98.

We further examine one more round. Suppose that the third round is detectable and neighbors 1 to 4 give user i correct recommendations, i.e., D(3) = {5, . . ., 100}. Based on Algorithm 1, we have S_i(3) = S_i(2) ∩ D(3) = {5, . . ., 100}. The probability of false positive can be derived as

P_{fp}(3) = P_{fp}(2) · \frac{|D(2) ∩ D(3)|}{|D(2)|} = 0.96.

Note that according to the definition in Equation (3), the accurate value after the third round is 94/98 ≈ 0.959.

Based on Theorem 1, we see that P_{fp}(t) → 0 as t increases, and this implies that all honest users will be removed from the suspicious set eventually. However, P_{fn}(t) does not converge to zero, which implies that dishonest users may evade the detection. Fortunately, as long as P_{fn}(t) is not too large when P_{fp}(t) converges to zero, one can still effectively identify all dishonest users (as we will show in Section VII) by executing the detection process multiple times. On the other hand, the expectation of R quantifies the efficiency of the detection algorithm; in particular, it indicates how long a detector needs to identify her dishonest neighbors on average. Note that the detection algorithm itself does not rely on the derivation of this performance measure, and it is just used for studying the detection efficiency of the algorithm.
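The worked example above can be reproduced with a few lines of Python; the sketch below (ours) tracks the Equation (5) approximation of P_{fp}(t) across the three rounds, with D(t) kept unchanged on non-detectable rounds.

# Sketch: track the Eq. (5) approximation of P_fp(t), reproducing the example
# above (100 neighbors, the last two dishonest).

def update_pfp(pfp_prev, d_prev, d_curr, detectable):
    # On a non-detectable round, D(t) = D(t-1) and P_fp is unchanged.
    if not detectable:
        return pfp_prev, d_prev
    return pfp_prev * len(d_prev & d_curr) / len(d_prev), d_curr

neighbors = set(range(1, 101))
pfp, d = 1.0, set(neighbors)                                            # P_fp(0) = 1, D(0) = N_i
pfp, d = update_pfp(pfp, d, neighbors - {1, 2}, detectable=True)        # round 1 -> 0.98
pfp, d = update_pfp(pfp, d, d, detectable=False)                        # round 2 -> 0.98
pfp, d = update_pfp(pfp, d, neighbors - {1, 2, 3, 4}, detectable=True)  # round 3 -> 0.96
print(round(pfp, 2))  # 0.96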

D. Complete Detection Algorithm

In Section IV-B, we presented a partial detection algorithm which describes the operations in a particular round t. In this subsection, we present the corresponding complete algorithm which describes how to shrink the suspicious set until dishonest users can be identified. To achieve this, we have to determine the termination condition when repeating the partial algorithm round by round. Observe that after executing the detection algorithm for t rounds, only users in the suspicious set S_i(t) are taken as dishonest ones. Intuitively, to avoid a large detection error, the detection process can only be terminated when users in S_i(t) are really dishonest with high probability.

Based on the definition of the probability of false positive P_{fp}(t), it is sufficient to terminate the algorithm when P_{fp}(t) is lower than a predefined small threshold P^*_{fp}. In other words, as long as the probability of false positive is small enough, we can guarantee that all users in the suspicious set are really dishonest with high probability. Based on the above illustration, the complete detection algorithm is stated as Algorithm 2.

Algorithm 2 Complete Detection Algorithm
1: t ← 0;
2: S_i(0) ← N_i;
3: repeat
4:   t ← t + 1;
5:   Derive the suspicious set S_i(t) at round t by executing Algorithm 1;
6:   Update the probability of false positive P_{fp}(t);
7: until P_{fp}(t) ≤ P^*_{fp}
8: Take users in S_i(t) as dishonest and blacklist them;
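A sketch of Algorithm 2 as a loop around the per-round step is given below (ours); the callback `one_round` is an assumed stand-in for the detector's real interactions, and the loop relies on P_{fp}(t) eventually falling below the threshold, as discussed above.

# Sketch of Algorithm 2: repeat the per-round detection until the estimated
# P_fp(t) drops below P*_fp. `one_round()` is assumed to simulate the next
# purchase and return (D(t), detectable), where detectable means the product
# was trustworthy and the probability-p event happened.

def complete_detection(neighbors, one_round, pfp_threshold):
    suspicious = set(neighbors)          # S_i(0) = N_i
    pfp, d_prev = 1.0, set(neighbors)    # P_fp(0) = 1, D(0) = N_i
    while pfp > pfp_threshold:
        d_t, detectable = one_round()
        if detectable:
            suspicious &= d_t                           # Algorithm 1, line 5
            pfp *= len(d_prev & d_t) / len(d_prev)      # Eq. (5) update
            d_prev = d_t
    return suspicious                    # take these users as dishonest (blacklist)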

V. COOPERATIVE ALGORITHM TO SPEED UP THE DETECTION

In the last section, we proposed a distributed and randomized algorithm that only exploits the detector's local information. By running this algorithm, honest users can detect their dishonest neighbors simultaneously and independently. That is, each user in an OSN maintains her own suspicious set containing her potentially dishonest neighbors. Since users in an OSN interact with each other frequently, they can also share their detection results, e.g., their suspicious sets. By doing this, a detector can further exploit her neighbors' detection history to speed up her own detection, and we term this scenario cooperative detection.

We still focus on a particular detector, say user i, and use S_i(t) to denote her suspicious set. At round t, user i may shrink her suspicious set based on her purchasing experience and her received recommendations, and she may also request the detection results of her neighbors. In particular, we assume that detector i can obtain two sets from each neighbor j at round t: the neighboring set and the suspicious set of neighbor j, which we denote as N_j and S_j(t), respectively.

To exploit neighbors' detection results, at round t, detector i first shrinks her own suspicious set according to Algorithm 1, and we call this step the independent detection step. After that, detector i further shrinks her suspicious set by exploiting the information received from her neighbors (i.e., {(N_j, S_j(t)), j ∈ N_i}), and we term this step the cooperative detection step. Since detector i may have different degrees of trust in her neighbors, we use w_{ij}(t) (0 ≤ w_{ij}(t) ≤ 1) to denote the weight of trust of user i in neighbor j at round t. That is, user i exploits the detection results of neighbor j only with probability w_{ij}(t) at round t. Intuitively, w_{ij}(t) = 1 implies that user i fully trusts neighbor j, while w_{ij}(t) = 0 means that user i does not trust j at all. The cooperative detection algorithm for user i at round t is stated in Algorithm 3.

Algorithm 3 Cooperative Detection Algorithm at Round t for Detector i
1: Derive the suspicious set S_i(t) based on local information (i.e., using Algorithm 1);
2: Exchange detection results with neighbors;
3: for each neighbor j ∈ N_i do
4:   with probability w_{ij}(t): S_i(t) ← S_i(t) \ (N_j \ S_j(t));
5:   with probability 1 − w_{ij}(t): S_i(t) ← S_i(t);
6: end for

Fig. 3. An example illustrating Algorithm 3.

We take Figure 3 as an example to further illustrate the operations in the cooperative detection step (i.e., Lines 3-6 in Algorithm 3). Since user i first shrinks her suspicious set by using Algorithm 1, we still use the setting in Figure 2 where S_i(t) shrinks to {a, b, c} after the first step. Now, to further exploit neighbors' detection results to shrink S_i(t), suppose that only user c is a neighbor of user d, and it has already been removed from user d's suspicious set. That is, c ∈ N_d and c ∉ S_d. If user i fully trusts neighbor d (i.e., w_{id}(t) = 1), then user i can be certain that neighbor c is honest as c is identified as honest by neighbor d. Thus, user i can further shrink her suspicious set, and we have S_i(t) = {a, b} as shown on the right hand side of Figure 3.
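The cooperative detection step (Lines 3-6 of Algorithm 3) can be sketched in Python as follows (our own rendering; the dictionary-based exchange format is an assumption, and the default weights follow the simple 0/1 strategy of Equation (7) below).

# Sketch of the cooperative detection step of Algorithm 3.
import random

def cooperative_step(s_i, neighbor_info, weights=None):
    """s_i           -- S_i(t) after the independent (Algorithm 1) step
    neighbor_info -- {j: (N_j, S_j)} received from each neighbor j
    weights       -- {j: w_ij(t)}; defaults to 0 for currently suspicious j, 1 otherwise"""
    s_i = set(s_i)
    for j, (n_j, s_j) in neighbor_info.items():
        w = weights[j] if weights is not None else (0.0 if j in s_i else 1.0)
        if random.random() < w:
            s_i -= (n_j - s_j)   # drop neighbors that j has already cleared as honest
    return s_i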

To implement Algorithm 3, we need to set the weights oftrust on different neighbors, i.e., wi j (t). One simple strategyis only trusting the neighbors that are not in the suspiciousset as users in the suspicious set are potentially dishonest.Mathematically, we can express this strategy as follows.

w_{ij}(t) =
\begin{cases}
0, & \text{if } j \in S_i(t), \\
1, & \text{otherwise.}
\end{cases}
\quad (7)

Note that w_{ij}(t) is a tunable parameter for detector i, and it affects the shrinking rate of the suspicious set of detector i. On the other hand, since detector i may further shrink her suspicious set by exploiting her neighbors' detection results, dishonest users may evade the detection if they collude, and the possibility of this also depends on the parameter w_{ij}(t). In fact, there is a tradeoff between detection accuracy and efficiency when choosing this parameter. Specifically, larger w_{ij}(t)'s imply that detector i is more aggressive in exploiting her neighbors' detection results, so the detection rate should be larger, while the risk of dishonest users evading the detection also becomes larger.
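For concreteness, the following minimal Python sketch implements the cooperative detection step (Lines 3-6 of Algorithm 3) together with the simple trust strategy of Eq. (7); the function names (trust_weight, cooperative_shrink) and the data layout are ours and only illustrative, not part of the algorithm specification in the paper.

import random

def trust_weight(j, suspicious_i):
    # Simple trust strategy of Eq. (7): trust neighbor j only if j is not
    # currently in the detector's own suspicious set.
    return 0.0 if j in suspicious_i else 1.0

def cooperative_shrink(suspicious_i, neighbors_i, neighbor_info):
    # Cooperative detection step (Lines 3-6 of Algorithm 3).
    #   suspicious_i  : S_i(t), already shrunk by the independent step (Algorithm 1)
    #   neighbors_i   : N_i, the detector's neighboring set
    #   neighbor_info : dict mapping neighbor j -> (N_j, S_j(t)) as reported by j
    for j in neighbors_i:
        if j not in neighbor_info:
            continue  # neighbor j shared no detection results this round
        n_j, s_j = neighbor_info[j]
        # With probability w_ij(t), remove every user that j has already cleared,
        # i.e., users in N_j but no longer in S_j(t).
        if random.random() < trust_weight(j, suspicious_i):
            suspicious_i -= (n_j - s_j)
    return suspicious_i

# Toy run mirroring Figure 3: S_i(t) = {a, b, c}; neighbor d reports c in N_d
# but not in S_d(t), so c is removed once d is fully trusted.
print(cooperative_shrink({"a", "b", "c"}, {"d"}, {"d": ({"c"}, set())}))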

Again, Algorithm 3 is only a partial algorithm that describes the operation at round t. To develop the complete version of the cooperative detection algorithm, we can still use the idea in Section IV-D to set the termination condition. That is, we keep running Algorithm 3 until the probability of false positive is less than a predefined threshold P*_{fp}. To achieve this, we have to derive the probability of false positive P_{fp}(t) for Algorithm 3, and the result is stated in Theorem 2.

Theorem 2: After running Algorithm 3 for t rounds, the probability of false positive can be derived as follows.

P_{fp}(t) \approx P_{fp}(t-1) \cdot \frac{|D(t-1) \cap D(t)|}{|D(t-1)|} \cdot \frac{N - |C(t)|}{N},

where P_{fp}(0) = 1 and C(t) denotes the set of neighbors that are removed from the suspicious set in the cooperative detection step at round t.

Proof: Please refer to our technical report [26].
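As a rough illustration of how the termination condition of the complete algorithm can be evaluated, the sketch below updates the Theorem 2 approximation of P_{fp}(t) round by round and stops once it falls below a threshold P*_{fp}; here D(t) and C(t) are the sets used in the analysis above, N is taken as the detector's number of neighbors, and the function and variable names are our own, not the paper's.

def pfp_update(p_fp_prev, d_prev, d_curr, n_neighbors, removed_coop):
    # One step of the Theorem 2 recursion:
    #   P_fp(t) ≈ P_fp(t-1) * |D(t-1) ∩ D(t)| / |D(t-1)| * (N - |C(t)|) / N
    if not d_prev or n_neighbors == 0:
        return p_fp_prev
    overlap = len(d_prev & d_curr) / len(d_prev)
    coop_factor = (n_neighbors - len(removed_coop)) / n_neighbors
    return p_fp_prev * overlap * coop_factor

def run_until_confident(run_one_round, p_fp_threshold, max_rounds=1000):
    # run_one_round(t) is a hypothetical callback that executes Algorithm 3 for
    # round t and reports (D(t-1), D(t), N, C(t)) for the update above.
    p_fp = 1.0  # P_fp(0) = 1
    for t in range(1, max_rounds + 1):
        d_prev, d_curr, n_neighbors, removed_coop = run_one_round(t)
        p_fp = pfp_update(p_fp, d_prev, d_curr, n_neighbors, removed_coop)
        if p_fp < p_fp_threshold:
            return t, p_fp
    return max_rounds, p_fp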

VI. ALGORITHM DEALING WITH USER CHURN

In previous sections, we proposed a randomized detection algorithm and also discussed how to speed up the detection. These algorithms are designed based on the assumption that the underlying network is static, i.e., the friendships between users are fixed and do not change during the detection. However, an online social network usually evolves dynamically; in particular, new users may join the network and existing users may change their friendships or even leave the network by deleting their profiles [18], [25], [40]. Taking the network dynamics into consideration, for detector i, new users may become her friends and existing friends may also disconnect from her at some time. We call these behaviors user churn. Note that even if users leave the network and rejoin it after some time, they may not be able to recover their past friendships, as establishing links or friendships usually requires the confirmation of other users in OSNs. In this section, we extend our detection algorithm to address the problem of user churn in OSNs.

We still focus on a particular detector, say user i. At each round, we first employ the previous algorithms, e.g., Algorithm 1 or Algorithm 3, to shrink the suspicious set. After that, we perform the following checks: (1) whether there are new users becoming neighbors of detector i, and (2) whether some existing neighbors of detector i disconnect from her. In particular, if new neighbors come in, we add them into the neighboring set N_i and the suspicious set S_i(t). In other words, we conservatively treat new users as potentially dishonest. For ease of presentation, we use NU(t) to denote the set of new users that become neighbors of detector i at round t. On the other hand, if some existing neighbors disconnect from detector i at round t, we simply remove them from both the neighboring set N_i and the suspicious set S_i(t). We use L(t) to denote the set of neighbors that leave detector i at round t, and use LS(t) to denote the set of users that are in the suspicious set S_i(t) and leave detector i at round t, i.e., LS(t) = S_i(t) ∩ L(t). We present the detailed detection algorithm at round t in Algorithm 4. Note that if Algorithm 3 is used to shrink the suspicious set in Algorithm 4, then cooperative detection is used to speed up the detection.

Let us use an example, shown in Figure 4, to illustrate the operations in Algorithm 4.



Algorithm 4 Dealing With User Churn at Round t
1: Derive the suspicious set S_i(t) (by executing Algorithm 1 or Algorithm 3);
2: Derive the sets NU(t) and L(t);
3: S_i(t) ← (S_i(t) ∪ NU(t)) \ L(t);
4: N_i ← (N_i ∪ NU(t)) \ L(t);

Fig. 4. An example illustrating Algorithm 4.

Since the suspicious set first shrinks by using Algorithm 1 or Algorithm 3, which has been illustrated before, here we only show the step dealing with user churn (i.e., Lines 2-4). Suppose that at round t, user i disconnects from neighbor b (i.e., L(t) = {b}) and initiates a connection with a new user labeled h (i.e., NU(t) = {h}). Then user i can safely remove b from the suspicious set, as she no longer cares about user b, while she has no prior information about the type of the new user h, so she is conservative and adds user h into the suspicious set. Thus, we have S_i(t) = {a, h}, as shown in Figure 4.
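A minimal sketch of the churn-handling step (Lines 2-4 of Algorithm 4), assuming the caller has already shrunk S_i(t) via Algorithm 1 or Algorithm 3 and can observe the arriving and departing neighbor sets NU(t) and L(t); the function name handle_churn is our own.

def handle_churn(suspicious_i, neighbors_i, new_users, left_users):
    # Lines 2-4 of Algorithm 4: new neighbors NU(t) are conservatively added to
    # the suspicious set, and departed neighbors L(t) are dropped from both sets.
    suspicious_i = (suspicious_i | new_users) - left_users  # S_i(t) <- (S_i(t) ∪ NU(t)) \ L(t)
    neighbors_i = (neighbors_i | new_users) - left_users    # N_i <- (N_i ∪ NU(t)) \ L(t)
    return suspicious_i, neighbors_i

# Toy run mirroring Figure 4: neighbor b leaves and new user h joins, so
# S_i(t) = {a, b} becomes {a, h} (set ordering in the printed output may vary).
print(handle_churn({"a", "b"}, {"a", "b", "c", "d"}, {"h"}, {"b"}))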

The complete algorithm can also be developed by running the detection process until the probability of false positive is smaller than a predefined threshold P*_{fp}. Thus, we have to derive the probability of false positive P_{fp}(t) for Algorithm 4, and the result is stated in Theorem 3.

Theorem 3: After running Algorithm 4 for t rounds, the probability of false positive can be derived as follows.

P_{fp}(t) \approx P_{fp}(t-1) \cdot \frac{|D(t-1) \cap D(t)|}{|D(t-1)|} \cdot \frac{N(t-1) - |C(t)| + |NU(t)| - |LS(t)|}{N(t)},

where P_{fp}(0) = 1 and N(t) denotes the number of neighbors after round t.

Proof: Please refer to our technical report [26].

VII. SIMULATION AND MODEL VALIDATION

Our model aims to detect dishonest users who intentionally give wrong recommendations in OSNs. Since each user in an OSN performs her own activities continuously, e.g., purchasing a product, giving recommendations to her neighbors, and making decisions on which product to purchase, the network evolves dynamically. Therefore, we first synthesize a dynamically evolving social network to emulate users' behaviors, then we show the impact of misleading recommendations and validate the analysis of our detection algorithm based on the synthetic network. We also validate the effectiveness of our detection algorithm using a real dataset drawn from an online rating network.

A. Synthesizing a Dynamically Evolving OSN

In this subsection, we synthesize a dynamic OSN to simulate the behaviors of users in the network. To achieve this, we make assumptions on (1) how users make recommendations to their neighbors, (2) how users make decisions on which product to purchase, and (3) how fast the recommendations spread.

First, there are two types of users in the network: honest users and dishonest users. Dishonest users adopt the intelligent strategy to make recommendations. For an honest user, if she buys a product, she gives correct recommendations to her friends based on her valuation on the product. On the other hand, even if an honest user does not buy a product, she still gives recommendations based on her received recommendations. We adopt the majority rule in this case. That is, if more than half of her neighbors give positive (negative) recommendations to her, then she gives positive (negative) recommendations to others. Otherwise, she does not give any recommendation. In the simulation, we let all honest users have the same valuation on each product, and so we randomly choose an honest user as the detector in each simulation.
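A small sketch of the majority rule described above, deciding what recommendation (if any) an honest non-buyer forwards given the positive and negative recommendations she has received; the function name and argument layout are our own simplification.

def majority_recommendation(num_positive, num_negative, num_neighbors):
    # Forward a positive (negative) recommendation only if more than half of the
    # user's neighbors recommended positively (negatively); otherwise stay silent.
    if num_positive > num_neighbors / 2:
        return "positive"
    if num_negative > num_neighbors / 2:
        return "negative"
    return None

print(majority_recommendation(4, 1, 6))  # "positive": 4 of 6 neighbors are positive
print(majority_recommendation(2, 2, 6))  # None: no majority, so no recommendation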

Second, to simulate how users decide which product to purchase, we assume that an honest user buys the product with the maximum number of effective recommendations, which is defined as the number of positive recommendations minus the number of negative recommendations. The rationale is that one buys a product that receives as many high ratings and as few low ratings as possible.
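The purchase rule above can be expressed compactly: pick the product whose effective recommendation count (positives minus negatives) is largest. A hedged sketch with hypothetical per-product counters:

def choose_product(positive_counts, negative_counts):
    # Effective recommendations = positive count minus negative count per product;
    # an honest user buys the product with the maximum effective count.
    products = set(positive_counts) | set(negative_counts)
    effective = {p: positive_counts.get(p, 0) - negative_counts.get(p, 0) for p in products}
    return max(effective, key=effective.get)

print(choose_product({"P1": 2, "P2": 3}, {"P1": 1}))  # "P2" (3 vs. 1 effective)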

Last, we assume that the spreading rate of recommendations is much higher than the purchasing rate. In other words, when one gives a positive (negative) recommendation on a particular product to her neighbors, her neighbors update their states accordingly, i.e., update the number of received positive (negative) recommendations. If the corresponding numbers satisfy the majority rule, then they further make recommendations on this product, and this process continues until no one in the system can make a recommendation according to the majority rule. Moreover, the whole process finishes before the next purchase instance made by any user in the network.

To model the evolution of the network, we assume that it starts from the "uniform" state in which all products have the same market share. During one detection round, 10%|V| purchase instances happen, where |V| is the total number of users in the network, i.e., between two successive purchases of detector i, 10%|V| purchases are made by other users in the network. Note that the assumptions we make in this subsection are only for simulation purposes, and our detection algorithms do not require these assumptions.
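To make the round structure concrete, a minimal sketch of one detection round under the assumptions above; purchase_once is a hypothetical callback that performs a single purchase instance (choosing a product and spreading the resulting recommendation), and all names are ours.

import random

def detection_round(users, detector, purchase_once, fraction=0.10):
    # One detection round: between two successive purchases of the detector,
    # roughly 10%|V| purchase instances are made by other users in the network.
    others = [u for u in users if u != detector]
    for _ in range(int(fraction * len(users))):
        purchase_once(random.choice(others))
    purchase_once(detector)  # the detector's own purchase closes the round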

B. Impact of Misleading Recommendations

In this subsection, we show the impact of misleading recommendations using the synthetic network. We employ the GLP model proposed in [10], which is based on preferential attachment [9], to generate a scale-free graph with a power-law degree distribution and high clustering coefficient.



Fig. 5. Impact of misleading recommendations on the market share distribution: dishonest users aim to promote an untrustworthy product P1.

We generate a graph with around 8,000 nodes and 70,000 edges, whose clustering coefficient is around 0.3. We assume that initially no product has been purchased, and consider 10,000 purchase instances in the simulation. For each purchase instance, one user purchases, and she buys the product with the maximum number of effective recommendations. After that, she gives a recommendation on the product to her friends. The recommendation will spread throughout the network until no one can make a recommendation according to the majority rule. We assume that there are five products, P1, . . . , P5, and dishonest users aim to promote product P1, which is an untrustworthy product, while the rest are trustworthy products. Our objective is to measure the fraction of purchases of each product out of the total 10,000 purchases. We run the simulation multiple times and take the average value.
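The GLP generator of [10] is not available in common graph libraries; as a rough stand-in for readers who wish to reproduce a qualitatively similar topology, networkx's Holme-Kim generator also yields a power-law degree distribution with tunable clustering, though it is a different model and the parameters below are only indicative, not those used in the paper.

import networkx as nx

# Holme-Kim graph as a stand-in for the GLP model: n = 8000 nodes,
# roughly n*m ≈ 72,000 edges, with triangle-closing probability p.
G = nx.powerlaw_cluster_graph(n=8000, m=9, p=0.6, seed=42)
print(G.number_of_nodes(), G.number_of_edges())
print(nx.average_clustering(G))  # clustering coefficient of the generated graph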

The simulation results are shown in Figure 5. First, we can see that if no dishonest user exists in the network to give misleading recommendations, the untrustworthy product P1 is purchased with only a small probability. The reason why the probability is non-zero is that if a user does not receive any recommendation, she simply makes a random choice over the five products to make a purchase. However, if we randomly set 5% of users as dishonest and let them adopt the intelligent strategy to promote P1 by setting δ = 0, then even if P1 is an untrustworthy product, it is still purchased with probability around 0.15. In other words, many users in the network are misled by these dishonest users to purchase P1. In summary, the existence of dishonest users who intentionally give misleading recommendations can severely distort the market share distribution.

C. Analysis Validation via a Synthetic OSN

In this subsection, we synthesize a dynamically evolving network based on the description in Section VII-A, and then validate our analysis on the performance of the detection algorithms. In the simulation, we randomly select 5% of users as dishonest users and let them adopt the intelligent strategy. We also randomly choose an honest user who has dishonest neighbors and take her as the detector. We carry out the simulation many times and take the average value as the simulation results.

Let us first focus on the performance measures P_{fn}(t) and P_{fp}(t) for Algorithm 1.

Fig. 6. Probability of false negative and probability of false positive of the randomized detection algorithm (Algorithm 1) where δ = 0.1 and p = 0.8.

Fig. 7. The improvement of probability of false positive for the cooperative algorithm (Algorithm 3) where δ = 0.1 and p = 0.8.

The theoretic and simulation results are shown in Figure 6. First, we can see that the theoretic results match well with the simulation results. Second, one only needs to run the detection algorithm for a small number of rounds to remove all honest users from the suspicious set, which shows the effectiveness and efficiency of the detection algorithm. However, the probability of false negative is not zero, as dishonest users may sometimes act as honest ones with the hope of evading the detection. This implies that only a part of the dishonest users are detected in one execution of the algorithm. Fortunately, when the probability of false positive goes to zero, the probability of false negative is still not close to one. Therefore, to detect all dishonest users, one can run the algorithm multiple times. At each time, a subset of dishonest users is detected and then removed. Eventually, all dishonest users can be identified. For example, in Figure 6, after ten rounds, the probability of false positive is close to zero, and the probability of false negative is just around 0.6, which indicates that at least 40% of the dishonest users can be detected in one execution of the algorithm.

Now we focus on the cooperative detection algorithm, i.e., Algorithm 3. Figure 7 compares the probability of false positive of the randomized detection algorithm (Algorithm 1) with that of its cooperative version (Algorithm 3). The results show that our theoretic analysis provides a good approximation of the probability of false positive, which validates the effectiveness of the termination condition used in the complete detection algorithm. Moreover, comparing the two groups of curves, we can see that the probability of false positive of the cooperative algorithm is always smaller than that of the non-cooperative algorithm.



Fig. 8. Probability of false positive of the algorithm dealing with user churn (Algorithm 4) where δ = 0.1 and p = 0.8.

Fig. 9. Probability mass function of R when the randomized detection algorithm is used and δ = 0.1 and p = 0.8.

This implies that the cooperative scheme effectively speeds up the detection.

Now we focus on the detection algorithm dealing with user churn, i.e., Algorithm 4, and the results are shown in Figure 8. In the figure, one group of curves corresponds to the case where the cooperative algorithm is employed, i.e., Algorithm 3 is used to derive the suspicious set in the first step of Algorithm 4; the other group corresponds to the case where cooperative detection is not used, i.e., Algorithm 1 is used to derive the suspicious set in the first step. To simulate user churn, we add a new neighbor to the detector with probability 0.3 in each round. Simulation results show that the probability of false positive goes to zero eventually, which implies that users remaining in the suspicious set must be dishonest with high probability after a sufficient number of rounds. Finally, we also observe the speedup of the detection for the cooperative algorithm.

Let us look at the distribution of the number of detection rounds for the randomized detection algorithm, i.e., Algorithm 1. The results are shown in Figure 9. The horizontal axis is the number of rounds needed for the detection, and the vertical axis is the probability mass function. We can see that even if the probability mass function is not accurately quantified, the expected number of rounds, E[R], is still well approximated. The deviation of the probability mass function can be explained as follows. First, the probability of an honest user giving correct recommendations is not constant at each round; e.g., as more users purchase a product, the probability of giving correct recommendations also increases since more users have their own valuations. Therefore, there must be an approximation error when we use a constant parameter, say p_hc, to approximate it. Second, since the performance measure is quantified in a probabilistic way, the simulation must be run many times so as to match the theoretic results. However, running the simulation too many times takes a lot of time because of the large graph size. To balance this tradeoff, we only run the simulation 1,000 times, and the limited number of simulation runs also contributes to the approximation error. However, since the detection algorithm does not require an accurate quantification of the distribution of R, it is still effective to employ the algorithm to identify dishonest users even if an approximation error exists.

To further validate the effectiveness of our detection algorithm, we also run experiments with a real dataset from a social rating network where users share their ratings on movies and also establish friendships with others. Please refer to our technical report [26].

VIII. CONCLUSION

In this paper, we develop a set of fully distributed and randomized detection algorithms based on the idea of shrinking a suspicious set so as to identify dishonest users in OSNs. We formalize the behaviors of dishonest users, who may probabilistically bad-mouth other products while giving positive recommendations on the product they aim to promote. Our detection algorithms allow users to independently perform the detection so as to discover their dishonest neighbors. We provide mathematical analysis quantifying the effectiveness and efficiency of the detection algorithms. We also propose a cooperative scheme to speed up the detection, as well as an algorithm to handle network dynamics, i.e., "user churn" in OSNs. Via simulations, we first show that the market share distribution may be severely distorted by misleading recommendations given by a small fraction of dishonest users, and then validate the effectiveness and efficiency of our detection algorithms. The detection framework in this paper can be viewed as a valuable tool to maintain the viability of viral marketing in OSNs.

REFERENCES

[1] (Apr. 2014). Taobao Website. [Online]. Available: http://www.taobao.com
[2] (Apr. 2014). Sina Weibo Website. [Online]. Available: http://weibo.com
[3] Alibaba Released Weibo for Taobao With Sina. [Online]. Available: http://www.chinainternetwatch.com/2767/, accessed Apr. 2014.
[4] Microsoft Digital Advertising Solutions. (2007). Word of the Web Guidelines for Advertisers: Understanding Trends and Monetising Social Networks. [Online]. Available: http://advertising.microsoft.com/uk/wwdocs/user/en-uk/advertise/partner%20properties/piczo/Word%20of%20the%20Web%20Social%20Networking%20Report%20Ad5%.pdf
[5] The Chinese e-Maket Overview. [Online]. Available: http://businessinchinasaos.wordpress.com/2013/06/06/the-chinese-e-maket-overview/, accessed Apr. 2014.
[6] The Unexpected Leaders of Asian E-commerce. [Online]. Available: http://news.alibaba.com/article/detail/news/100922371-1-unexpected-leaders-asian-e-commerce.html, accessed Apr. 2014.
[7] Weibo Trending Topic Attracted Massive User Discussion. [Online]. Available: http://www.chinainternetwatch.com/7132/weibo-trending-topic-attracted-massive-user-discussion/, accessed Apr. 2014.
[8] G. Adomavicius and A. Tuzhilin, "Toward the next generation of recommender systems: A survey of the state-of-the-art and possible extensions," IEEE Trans. Knowl. Data Eng., vol. 17, no. 6, pp. 734–749, Jun. 2005.
[9] A.-L. Barabasi and R. Albert, "Emergence of scaling in random networks," Science, vol. 286, no. 5439, pp. 509–512, 1999.


[10] T. Bu and D. Towsley, "On distinguishing between Internet power law topology generators," in Proc. IEEE INFOCOM, Jun. 2002, pp. 638–647.
[11] M. Carbone, M. Nielsen, and V. Sassone, "A formal model for trust in dynamic networks," in Proc. 1st Int. Conf. Softw. Eng. Formal Methods, Sep. 2003, pp. 54–61.
[12] W. Chen, C. Wang, and Y. Wang, "Scalable influence maximization for prevalent viral marketing in large-scale social networks," in Proc. 16th ACM SIGKDD Int. Conf. Knowl. Discovery Data Mining (KDD), 2010, pp. 1029–1038.
[13] P.-A. Chirita, W. Nejdl, and C. Zamfir, "Preventing shilling attacks in online recommender systems," in Proc. 7th Annu. ACM Int. Workshop Web Inf. Data Manag. (WIDM), 2005, pp. 67–74.
[14] D. Cosley, S. K. Lam, I. Albert, J. A. Konstan, and J. Riedl, "Is seeing believing?: How recommender system interfaces affect users' opinions," in Proc. CHI, vol. 5, 2003, pp. 585–592.
[15] D. Crandall, D. Cosley, D. Huttenlocher, J. Kleinberg, and S. Suri, "Feedback effects between similarity and social influence in online communities," in Proc. 14th ACM SIGKDD Int. Conf. Knowl. Discovery Data Mining (KDD), 2008, pp. 160–168.
[16] P. Domingos and M. Richardson, "Mining the network value of customers," in Proc. ACM SIGKDD, New York, NY, USA, 2001, pp. 57–66.
[17] G. Fei, A. Mukherjee, B. Liu, M. Hsu, M. Castellanos, and R. Ghosh, "Exploiting burstiness in reviews for review spammer detection," in Proc. Int. AAAI Conf. Weblogs Soc. Media (ICWSM), 2013, pp. 175–184.
[18] J. Golbeck, "The dynamics of web-based social networks: Membership, relationships, and change," First Monday, vol. 12, no. 11, pp. 1–10, Nov. 2007.
[19] J. Goldenberg, B. Libai, and E. Muller, "Talk of the network: A complex systems look at the underlying process of word-of-mouth," Marketing Lett., vol. 12, no. 3, pp. 211–223, 2001.
[20] E. Kehdi and B. Li, "Null keys: Limiting malicious attacks via null space properties of network coding," in Proc. IEEE INFOCOM, Apr. 2009, pp. 1224–1232.
[21] D. Kempe, J. Kleinberg, and E. Tardos, "Maximizing the spread of influence through a social network," in Proc. ACM SIGKDD, 2003, pp. 137–146.
[22] K. Krukow and M. Nielsen, "Trust structures," Int. J. Inf. Security, vol. 6, nos. 2–3, pp. 153–181, 2007.
[23] S. K. Lam and J. Riedl, "Shilling recommender systems for fun and profit," in Proc. 13th Int. Conf. World Wide Web (WWW), 2004, pp. 393–402.
[24] J. Leskovec, L. A. Adamic, and B. A. Huberman, "The dynamics of viral marketing," in Proc. 7th ACM Conf. Electron. Commerce, New York, NY, USA, 2006, pp. 228–237.
[25] J. Leskovec, L. Backstrom, R. Kumar, and A. Tomkins, "Microscopic evolution of social networks," in Proc. 14th ACM SIGKDD Int. Conf. Knowl. Discovery Data Mining (KDD), 2008, pp. 462–470.
[26] Y. Li and J. C. S. Lui, "Friends or foes: Distributed and randomized algorithms to determine dishonest recommenders in online social networks," Tech. Rep. [Online]. Available: http://arxiv.org/abs/1407.4945, accessed Jul. 2014.
[27] Y. Li and J. C. S. Lui, "Stochastic analysis of a randomized detection algorithm for pollution attack in P2P live streaming systems," Perform. Eval., vol. 67, no. 11, pp. 1273–1288, 2010.
[28] Y. Li and J. C. S. Lui, "Epidemic attacks in network-coding-enabled wireless mesh networks: Detection, identification, and evaluation," IEEE Trans. Mobile Comput., vol. 12, no. 11, pp. 2219–2232, Nov. 2013.
[29] Y. Li, B. Q. Zhao, and J. C. Lui, "On modeling product advertisement in large-scale online social networks," IEEE/ACM Trans. Netw., vol. 20, no. 5, pp. 1412–1425, Oct. 2012.
[30] J. Liang, R. Kumar, Y. Xi, and K. Ross, "Pollution in P2P file sharing systems," in Proc. IEEE INFOCOM, Mar. 2005, pp. 1174–1185.
[31] A. Mukherjee et al., "Spotting opinion spammers using behavioral footprints," in Proc. ACM SIGKDD, 2013, pp. 632–640.
[32] M. Rahman, B. Carbunar, J. Ballesteros, G. Burri, and D. H. P. Chau, "Turning the tide: Curbing deceptive Yelp behaviors," in Proc. SIAM Data Mining Conf. (SDM), 2014.
[33] M. Richardson and P. Domingos, "Mining knowledge-sharing sites for viral marketing," in Proc. ACM SIGKDD, New York, NY, USA, 2002, pp. 61–70.
[34] M. Spear, J. Lang, X. Lu, N. Matloff, and S. Wu, "MessageReaper: Using social behavior to reduce malicious activity in networks," Dept. Comput. Sci., Univ. California, Davis, CA, USA, Tech. Rep. CSE-2008-2, 2008.
[35] G. Theodorakopoulos and J. S. Baras, "Malicious users in unstructured networks," in Proc. IEEE INFOCOM, May 2007, pp. 884–891.
[36] S. Weeks, "Understanding trust management systems," in Proc. IEEE Symp. Security Privacy, May 2001, pp. 94–105.
[37] D. Zhang, D. Zhang, H. Xiong, C.-H. Hsu, and A. V. Vasilakos, "BASA: Building mobile ad-hoc social networks on top of Android," IEEE Netw., vol. 28, no. 1, pp. 4–9, Jan./Feb. 2014.
[38] D. Zhang, D. Zhang, H. Xiong, L. T. Yang, and V. Gauither, "NextCell: Predicting location using social interplay from cell phone traces," IEEE Trans. Comput., Nov. 2013, doi: 10.1109/TC.2013.223.
[39] B. Q. Zhao, Y. K. Li, J. C. Lui, and D.-M. Chiu, "Mathematical modeling of advertisement and influence spread in social networks," in Proc. ACM NetEcon, 2009.
[40] X. Zhao et al., "Multi-scale dynamics in a massive online social network," in Proc. ACM Internet Meas. Conf. (IMC), 2012, pp. 171–184.

Yongkun Li (M'14) is currently an Associate Researcher with the School of Computer Science and Technology, University of Science and Technology of China, Hefei, China. He received the B.Eng. degree in computer science from the University of Science and Technology of China in 2008, and the Ph.D. degree in computer science and engineering from the Chinese University of Hong Kong, Hong Kong, in 2012. After that, he was a Post-Doctoral Fellow with the Institute of Network Coding, Chinese University of Hong Kong. His research mainly focuses on performance evaluation of networking and storage systems.

John C. S. Lui (A'92–M'93–SM'02–F'10) is currently a Professor with the Department of Computer Science and Engineering, Chinese University of Hong Kong, Hong Kong. He received the Ph.D. degree in computer science from the University of California at Los Angeles (UCLA), Los Angeles, CA, USA. When he was a Ph.D. student at UCLA, he worked as a research intern in the IBM T. J. Watson Research Laboratory. After graduation, he joined the IBM Almaden Research Laboratory/San Jose Laboratory, San Jose, CA, USA, where he was involved in various research and development projects on file systems and parallel I/O architectures. He then joined the Department of Computer Science and Engineering at the Chinese University of Hong Kong. He serves as a reviewer and panel member of the National Science Foundation, the Canadian Research Council, and the National Natural Science Foundation of China. He served as the Chairman of the Department of Computer Science and Engineering from 2005 to 2011. He serves on the Editorial Board of IEEE/ACM TRANSACTIONS ON NETWORKING, the IEEE TRANSACTIONS ON COMPUTERS, the IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, Journal of Performance Evaluation, and International Journal of Network Security. He received various departmental teaching awards and the CUHK Vice Chancellor's Exemplary Teaching Award. He was also a co-recipient of the IFIP WG 7.3 Performance 2005 and the IEEE/IFIP NOMS 2006 Best Student Paper Awards. He is an elected member of the IFIP WG 7.3, a fellow of the Association for Computing Machinery, and a Croucher Senior Research Fellow. His current research interests are in communication networks, network/system security, network economics, network sciences, cloud computing, large-scale distributed systems, and performance evaluation theory.