
Trust-Based User Profiling

NIMA DOKOOHAKI

Doctoral Thesis in Information and Communication Technology

School of Information and Communication Technologies (ICT)
KTH - Royal Institute of Technology

Stockholm, Sweden 2013

TRITA-ICT/ECS AVH 13:10
ISSN 1653-6363
ISRN KTH/ICT/ECS/AVH-13/10-SE
ISBN 978-91-7501-651-1

KTH School of Information and Communication Technology

SE-164 40 Kista
SWEDEN

Academic dissertation which, with the permission of KTH Royal Institute of Technology (Kungliga Tekniska högskolan), is presented for public examination for the degree of Doctor of Technology in Information and Communication Technology on Friday, 8 March 2013 at 13:00 in hall C1, Electrum, IT-Universitetet, KTH, Isafjordsgatan 20, Kista.

© Nima Dokoohaki, February 2013

Printed by: Universitetsservice US AB


Abstract

We have introduced the notion of user profiling with trust as a solution to the problem of uncertainty and unmanageable exposure of personal data during access, retrieval and consumption by web applications. Our solution suggests explicit modeling of trust and embedding trust metrics and mechanisms within the very fabric of user profiles. This has in turn allowed information systems to consume and understand this extra knowledge in order to improve interaction and collaboration among individuals and systems. When formalizing such profiles, another challenge is to realize the increasingly important notion of privacy preferences of users. Thus, the profiles are designed to incorporate the preferences of users, allowing target systems to understand users' privacy concerns during their interaction. A majority of the contributions of this work had an impact on profiling and recommendation in the digital libraries context, and were implemented in the framework of the EU FP7 Smartmuseum project. Highlighted results start from the modeling of adaptive user profiles incorporating users' taste, trust and privacy preferences. This in turn led to the proposal of several ontologies for modeling user and content characteristics, improving indexing and retrieval of user content and profiles across the platform. Sparsity and uncertainty of profiles were studied through frameworks of data mining and machine learning of profile data taken from on-line social networks. Results of mining and population of data from social networks, along with profile data, increased the accuracy of intelligent suggestions made by the system to improve navigation of users in on-line and off-line museum interfaces. We also introduced several trust-based recommendation techniques and frameworks capable of mining implicit and explicit trust across ratings networks taken from the social and opinion web. The resulting recommendation algorithms have been shown to increase the accuracy of profiles, through incorporation of knowledge of items and users and diffusing them along the trust networks. At the same time, focusing on automated distributed management of profiles, we showed that the coverage of the system can be increased effectively, surpassing comparable state-of-the-art techniques. We have shown that trust clearly elevates the accuracy of suggestions predicted by the system. To assure the overall privacy of such value-laden systems, privacy was given a direct focus when architectures and metrics were proposed, and it was shown that a joint optimal setting for accuracy and perturbation techniques can maintain accurate output. Finally, focusing on hybrid models of web data and recommendations motivated us to study the impact of trust in the context of topic-driven recommendation in social and opinion media, which in turn helped us to show that leveraging content-driven and tie-strength networks can improve system accuracy for several important web computing tasks.


Acknowledgements

First and foremost I start by thanking my supervisor, Professor Mihhail Matskin. It has been an honor to be his Ph.D. student. Throughout all these years his guidance helped me in the research and writing of this thesis. I want to show the depth of my gratitude for all his contributions of valuable resources and, most importantly, patience and wisdom, which made my doctoral experience at KTH this productive and joyful.

Besides my advisor, I would like to thank Dr. Vladimir Vlassov for his support of my work in his role as secondary advisor. I would also like to thank Prof. Rassul Ayani for his kind and generous feedback in his role as the reviewer of my thesis.

I want to thank my colleagues, without whose help this thesis would not have been possible. I start with the Smartmuseum scientific project partners, namely Dr. Tuukka Ruotsalo, Dr. Tommi Kauppinen, Dr. Eetu Mäkelä, Dr. Alar Kuusik, Dr. Tannel Tammet and Prof. Eero Hyvönen. I would also like to thank the museum partners, especially Brian Restall, Marco Berni, Elena Fani and of course Mr. Silver Toomla. I also want to express my sincere gratitude to colleagues whom I had the pleasure of meeting and working with: Dr. Ralf Krestel, Dr. Federica Cena, Dr. Cihan Kaleli and Dr. Huseyin Polat. Thank you for your dedication and contributions to my research.

My many thanks go to my former colleagues and students, especially Dr. Leandro Navarro, Alireza Zarghami, Soude Fazeli, Stefan Magureanu and Ramona Bunea, for their enthusiasm, devotion and hard work. My special thanks to Shahab Mokarizadeh for his support of time, company and knowledge throughout my studentship here at KTH. I would like to thank my friends at the ICT school, especially Byron Roberto Navas and Kathrin Dannmann, for sharing memorable times with me.

Finally, I would like to thank all members of my family for their encouragement and support throughout the duration of my Ph.D. studies. My special thanks go to my dearest cousin Ardavan Ghalebi, for dedicating his time and patience to proofreading my thesis.


Dedicated to Shahram and Parichehr

For your love, support and encouragement throughout all these years.

Contents

List of Figures

List of Tables

I Introduction

1 Introduction

2 State of the Art

3 Detailed Contributions

4 Discussions and Conclusions

II Included Papers

5 PAPER (A2): Effective Design of Trust Ontologies for Improvement in the Structure of Socio-Semantic Trust Networks
5.1 Introduction
5.2 Background
5.3 Evaluating Trust Ontologies
5.4 Engineering and Construction of Trust Ontology
5.5 Trust Network Analysis
5.6 Structural Comparison
5.7 Conclusion
5.8 Future Work

6 PAPER (B1): Personalizing Human Interaction through Hybrid Ontological Profiling: Cultural Heritage Case Study
6.1 Introduction
6.2 Background
6.3 Profile Structure
6.4 Extending Metadata with Human User Metadata
6.5 Conclusion

7 PAPER (B2): Reasoning about Weighted Semantic User Profiles through Collective Confidence Analysis: A Fuzzy Evaluation
7.1 Introduction
7.2 Background
7.3 Fuzzy Confidence Framework
7.4 Smartmuseum Simulation
7.5 Conclusion and Future Work
7.6 Acknowledgements

8 PAPER (C1): Forging Trust and Privacy with User Modeling Frameworks: An Ontological Analysis
8.1 Introduction
8.2 User Modeling on Social Web: State of the Art
8.3 Understanding Importance of Social User Models in Cross-Systems Personalization
8.4 Our framework for user modeling in the social web
8.5 Privacy model
8.6 Trust and reputation model
8.7 Conclusion

9 PAPER (C2): An Adaptive Framework for Discovery and Mining of User Profiles from Social Web-based Interest Communities
9.1 Introduction
9.2 Related Work
9.3 Quest-Driven Social Web Mining Architecture
9.4 Evaluating Quest Schematics: A LiveJournal Experiment
9.5 Conclusions and Future Work

10 PAPER (D1): Mechanizing Social Trust-Aware Recommenders with T-index Augmented Trustworthiness
10.1 Introduction
10.2 Background
10.3 A Semantic Trust-ware Recommendation Framework
10.4 Evaluation
10.5 Conclusion and Further Work

11 PAPER (D2): Epidemic Trust-based Recommender Systems
11.1 Introduction
11.2 Background
11.3 Approach
11.4 Experimental Setup
11.5 Experiment Results
11.6 Conclusion and Future Work

12 PAPER (E): Achieving Optimal Privacy in Trust-Aware Collaborative Filtering Recommender Systems
12.1 Introduction
12.2 Background
12.3 Recommendation Framework
12.4 Recommendation Framework Evaluation
12.5 Conclusion and Future Work
12.6 Acknowledgement

13 PAPER (F1): Diversifying Product Review Rankings: Getting the Full Picture
13.1 Introduction
13.2 Overview of the Field
13.3 How to Rank Reviews?
13.4 Modeling Reviews
13.5 Ranking Reviews
13.6 Experiments
13.7 Conclusions & Future Work

14 PAPER (F2): Mining Divergent Opinion Trust Networks through Latent Dirichlet Allocation
14.1 Introduction
14.2 Related Work
14.3 Mining Topic Facts and Opinions from Social Media
14.4 Modeling Tweets
14.5 Experiment: Mining Networks of Eurozone Trending News Corpora
14.6 Conclusion and Future Work

III References

Bibliography

List of Figures

2.1 Semantic projects diagram
2.2 Smartmuseum recommender system
2.3 Privacy in personalization diagram

5.1 Trust ontology components
5.2 Hybrid network depicted
5.3 Hybrid network social network structure
5.4 Meshed network depicted
5.5 Meshed network social network structure
5.6 Golbeck’s network evolution
5.7 Konfidi’s network evolution
5.8 Our Ontology’s network evolution
5.9 Cluster visualization of meshed trust network of Golbeck.
5.10 Cluster visualization of meshed trust network of our trust ontology.
5.11 Cluster visualization of meshed trust network of Konfidi.

6.1 Structure of User Profiles (visualized).
6.2 Attribute Categories
6.3 Extended CH Metadata with Human Keywords (visualized).

7.1 Linear presentation of crisp trust values
7.2 Stacked linear presentation of confidence values.

8.1 The privacy ontology
8.2 Trust and reputation ontologies

9.1 Quest component architecture and knowledge flow
9.2 Primary taxonomy tree constructed from cultural heritage domain under study
9.3 Visualized schematics for query formulation.
9.4 Sample centroid formation for interest topics with respect to schematics used.
9.5 Stacked plot of clustering errors.
9.6 Stacked plot of classification errors.
9.7 Boundary visualization of generated profiles.
9.8 Empirical comparison of the frequency and relevancy of centroids spread of topics

10.1 Ontological Framework
10.2 A scenario of utilizing TopTrustee List
10.3 User Ontology Model
10.4 The Top-10 trustworthy users Indegree
10.5 Generated Trust Networks for Top-10 Trustworthy Users (n = 5, m = 5): Without T-index
10.6 Generated Trust Networks for Top-10 Trustworthy Users (n = 5, m = 5): With T-index = 100
10.7 Alignment of Trust Networks for Top-10 Trustworthy Users (n = 5, m = 5): Inferred Edges
10.8 Generated Trust Networks for Top-10 Trustworthy Users (n = 5, m = 5): With T-index = 100
10.9 Comparing the results based on different T-index values: Coverage
10.10 Comparing the results based on different T-index values: MAE

11.1 Trust distribution of Yahoo! dataset.
11.2 Evolution of MAE over neighborhood size for Yahoo! Webscope and Epinions
11.3 Evolution of MAE over T-Man rounds for different trust metrics on Epinions dataset.
11.4 Influence of search range on item coverage and prediction accuracy for Epinions dataset.
11.5 Influence of distance metric on coverage as network converges in the case of Epinions.

12.1 Architecture of a Private Trust-Aware Recommender System.
12.2 MAE of recommendation framework, without adding any perturbations
12.3 Effects of adding perturbations on MAE
12.4 Filling unrated items with random data having Gaussian distribution with respect to f

13.1 Overview of the Review Ranking System: Reviews together with ratings are used to extract topic distributions using LDA. Rankings are computed minimizing KL-Divergence with task-specific target distributions.
13.2 Preprocessed Review Snippet: Original on Top; Segmented and POS-tagged on Bottom
13.3 Plate Notation for Latent Dirichlet Allocation
13.4 Example of the Greedy Algorithm to find a Ranking Summarizing the Three Reviews A, B, and C
13.5 Average number of positive (1.0) and negative (−1.0) mentions of aspects grouped by given rating
13.6 Summary Strategy: Comparing Recency with LDA and LM (α = 0.99): “America West Airlines”
13.7 Summary Strategy: Comparing Recency with LDA and LM (α = 0.99): “Pokemon Snap for Nintendo 64”
13.8 Summary Strategy: Comparing Recency with LDA and LM (α = 0.99): “Microsoft Windows ME”
13.9 Summary Strategy: Comparing Recency with LDA and LM (α = 0.99): “Starbucks”
13.10 Sentiment Strategy: Comparing Recency with LM+LDA focusing only on positive or negative aspects respectively (α = 0.99): “America West Airlines”
13.11 Sentiment Strategy: Comparing Recency with LM+LDA focusing only on positive or negative aspects respectively (α = 0.99): “Pokemon Snap for Nintendo 64”
13.12 Sentiment Strategy: Comparing Recency with LM+LDA focusing only on positive or negative aspects respectively (α = 0.99): “Microsoft Windows ME”
13.13 Sentiment Strategy: Comparing Recency with LM+LDA focusing only on positive or negative aspects respectively (α = 0.99): “Starbucks”

14.1 Overall Framework for Opinion Trust Modeling and Mining.
14.2 Preprocessed Tweet lines using TweetNLP
14.3 Graphical Presentation of Latent Dirichlet Allocation
14.4 Average divergence of trend distributions measured against various densities of user clusters.
14.5 Visualization of evolution of learned trust network.
14.6 Node level analytics on trust graph.
14.7 Network level analytics on trust graph.

List of Tables

2.1 Comparison among trust ontologies based on ontology component structure

4.1 Correlating research questions and contributions

5.1 Comparison among trust ontologies based on ontology component structure
5.2 Social Network Analysis on Small Graphs
5.3 Social Network Analysis on Small Graphs
5.4 Social Network Analysis on Large Graphs
5.5 Social Network Analysis on Large Graphs

13.1 Top terms composing the latent topics “ticket” and “waiting” for America West Airlines
13.2 Distribution of the Ratings for the Test Products
13.3 Sample Annotation Form for “America West Airlines” Reviews

14.1 Sample top 5 words in topics with proportions for tweets presenting Eurozone trend


Lists of Publications

Publications Included in This Thesis

Paper (A1) N. Dokoohaki and M. Matskin, Structural Determination of Ontology-Driven Trust Networks in Semantic Social Institutions and Ecosystems, International Conference on Mobile Ubiquitous Computing, Systems, Services and Technologies (UBICOMM ’07), IEEE Computer Society, pp. 263-268, Nov. 2007.

Paper (A2) N. Dokoohaki and M. Matskin, Effective Design of Trust Ontologies for Improvement in the Structure of Socio-Semantic Trust Networks, International Journal On Advances in Intelligent Systems, vol. 1, no. 1942 - 2679, pp. 23-42, 2008.

Paper (B1) N. Dokoohaki and M. Matskin, Personalizing Human Interaction through Hybrid Ontological Profiling: Cultural Heritage Case Study, 1st International Workshop on Semantic Web Applications and Human Aspects (SWAHA), Collocated with 3rd Asian Semantic Web Conference 2008 (ASWC ’08), 2008, pp. 133-140.

Paper (B2) N. Dokoohaki and M. Matskin, Reasoning about Weighted Semantic User Profiles through Collective Confidence Analysis: A Fuzzy Evaluation, Atlantic Web Intelligence Conference (AWIC ’10), in Advances in Intelligent Web Mastering 2, vol. 67, no. 5, V. Snášel, P. S. Szczepaniak, A. Abraham, and J. Kacprzyk, Eds. Springer Berlin Heidelberg, 2010, pp. 71-81.

Paper (C1) F. Cena, N. Dokoohaki, and M. Matskin, Forging Trust and Privacy with User Modeling Frameworks: An Ontological Analysis, First International Conference on Social Eco-Informatics (SOTICS ’2011), 2011, pp. 43-48.

Paper (C2) N. Dokoohaki and M. Matskin, Quest: An Adaptive Framework for User Profile Acquisition from Social Communities of Interest, 2nd IEEE International Conference on Advances in Social Network Analysis and Mining (ASONAM ’10), vol. 0, pp. 360-364, 2010.

Paper (C2) N. Dokoohaki and M. Matskin, An Adaptive Framework for Discovery and Mining of User Profiles from Social Web-based Interest Communities, A Chapter in The Influence of Technology on Social Network Analysis and Mining Book, T. Özyer, Ed. Springer Verlag, 2012.

Paper (D1) S. Fazeli, A. Zarghami, N. Dokoohaki, and M. Matskin, Mechanizing Social Trust-Aware Recommenders with T-Index Augmented Trustworthiness, the 7th international conference on Trust, privacy and security in digital business (TrustBus ’10), vol. 6264, M. S. Sokratis Katsikas, Javier López, Ed. Springer Berlin / Heidelberg, 2010, pp. 202-213.

Paper (D2) S. Magureanu, N. Dokoohaki, S. Mokarizadeh, and M. Matskin, Epidemic Trust-based Recommender Systems, IEEE international conference on Social Computing 2012 (SocialCom ’12), 2012.

Paper (E) N. Dokoohaki, C. Kaleli, H. Polat, and M. Matskin, Achieving Optimal Privacy in Trust-Aware Collaborative Filtering Recommender Systems, 2nd International Conference on Social Informatics (SocInfo ’10), LNCS 6430, pp. 62-79, Springer, Heidelberg, 2010.

Paper (F1) R. Krestel and N. Dokoohaki, Ranking Product Reviews, Regular Issue of ACM Transactions on Intelligent Systems (TIST), Sep. 2012 (Submitted for Review).

Paper (F2) N. Dokoohaki and M. Matskin, Mining Divergent Opinion Trust Networks through Latent Dirichlet Allocation, International Symposium on Foundations of Open Source Intelligence and Security Informatics (FOSINT-SI 2012), 2012 IEEE/ACM International Conference on Social Network Analysis and Mining (ASONAM ’12), IEEE Computer Society. August 2012.

Other Publications By Author

1. N. Dokoohaki, "Deliverable D2.1 - Report of User Profile Formal Representation and Metadata Keyword Extension", EU FP7 Smartmuseum project, 2008.

2. N. Dokoohaki, T. Ruotsalo, T. Kauppinen, and E. Mäkelä, "Deliverable 2.2 - Report describing methods for dynamic user profile creation", EU FP7 Smartmuseum project, 2009.

3. A. Zarghami, S. Fazeli, N. Dokoohaki, and M. Matskin, Social Trust-Aware Recommendation System: A T-Index Approach, IEEE/WIC/ACM International Conference on Web Intelligence and Intelligent Agent Technology (WI-IAT ’09), 2009, vol. 3, pp. 85-90.

4. S. Fazeli, A. Zarghami, N. Dokoohaki, and M. Matskin, Elevating Prediction Accuracy in Trust-aware Collaborative Filtering Recommenders through T-index Metric and TopTrustee lists, the Journal of Emerging Technologies in Web Intelligence (JETWI), Special Issue On Web Personalization, Reputation and Recommender Systems, vol. 2, no. 4, 2010.

5. S. Mokarizadeh, N. Dokoohaki, M. Matskin, and P. Kungas, Trust and Privacy Assisted Service Composition Using Social Experience, 10th IFIP International Conference on e-business, e-services and e-society (2010), Springer Heidelberg, 2010.

6. S. Magureanu, N. Dokoohaki, S. Mokarizadeh, and M. Matskin, Design and Analysis of A Gossip-based Decentralized Trust Recommender System, In Proceedings of Workshop on Recommenders on Social Web (RSWEB ’12), collocated with ACM Recommender Systems 2012 (RecSys ’12), 2012.

7. R. Krestel and N. Dokoohaki, Diversifying Product Review Rankings: Getting the Full Picture, 2011 IEEE/WIC/ACM International Conferences on Web Intelligence and Intelligent Agent Technology (WI-IAT ’11), IEEE Computer Society, pp. 138-145, Aug. 2011.

8. R. Bunea, S. Mokarizadeh, N. Dokoohaki and M. Matskin, Exploiting Trust for Privacy Inference in a Collaborative Filtering Recommender Framework, In PinSoDa: Privacy in Social Data, in conjunction with the 11th IEEE International Conference on Data Mining (ICDM 2012), IEEE Computer Society, December 10, 2012, Brussels Belgium.

9. S. Mokarizadeh, N. Dokoohaki, R. Bunea and M. Matskin, Enabling Social Factorization with Privacy, In Annual ACM Symposium on Applied Computing (SAC 2013), ACM, March 2013, Coimbra, Portugal.

Part I

Introduction


Chapter 1

Introduction

In a networked world, trust is the most important currency.

Eric Schmidt, University of Pennsylvania Commencement Address, 2009

Personalization and recommendation are the most popular intelligent techniques used over the web today. Introduced by research over a decade ago, and widely adopted by web enterprises today, personalization aims at exploiting the differences among users: it allows data to adapt in both quantity and quality to the individual, based on interactions with the web. To implement personalization, the notion of product suggestion was born and coined as Recommendation Systems. Such systems use a range of algorithms which return a collection of items to users, based on derived knowledge of their tastes or on previous interactions. This gathered knowledge constitutes models of users, which collectively form a user model, referred to as a user profile. The process of gathering and enriching this knowledge is referred to as user profiling. Personalized interaction and system-derived recommendations have been so widely adopted that these techniques have effectively altered the way we receive, perceive and consume information.

Taking into account the dynamic nature of these technologies, two main concerns have been raised. The first deals with the privacy of existing implementations of personalization across the web, due to the lack of transparency about what sort of information is actually gathered about users and how users are profiled. The second concern is the filtering effect of personalization and recommendation. Filtered information may shield users from consuming data that does not correspond with what the system has calculated as relevant. This restricts users to their own cultural or ideological filter bubble [281]. One explanation for the lack of transparency is the large amount of uncertainty in profiles and profiled elements. The reason could be the constraints on the quality of solutions based on profiles. Another explanation is the possibility that the data used in user profiling is outdated and is based on information gathered by the system in the past. People change, as do their tastes and preferences. Despite the awareness that such absence of transparency detracts from the quality of system recommendations, little effort has been put into remediation. To address this problem, uncertainty should be measured and processed in the profiles.

Trust is a fundamental notion affecting daily human encounters. People rely on their perceptions of trust to thrive and survive in human societies. With the extended usage of web technologies in daily life, it is reasonable that software designers seek ways to accommodate human trust in their systems. Capturing and presenting trust as a computational concept has several benefits. Trust can be used to measure and improve the reliability of user profiles and user profiling technologies. Presenting trust can also help in dealing with uncertainty in user profiling and increase the transparency of the overall system.

User profiles and profiling technologies have become the most important commodity for social enterprise. All major stakeholders on the net now offer users the ability to create, maintain and manage their personal data, activities and content on-line via their respective profiles. As user profiling on the web is relied upon so heavily, integrating trust within the user model becomes an intrinsic challenge. While the research community has invested immense effort into defining trust, its application to the web has been less successful. This is because the database and information retrieval communities have been slower in realizing the value and the impact of trust in their applications. We must understand what the correct model and implementation of trust at the profile level is, and introduce the concept of trust-aware user profiling. Although there has been much investment in analyzing, capturing and managing trust in web applications, there exist substantial challenges that hinder effective adoption and utilization of trust-based methodologies and technologies. In this work we present state-of-the-art concepts, technologies and methodologies proposed by the author for modeling, capturing and enhancing web profiles for trust-based computation and utilization.

The thesis is organized into several parts, starting with an introductory part which presents the main theme of the research, followed by the challenges motivating the research and the research questions pursued. Then we consolidate the methodologies and results obtained. We continue with a background part covering the sub-topics of the various proposed contributions. It includes the state of the art in trust on the web, user profiling on the web, trust and recommender systems, privacy in recommender systems and hybrid recommender systems. Each background section concludes with the research gaps identified, which justify the aims of the work. The next part explains in detail the contributions of the work, followed by conclusions and future work. The manuscripts of the published work and the detailed content of the dissertation are presented in the final part of the work.

Challenges

Difficulty of Positioning Trust on the Web of Profiles, Personalization and Recommendation

Implementing trust research in user profiles and personalization is a distinct challenge. To overcome this challenge, let us first identify the current state of trust research. Golbeck [143] divides existing trust on the web into three sub-categories: trust in content, trust in services and trust in people. The focus of user profiling solutions is on either content or people. The research on understanding trust in any of these contexts resides in two fields: web science and e-commerce.

O’Hara and Hall [273] formulate existing problems associated with trust by asking which languages and ontologies are relevant for presenting the requirements of on-line trust, how transparency can be embedded into daily usage of information on the web, and finally, how trust and the web of data can be fused to create a ubiquitous interaction for the user. They also study several key perceptions of trust, including risk, confidence, credibility and reputation. There is still no clear means to allow balanced utilization and sharing of personal data in a trustworthy manner.

Focusing on the e-commerce web, Gefen and colleagues present extensive research on the impact of user trust in e-commerce and relate several important perceptions of trust [130]. They have observed that perceptions of trust can influence one’s adoption of a certain information technology product [25]. Although conceptual frameworks, taxonomies and vocabularies are required to guide such research by proposing relevant propositions and ideas, the authors suggest that a research methodology needs to be devised to identify a technology that builds trust. This methodology must also emphasize how such frameworks can be combined with existing ideas to build upon similar models.

Both perspectives are subjective to their respective contexts. What they share is the requirement of clear semantics and tools for modeling trust and trust-based products. Thus, to be able to position trust effectively on the web of personalization, we have proposed clear web semantics and ontological tools.

Limited Work on Correlating Trust to Information Privacy

While research on identifying and recognizing notions or perceptions of trust has been considerable, less attention has been given to finding correlations between trust and information privacy. This becomes especially important in the context of the personal web, such as social networking or e-commerce, where finding the correlation between trust and privacy is increasingly vital. Among the first works on correlating trust and privacy, particularly in the context of social networks, is Dwyer et al. [110]. In their Privacy Trust model, statistical variables examine the correlations among the constructs of Internet privacy, trust in networking sites, trust in members of the network, information sharing and development of new relationships. For each independent variable, results for Facebook [115] and MySpace [257] are presented separately, and also combined. The resulting correlations have been inconsistent. They state that although the privacy metric has strong reliability, there is little evidence of an influence of privacy on information sharing. This study is widely regarded as an effective empirical survey, as it points out the impact of trust and privacy in the social web, although it concludes without pointing out a clear relation between trust and privacy. Bélanger and Crossler [23] provide a comprehensive review of information privacy research.

Smith et al. [325] complement this survey with an interdisciplinary review of information privacy research. They identify three major areas in which previous research contributions have been made. These are the conceptualization of information privacy, the relationship between information privacy and other constructs, and the contextual nature of these relationships. Elaborating on the second area, they present the correlation of privacy with other constructs as a measurable commodity, with dependent and independent variables. Focusing on privacy concerns as a metric, they state that since it is almost impossible to measure privacy itself, almost all empirical privacy research relies on measurement of a privacy-related construct rather than looking at privacy as an integral concept. Note that the focus of privacy concerns as a measurable construct is personal rather than group-based. Dinev and colleagues [96] follow up on this proposal with an empirical study measuring statistical relationships between privacy and other constructs by surveying users of Web 2.0 sites. Relevant correlations to information privacy are found for anonymity, secrecy, confidentiality and control. As observed, trust is still not a construct that has been surveyed empirically in their work.

Trust and privacy constructs are context-dependent notions, and modeling them within the context of user profiles demands an extensive study. Namely, it must be clearly defined how these constructs affect profiling and personalization systems. Through this research we will show that assuring users that the system understands and respects their preferences accurately can boost their confidence in the system. At the same time, by finding the correct synergy between trust and privacy measures, we can maintain system performance at acceptable levels while protecting user data.


Research Questions

The problem that we consider in this work is the notion of trust-based user profiling: the idea of combining web profiles with trust and with mechanisms that allow information systems to consume and understand such statements and preferences. This improves interaction and communication between individuals and the system, which in turn boosts system performance. Following this formulation of the problem, the thesis aims at answering the following questions:

• Q1: With the increasing importance of trust computing, which languages and methods shall be used to model notions of trust in user profiles?

• Q2: How can we manage trust-enabled user profiles for web computing?

• Q3: What are effective techniques to discover, aggregate and mine trust-based profiles? How can we maximize the impact of trust-based user profiles in the context of information retrieval and personalization on the web?

• Q4: How can we correlate notions of trust and privacy in an effective manner and exploit this correlation to benefit the applications and systems implementing these crucial concepts?

• Q5: How can modern web applications be designed to incorporate trust metrics and trust-embedded user profiles in their very fabric?

Proposed Approaches

A majority of the solutions proposed by this thesis had impact on profiling and recommendation systems in digital libraries, i.e. the EU FP7 Smartmuseum project [292]. The highlighted solutions start by modeling adaptive user profiles incorporating users’ taste, trust and privacy preferences. This led to the proposal of several ontologies describing characteristics and attributes of users and their on-line content, which in turn were used for improving indexing and retrieval of items and profiles across the platform. To address the important obstacles of sparsity and uncertainty of on-line profiles, frameworks for data mining and machine learning of profile contents from social networks were proposed. The results of mining and populating data from the social web, together with profiles, were shown to increase the accuracy of intelligent suggestions made by the system to improve navigation of users in on-line and off-line museum interfaces.

With an ever increasing variety of data on the web, techniques are needed to mine and use such content. This motivates us to take the notion of trust-based profiles beyond the boundaries of digital libraries and into the social web domain. This is done by augmenting the mechanisms of discovery and recommendation of popular social recommender systems, e.g. collaborative filtering. This has led us to propose several trust-based recommendation frameworks capable of mining implicit and explicit trust across ratings networks taken from the social as well as the e-commerce web.

We focused both on ontological issues and on management of profiles. The resulting recommendation techniques have been shown to increase the accuracy of profiles by incorporating knowledge of items and users and diffusing it along the trust network. Leveraging automated distributed management of profiles, we showed that the coverage of the system can be increased effectively. Our results surpassed comparable state-of-the-art techniques, which in turn shows that trust can clearly elevate the accuracy of suggestions predicted by the system. To assure the overall privacy of similar systems, privacy was given a direct focus: architectures and metrics for secure trust-based recommendations were proposed. In turn it was shown that a balance between accuracy and changes of trust data passed between parties can maintain accurate output.

Finally, focusing on hybrid models of web contents and recommendations led us to study the impact of trust in the context of topic-driven recommendation in social and opinion media. This helped us show that content-driven and tie-strength networks can improve system accuracy for several computing tasks. The following main contributions will be discussed in the contributions part and detailed in the included papers:

• C1: Modeling and Analyzing Ontology-Based Trust Networks;

[C1.1] Proposing a generic trust vocabulary for modeling interactions and cooperations of agents, applications, organizations and people on the social web, and a functional ontology for documenting these interactions and proposing resulting trust networks.

[C1.2] Introducing a benchmarking framework for qualitative and quantitative analytics of ontological trust models and their generative trust networks.

• C2: Modeling and Learning Trust-Aware User Profiles;

[C2.1] Novel formalization of trust-aware user profiles. Such formalization allows encapsulation of a structured knowledge representation of a system with respect to the collective behavior of a user across the system. The user attributes encompass individual and collective knowledge of the system about the user. Together this allows the system to build behavioral knowledge of applications with respect to profiled data about the user, including important notions of trust and privacy with respect to the context that the user is being profiled within.

[C2.2] Proposing a greedy heuristic for mining and normalizing uncertain semantic user profiles, where a custom fuzzy reasoner can mine, interpret and map the raw values into normalized values that can later be used for recommendation and adaptation tasks.

• C3: Discovering and Aggregating Trust-Aware User Profiling;

[C3.1] Augmenting trust-aware user profile modeling for cross-domain personalization. We have proposed an ontology-based generic user model, which imports a generic user model to capture the basic concepts of a user. This in turn was extended with a social user model containing the concepts needed to capture knowledge about on-line users.

[C3.2] Proposing a semi-supervised profile importing architecture which can adaptively discover, aggregate and learn topic-based user profiles to support the task of personalization. The framework supports two aims: harvesting the profiles from the network, and learning groupings of profiles according to their shared interest topics via a combined clustering-through-classification scheme.

• C4: Architectures and Analytics of Decentralized Trust-Based Recommender Systems;

[C4.1] Proposing an architecture for an ontology-based recommendation framework. A generic recommendation framework allows content and profiles from the web to be imported, mined and used for generating recommendations of items and people of interest.

[C4.2] Proposing metrics and automated management in trust-recommender systems. Leveraging a social network overlay allows a trustworthy neighborhood to be found more effectively, using epidemic heuristics for improved recommendation generation.

• C5: Modeling and Evaluating Privacy in Trust-Based Recommendation Systems;

[C5.1] Introduction of a privacy-by-architecture framework for enabling privacy-preserving trust recommendation systems. This allows for taking measures for preserving privacy during trust calculation and computation.

[C5.2] Analyzing the balance between accuracy and privacy in the privacy-by-architecture design of a trust recommender system. We have shown that privacy and trust mechanisms, each with their respective configurations, jointly form configurations of the overall framework.

• C6: Modeling and Measuring Trust in Hybrid Recommender Systems;

[C6.1] Proposing a topic-based framework for review mining and summarization. In this framework we focus on proposing algorithms to model reviews using latent topics and star ratings, and on ranking reviews to summarize all reviews for a product within the top-k results.

[C6.2] Proposing a topic-based framework for social network mining and analysis of micro-bloggers, within which trend corpora can be mined. By using a probabilistic latent topic technique, both collective and individual models can be defined.

Chapter 2

State of the Art

Trust Ontologies

An ontology [242] can serve as a tool to model and generate a network of users. This is done by describing personal information about each person (realizing the ego node), and by describing personal information regarding the set of users whom the user knows or is eager to connect to (realizing the neighbors on the network). Nodes on such a network are identified by their unique identifiers. We have briefly surveyed several widely known ontologies of trust in paper 5. Table 2.1 gives a qualitative summary of the ontologies of trust under focus in our work.

Jennifer Golbeck [147] introduces an ontology that extends FOAF [44], giving users the possibility to state and represent their trust in individuals they know. Context was introduced as a property of trust. Trust is context-sensitive; as a result, the meaning and semantics of trust can change depending on the context. This notion is represented in this ontology as general trust, specific trust or topical trust [147]. Toivonen and Denker [343] study trust in the context of communication and messaging. They state that there are many factors which can have immense impact on the honesty and trustworthiness of the messages we send and receive. The context-sensitivity of trust has been realized and taken into account in their work. Inference Web [210] at Stanford University has built a semantic web-enabled knowledge platform and infrastructure. This platform is designed to help users on the network exploit the value of semantic web technologies in order to give and get trust ratings to and from resources on the web. This process is referred to as justification of resources. To this end, a language called PML is used.

Table 2.1: Comparison among trust ontologies based on ontology component structure

Golbeck
Concept(s): Topical trust, Agent, Person
Relationship(s): trustRegarding (between agent and Topical trust)
Instance(s): trust0 ... trust10 (range of trust metric), trustSubject, trustValue, trustedAgent, (subproperty of trustedAgent), trustRegarding
Axiom(s): "A Person or Agent (e.g. Alice) trustsHighlyRe (trust10) trustRegarding a trustedPerson or trustedAgent (e.g. Bob) on trustSubject (e.g. Driving)"

Toivonen Denker
Concept(s): Person, Topic, Receiver, Message
Relationship(s): Trusts (between Persons), ctxTRUSTS (between receiver and message), trustsRegarding (between Person and Topic)
Instance(s): trustRegarding, reTopic, (trustsAbsolutelyRe ... distrustsAbsolutelyRe), ctxTRUSTS, (ctxtrustsAbsolutely ... ctxdistrustsAbsolutely), trustsRegarding, Trusts, rePerson, (trustsAbsolutely ... distrustsAbsolutely)
Axiom(s): Multiple axioms are inferable, for instance: 1) Stating topical trust: "A Person (Alice) trustsAbsolutelyRe trustsRegarding (relationship) the Topic (Driving)"; 2) Stating trust between two persons: "a Person (Alice) trusts another Person (Bob) trustsAbsolutely"

PML
Concept(s): Belief, Element, Trust, Element, FloatMetric
Relationship(s): Belief Relation (using hasBelievedInformation and hasBelievingAgent between Agent, information and source), Trust Relation (using hasTrustee and hasTrustor between Agent, information and source)
Instance(s): Agent, Source, Information, hasBelievedInformation, hasBelievingAgent, hasTrustee, hasTrustor, hasFloatValue
Axiom(s): Two kinds of axioms regarding the trust and belief of an agent in information from a source can be inferred, for instance: Stating trust: "FloatTrust, hasTrustee and hasTrustor (agent: user's browser) and hasFloatValue with FloatMetric (0.55)."

Konfidi
Concept(s): Relationship, Item
Relationship(s): About (between Item and Relationship)
Instance(s): About, Truster, Trusted, Rating, Topic
Axiom(s): Trust relationships can be stated like the following axiom: "A (trust) Relationship between truster (Alice) and trusted (Bob) exists, which is about trust topic (Cooking) with trust rating (0.95)."

With respect to the metrics used for presenting computational trust values and modeling the mathematical notion of trust, there exist two approaches: trust metrics with discrete values and metrics with continuous values. Brondsema and Schamp [46] model and represent trust and distrust in a similar fashion using continuous values. Having a continuous range of values allows easier propagation of trust values along the edges of the network using inference mechanisms.
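To make the remark about continuous trust values concrete, the following is a minimal sketch of how such values could be propagated along the edges of a trust network. It assumes a simple multiplicative scheme (the inferred trust along a path is the product of its edge ratings, and the best path wins), which is one common choice and not necessarily the inference mechanism of any ontology surveyed here. The network, the names and the function `propagate_trust` are hypothetical.

```python
# Hypothetical trust network: truster -> {trusted: rating in [0, 1]}.
# All names and values are illustrative assumptions.
TRUST = {
    "alice": {"bob": 0.95, "carol": 0.6},
    "bob": {"dave": 0.8},
    "carol": {"dave": 0.5},
    "dave": {},
}


def propagate_trust(graph, source, target, max_hops=3):
    """Best multiplicative trust from source to target.

    Only paths of at most max_hops edges are considered.
    Returns 0.0 when no such path exists and 1.0 when source == target.
    """
    best = 0.0
    # Each stack entry: (current node, accumulated trust, visited nodes, hops used).
    stack = [(source, 1.0, frozenset([source]), 0)]
    while stack:
        node, score, seen, hops = stack.pop()
        if node == target:
            best = max(best, score)
            continue
        if hops == max_hops:
            continue
        for neighbour, rating in graph.get(node, {}).items():
            if neighbour not in seen:  # never revisit a node, so cycles are harmless
                stack.append((neighbour, score * rating, seen | {neighbour}, hops + 1))
    return best


if __name__ == "__main__":
    print(propagate_trust(TRUST, "alice", "dave"))
```

With discrete labels such a product is not directly defined, which is one reason continuous metrics are convenient for inference along paths.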

Need for an Extended Trust Ontology

Following the state of the art on web ontologies for trust modeling, we have identified these shortcomings in existing work:

• Existing models do not focus on modeling multi-faceted trust [336]. Multi-faceted trust enables presentation of weighted trust in separate relationships, while we have inherently modeled this notion through the concept of relationship and its sub-concepts and properties in our proposed ontology.

• There has been less focus on analyzing trust ontologies from a structural perspective. However, a structural understanding of the inherent network could guide the design of more fine-grained relations or meta-data describing interrelations of users, items and their interests.

Our corresponding contributions to this part of the work can be found in paper 5.

Ontology-Based User Profiling for Personalization and Recommendation Systems

Information about the user is usually collected in a so-called user model and administrated by a user modeling system, server or component [359]. Among the fundamental concepts defined by Wahlster et al. [359], a user model is a knowledge source in a system which contains explicit assumptions on all aspects of the user that may be relevant to the behavior of the system. User profiling is either knowledge-based or behavior-based [248]. Knowledge-based approaches engineer static models of users and dynamically match users to the closest model. Behavior-based approaches use the user’s behavior as a model, applying machine-learning techniques to discover useful patterns in the behavior. The difference between user profiling and user modeling lies in their different levels of sophistication [125]. Web ontologies are used to formalize domain concepts, allowing the description of constraints for the generation or selection of contents which are similar to the interest domain of the user. Web technologies are also used for formalizing the user model or profile ontology. Such models help with deciding which resources are to be adapted to the user. Web ontologies along with reasoning create a formalization that boosts personalization decision-making mechanisms [106, 107].

Web ontologies play a crucial role in profiling and modeling of usage-driven personalized software systems. Ontologies have been used extensively in personalization and recommendation research [120, 128, 322, 399]. Standardizing user profile syntax and semantics allows inter-operable personalized systems to share information about their respective users and their knowledge. Ontology-based user profiling is thus crucial to systems that can reason across multiple profiles (social semantic systems) or systems that can take advantage of complex inference on multiple ontologies representing different knowledge (e.g. digital libraries). This brings us to the notion of hybrid models that can combine both notions.

Need for Hybrid User Profiles and Ontologies in Knowledge Services and Databases

Hybrid modeling and profiling have been widely discussed in the literature [49]. Hybrid user modeling can be defined as combining user attributes and content attributes for improving the personalization effect. Hybrid approaches to user modeling and profiling focus on combining strategies for profiling and user modeling [33, 289]. In addition to modeling semantics in profiles, we also need to consider the structure of profiles [68, 124]. The shortcomings observed in research on ontology-based user profiling are listed as follows.

• Existing models do not consider that trust, privacy or similar notions are profile-level knowledge that can be embedded into profiles for presenting the user’s security and privacy preferences across devices, databases or domains.

For our respective contributions to this part of the work, please refer to paper 6.

Modeling Trust and Privacy in Ontology-Based User Profiles

In the user modeling field, there are several attempts to define a generic user model which contains the definition of user features and of his/her physical and social context, expressed in a semantic web language and made available to all user-adaptive systems via the Internet. Figure 2.1 visualizes the distribution of projects utilizing semantics for adaptation and personalization, collocating them by their semantic qualities and knowledge types.

Ontology-based user profiles are becoming widely adopted. Museum and tour guide applications were influential ones [28, 61, 207, 365]. For instance, within the domain of digital cultural heritage, the CHIP project is a significant stakeholder. A considerable amount of research attention has been paid to semantically formalizing the user domain [365], as well as to personalization of information retrieval.

The Smartmuseum project aimed at building an on-site and off-site distributed information dissemination and retrieval platform for accessing digital cultural heritage artifacts [28]. Since profiles play important roles in capturing and storing the understanding of users in such environments, using knowledge modeling techniques such as semantic web technologies seems to be a justified approach. Figure 2.1 has been modified to incorporate several contributions of this work, including the Smartmuseum project [292]. The Smartmuseum project is plotted at moderate semantics, bordering on interaction and social networking, similar to the CHIP project. Our trust ontology [101], alongside the recommendation systems using it [118, 119] and the social user and cross-context ontology [70], is placed level with social networking with respect to the use of strong semantics (i.e. OWL statements in our trust ontology [100] and SWRL [268] rules in the privacy sub-ontology of our social user model [70]), similar to the FilmTrust project [140].

Figure 2.1: Modified visualization of works and projects using semantic technologies, plotted with respect to the different types of knowledge used (e.g. domain model, user model, personalization model, etc.). Original plot by Ilaria Torre [347].

Need for Emphasizing and Proposing Federated Ontologies of Trust and Privacy

To the best of our knowledge, there are no attempts to integrate a privacy model in a generic user model. Little attention has been paid to effective incorporation of trust into user models. Among adaptive Web applications, recommender systems have been quite successful in utilizing and leveraging social trust and reputation. Golbeck first introduced the notion of ontological modeling of trust in the semantic social Web [144, 147]. Examples of adoption of reputation and trust in user models, as pointed out earlier, have been limited. The Grapple project [3] investigates capturing and utilizing reputation to model the trust between users, by allowing the users to rate each other’s opinions and statements, following the eBay model [305].

Adoption of such a plain model of reputation is neither successful nor sufficient in computational generic models of users, for several reasons. First, rating is an implicit model of reputation, and representing it as a simple property-rating or a vector of ratings strips it of its original notion. On the other hand, many systems are already using explicit trust statements to evaluate users’ opinions (such as Epinions or Ciao [83, 113]). Second, since trust and reputation convey different semantics on the Social Web, frameworks for modeling users should be capable of describing trust and reputation separately. This difference is pointed out when we introduce a trust model capable of describing the trusted peers of a user on a social network and a reputation model capable of storing and presenting the reputation of a user across different on-line communities. The following shortcomings were observed in current research on modeling privacy ontologies in projects utilizing semantic technologies:

• Limited work on introducing models for social user profiles and cross-context personalization. There is a need to propose more unified user ontologies for the social web, especially in the context of personalization and recommendation systems.

• Existing models of users in the social web fail to model important dimensions of social connectivity: privacy, trust and reputation. Since trust and reputation convey different semantics in the Social Web, frameworks for user modeling should be capable of describing trust and reputation separately. There has been limited attention to the integration of privacy models in a generic user model. With the ever increasing importance of privacy and security in social networks [7], it is important that explicit semantics be used to model privacy preferences in social applications.

• There is a lack of clear semantics for topic-based relationship presentations in user ontologies: presenting explicit and implicit models of reputation in the simple form of a property-rating or a vector of ratings strips them of their original notion and postulation. On the other hand, many systems are already using explicit trust statements to evaluate users.

For our respective contributions to this part of our work, please refer to paper 8.
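The requirement that trust, reputation and privacy be described separately can be illustrated with a small data structure. The sketch below is only an assumption about what such a profile could look like in code; it is not the ontology proposed in this work, and every class, field and value name (`TrustStatement`, `UserProfile`, the audiences "friends" and "public") is hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class TrustStatement:
    """Explicit, topic-scoped trust that the profile owner places in one peer."""
    trustee: str
    topic: str
    value: float  # assumed scale: 0.0 (no trust) .. 1.0 (full trust)


@dataclass
class UserProfile:
    user_id: str
    # Trust: explicit statements about individual peers, per topic.
    trust: list = field(default_factory=list)
    # Reputation: one aggregated score per on-line community.
    reputation: dict = field(default_factory=dict)
    # Privacy: profile section -> set of audiences allowed to read it.
    privacy: dict = field(default_factory=dict)

    def trusted_peers(self, topic, threshold=0.5):
        """Peers trusted on the given topic at or above the threshold."""
        return sorted(s.trustee for s in self.trust
                      if s.topic == topic and s.value >= threshold)

    def visible_to(self, audience):
        """View of the profile restricted by the owner's privacy preferences."""
        view = {}
        if audience in self.privacy.get("trust", set()):
            view["trust"] = list(self.trust)
        if audience in self.privacy.get("reputation", set()):
            view["reputation"] = dict(self.reputation)
        return view


# Illustrative profile; all values are made up.
alice = UserProfile(
    user_id="alice",
    trust=[
        TrustStatement("bob", "cooking", 0.95),
        TrustStatement("carol", "cooking", 0.30),
        TrustStatement("dave", "driving", 0.80),
    ],
    reputation={"reviews-community": 0.7},
    privacy={"trust": {"friends"}, "reputation": {"friends", "public"}},
)
```

Keeping the three notions in separate fields means that a public reader can be shown the community-level reputation without learning whom the user personally trusts.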


Discovery and Mining of Ontology-Based User Profiles for Personalization and Recommendation

Since user profiles play a crucial role in the context of web personalization and adaptation, the availability of rich and populated profiles is crucial for personalized systems. Discovering and sharing interest profiles across domains and systems has been the focus of many researchers. The availability of profiles in information retrieval and personalization is subject to two important tasks: discovery and mining. Ghosh and Dekhil [136] argue that profile construction and discovery on the web can be augmented to address the sparseness of the profile data, as well as to improve the content of the profiles. Teevan et al. [338] study heuristics for discovering and processing the prior interactions (profiles) of users for the task of search personalization on behalf of the users. Gauch et al. [127] give a complete overview of different models of discovery and retrieval for ontology-based user profiling. One problem is that most models either focus on modeling user profiles rather than discovering or harvesting them, or are too general for the specific and subjective tasks of recommendation or retrieval.

Gauch and Trajkova have proposed user ontologies for cross-domain user profiling [128, 349]. The issue of discovering and retrieving profiles across multiple domains with semantic user profiles has also been discussed in [120, 323]. This has been emphasized in the Smartmuseum profiling and recommendation architecture [310]. Figure 2.2 depicts the Smartmuseum artifact recommendation system. Discovery problem aside, dealing with sparsity in such data becomes an important issue. Researchers take different approaches to gathering, analyzing and generating user profiles. This is usually done by applying machine learning techniques to web data. Using these techniques has been very appealing for personalization tasks [253]. Mining web content for personalization has been an attractive way of addressing inherent problems of recommender systems [248]. More specifically, two types of recommenders have depended on a large number of machine learning techniques, namely content-based [284] and collaborative filtering recommenders [315].

Figure 2.2: Architecture of the Smartmuseum recommendation system, visualized by Ruotsalo et al. [310].

Need for Automated Discovery of Ontology-Based User Profiles

Similar to our framework is the work by Liu and Maes [223, 351]. The focus on automation of profile and taste discovery has also been pointed out in the literature, through either profile learning [245, 335, 338] or ontology-driven mining [92, 219, 278, 396]. The first and foremost problem in personalization systems is dealing with sparsity in profiles and the cold start problem. This problem has been the main focus of semantic user profiles [12], ontology-based user profiles [10, 244, 323] and user models [71]. The cold start problem refers to the incapability of a system to cope with a lack of sufficient data to reason about users. Cold start has been a strong hurdle in the performance of web personalization systems, and has remained so until recently. Amongst the approaches proposed for dealing with this issue, user modeling [71], trust [356] and collaborative filtering remain the most successful techniques. Existing shortcomings that we have identified in this area of research are as follows:

• Limited attention to automation in profile discovery frameworks. With the increasing need for back-end data mining and machine learning for decision support and intelligence, solutions are needed for processing data imported or aggregated from social networks or the web in general. Emphasis on automation of this process benefits the resulting platform, where profiles are imported and digested for recommendation in a dynamic fashion.

• There is no completely satisfactory solution to the increasingly important cold start problem. Since cold start degrades the performance of web profiling, new importation and mining frameworks are proposed that combine the power of data mining and machine learning with the least effort on supervision of data processing.

For our respective contributions to this part of the work, please refer to paper 9.

Need for Mining Ontology-Based User ProfilesFocusing on weighted user profiling methodologies, an important problem to con-sider is uncertainty associated with these profiles. In modern web systems dealingwith uncertainty reasoning in user profiling has become a major problem [346]. Un-certainty evaluation has been subject to inferring individual attributes from groupattributes in profiles [282]. Uncertainty evaluation in Facebook for instance, hasbeen objective to find relationship between Number of Friends and InterpersonalImpressions [346]. Thus, uncertainty reasoning has been proposed, leveraging fuzzyreasoning specifically, for dealing with cold start problem [356]. There are works onfuzzification of each weighted notion, namely trust [13,30,227,258], privacy [264,394]and ranking [153]. However not so many approaches and frameworks consider ap-proaching collective models of afore mentioned fuzzy notions altogether. SubjectiveLogic [221] is one unified framework that allows for collective analysis of trust andits atomic factors such as risk. Closest proposal to our approach is Schmidt etal [75,316] that collectively model trust and reputation in a multi-agent setting. Inthis part of research we summarize the gaps observed as follows:

• Dealing with uncertainty inherent in profile data through explicit reasoning techniques. By associating profiles with weights we can introduce clear semantics and interpretation capabilities to address the uncertainty associated with profiled user data. This is especially the case if such content is user generated and taken from multiple on-line sources of data. In modern web systems, dealing with uncertainty reasoning in user profiles remains a major problem.

For respective contributions to this part please refer to our paper 7.

Trust Metrics and Ontologies for Recommender Systems

Social recommender systems are suitable candidates for adopting the notion of trust-aware user profiling for several reasons. One of the most important factors, emphasized earlier in the introductory part of the thesis 12.1, is the fact that consumers increasingly and visibly express and leverage their trust and privacy for the utility they may gain through on-line services, especially recommender systems [25, 350, 380]. Trust has been shown to be an effective notion in elevating the performance of recommendation systems [140, 173, 270, 285, 356]. Examples of adoption of trust-recommendation systems have been increasing both in literature and commerce. Fazeli proposes a trust recommender system for learning and teaching [117,162].


Trust has been the focus of much research since it emerged as a reliable means for improving recommendation accuracy. Zhou et al. [397] present a rather thorough survey of approaches to trust-aware recommender systems. Within the context of recommender systems, we take the term trust to denote the confidence a user has in the recommendations of another. Trust complements social recommenders by addressing problems such as the reduced computability of similarity between users and by improving the accuracy of prediction. Yuan et al. [385] describe trust networks as being social networks with user-defined trust networks. The authors determine that this type of network holds the property of small-worldliness, which involves having closely clustered users and small average path lengths between any two users. They then use this finding to define a model for recommender systems that takes advantage of the small-worldliness of social networks in order to increase both accuracy and item coverage. In addition to trust, distrust has also been a focus of research in recommenders. Victor et al. [354] propose a model that uses distrust to complement trust. This approach helps deal more effectively with users that exhibit undesired behavior. The concept of distrust is also used by Verbiest et al. [353]. They analyze the effect of path length on trust and accuracy. This is particularly interesting to our work since we also observe the effects of using neighbors on the accuracy and item coverage of our recommender system.
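To make the role of path length in trust inference concrete, the following minimal Python sketch scores a trust path by multiplying the trust values along it and keeps the best path within a hop limit. The multiplicative decay, the `max_hops` parameter and the name `propagate_trust` are illustrative assumptions; they are not taken from the works cited above.

```python
def propagate_trust(edges, source, target, max_hops=3):
    """Best product of edge trust values over simple paths of at most max_hops edges.

    edges maps a user to {neighbor: trust in [0, 1]} (hypothetical layout).
    Returns 0.0 when no path within the hop limit exists.
    """
    best = 0.0
    # Each stack entry: (current node, trust accumulated so far, nodes on the path).
    stack = [(source, 1.0, {source})]
    while stack:
        node, trust, seen = stack.pop()
        hops_used = len(seen) - 1
        if hops_used >= max_hops:
            continue  # extending this path would exceed the hop limit
        for neighbor, weight in edges.get(node, {}).items():
            if neighbor in seen:
                continue  # keep paths simple (no cycles)
            value = trust * weight
            if neighbor == target:
                best = max(best, value)
            else:
                stack.append((neighbor, value, seen | {neighbor}))
    return best


edges = {
    "a": {"b": 0.9, "c": 0.5},
    "b": {"d": 0.8},
    "c": {"d": 1.0},
}
inferred = propagate_trust(edges, "a", "d")
```

Under these assumptions a longer path can never raise the inferred trust above that of its weakest prefix, which is one simple way to express why path length matters for accuracy and coverage.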

Several approaches, such as Golbeck [143], Kuter et al. [209], Avesani et al. [17] and DuBois et al. [109], also exploit an underlying mechanism in a network that allows for explicitly stated trust statements between users. However, not all systems support such features. The ability of users to express their confidence in others is limited by the time and effort required to evaluate other members of the network in order to form an opinion. Therefore, the ability of recommender systems to infer trust from limited knowledge is still a desired feature. The technique used to infer trust between users is critical to the accuracy of a trust-based recommender.

Need for Focus on Ontology and Architecture in Trust Recommenders

Semantic technologies have become effective notions in modeling data utilized by on-line services ranging from book and movie to music recommendation platforms [69,140,307]. Golbeck utilizes ontological structures of profiles [140,144], which are later used for recommendation generation in the FilmTrust framework [140]. While using adjacency matrices for storing trust values has been favored in a number of works [236], there is an increasing focus on using semantics for describing users, items and their relationships in recommender systems [122], especially considering the improved resulting accuracy for recommendation and retrieval [322]. Modeling trust on the item and user level was studied by O’Donovan and Smyth [270]. They model item level trust, which is similar to user level trust. Both trust models can be used concurrently to offer better results.


In addition, to cope with sparsity, decentralization and data mining can be put in focus. Han et al. [154] propose a DHT-based (Distributed Hash Table) approach, where the central dataset is organized into "buckets" of users which can be saved on individual nodes; each user utilizes his most suitable "bucket" to choose neighbors with which to generate predictions. User clustering is suggested as a solution to scalability problems as well as a means of improving accuracy [313]. Sarwar et al. [313] present clusters as groups of users where all the users in a cluster are each other’s neighbors, whereas in our case the neighbor relation is directional. A directional neighbor relation is desirable since, while a user’s neighbors will be the users most similar to it, there might be other users that are more similar to a neighbor.
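The asymmetry of a directional neighbor relation can be illustrated with a small Python sketch. Here each user is reduced to a single number standing in for a profile, and closeness on that number stands in for profile similarity; both simplifications, as well as the name `nearest_neighbors`, are assumptions made only for illustration.

```python
def nearest_neighbors(positions, k=1):
    """Map every user to the k users closest to it (a directional relation)."""
    result = {}
    for user, pos in positions.items():
        # Sort all other users by distance, breaking ties by name.
        others = sorted(
            (abs(pos - other_pos), other)
            for other, other_pos in positions.items()
            if other != user
        )
        result[user] = [other for _, other in others[:k]]
    return result


# "b" is the closest user to "a", but "c" is closer to "b" than "a" is.
positions = {"a": 0.0, "b": 1.0, "c": 1.4}
neighbors = nearest_neighbors(positions, k=1)
```

In this toy setting "a" selects "b" as its neighbor while "b" selects "c", so the relation is not symmetric, in contrast to a cluster in which all members are mutually neighbors.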

Similar to the metric considered in our work is the metric studied and discussed by Lathia et al. [214]. They argue that the dependence of CF approaches on similarity measures hides a number of pitfalls, which originate from the fact that user profiles are largely empty and limited in breadth. They propose trusted k-nearest recommenders (kNR) [214], a trust-learning heuristic built on the idea that recommenders who provide useful information should be rewarded and those who have no information available should be downgraded. The trust-based collaborative filtering algorithm used in their method requires a centralized user-item matrix, which might lead to scalability problems as the number of users increases. We summarize the gaps identified in the existing research related to this part of the work as follows:

• There is limited attention to using ontological trust models, especially trust-based profiles, in recommender systems. With increasing attention to recommenders in various fields of commerce and science, the need for ontological models describing various information items of interest, user profiles and their interrelations is increasing. Thus, in order to maximize adoption of trust-based profiles, fully functional semantic models of recommenders can be proposed.

• There is a need for studying the correlative and bilateral effects of networks and metrics of trust; While the value of structural studies of resulting trust networks has been pointed out, it is vital to study how mined networks of trust reshape and evolve in the face of suggestions generated by networks.

You can read further detailed contributions in paper 10 with respect to this part of the work.

Need for Focus on Metrics and Profile Network Management in Trust Recommenders

After considering data structure and architecture in trust recommenders, we turned our focus to metrics and profile network management in trust recommenders. Pearson similarity is a popular weight metric; however, using a more complex weighting measure than just similarity has the potential to offer more accurate results, especially in sparse datasets [236]. Approaches such as those proposed by Golbeck et al. [140–142] take advantage of trust ratings explicitly stated by the users themselves to infer trust between nearby members of the network through trust propagation. Focusing on metrics, O’Donovan and Smyth [270] argue that similarity is not sufficient in recommenders. They propose trust metrics that measure the degree to which one might trust a specific profile when it comes to making a specific rating prediction. O’Donovan uses the known ratings to create an artificial history of predictions for each user. By predicting the known ratings of users using all the other users and counting the number of correct predictions that each user makes, O’Donovan establishes a global trust [216] for each user as the ratio of correct predictions to total predictions of a user [270].
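The ratio of correct to total predictions described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: each producer predicts a consumer's known rating on its own with a mean-offset formula, and a prediction counts as correct when it falls within a tolerance `epsilon` of the known rating. The data layout, the tolerance value and the name `profile_trust` are hypothetical and not taken from [270].

```python
from statistics import mean


def profile_trust(ratings, epsilon=1.0):
    """Return {user: correct predictions / total predictions made by that user}.

    ratings maps a user to {item: rating} (hypothetical layout).
    """
    means = {user: mean(r.values()) for user, r in ratings.items() if r}
    trust = {}
    for producer, produced in ratings.items():
        correct = total = 0
        for consumer, known in ratings.items():
            if consumer == producer:
                continue
            for item, actual in known.items():
                if item not in produced:
                    continue  # the producer cannot predict an item it has not rated
                # Assumed single-producer prediction: consumer mean plus producer offset.
                predicted = means[consumer] + (produced[item] - means[producer])
                total += 1
                if abs(predicted - actual) < epsilon:
                    correct += 1
        trust[producer] = correct / total if total else 0.0
    return trust


ratings = {
    "a": {"x": 4, "y": 2},
    "b": {"x": 5, "y": 3},
    "c": {"x": 1, "y": 5},
}
trust = profile_trust(ratings)
```

In this toy example users "a" and "b" predict each other's ratings correctly but fail on "c", while "c" predicts nobody correctly, so "c" ends up with the lowest trust score.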

In addition to the focus on metrics, we have also studied how leveraging profile management could lead to increasing decentralization of recommendation generation. Several works focus on studying decentralization techniques in recommender systems, especially trust-aware ones [236,251,321]. Miller [251] proposes a peer-to-peer recommender system in which nodes exchange ratings with a neighbor at each step in order to construct an item-to-item similarity matrix, which can then be used to make offline predictions. The choice of neighbors as well as determining the neighbors of a user are implementation dependent in this approach. Unlike our approach, Miller does not maintain a profile network. This is understandable since his proposed system does not need to keep similar profiles easily accessible and only needs a profile for a one-time computation, after which it can be discarded. Ormandi et al. [277] determine that using gossip-based algorithms to cluster a network in the context of recommender systems offers potential for increasing the accuracy of prediction. However, the aforementioned work does not analyze item coverage and does not cover trust-awareness in recommender systems, and instead focuses on load balancing. We summarize the gaps identified in the existing research related to this part of the work as follows:

• There is a need for further studies of the effect of decentralization mechanisms on performance factors in recommender systems; Since existing work on applying decentralization heuristics to recommenders has been widely focused on addressing problems such as load balancing [277], a more connection-centric focus is needed to correlate the positive impacts of decentralization with the overall performance of the recommendation generation process.

• There is a need for studying the effect of profile (overlay) management on performance factors in recommender systems; This is also due to the fact that the majority of recommenders use matrices to store and retrieve item and profile similarity [251] and trust score [236] indices. By leveraging decentralized networks or overlays (e.g. DHTs), we can improve the speed and coverage of access to profiles across the network of users.


Figure 2.3: Privacy alleviating techniques in personalization systems, categorized according to stages of personalization and approaches, taken from Toch et al. [342].

See paper 11 for our respective contributions to this part of the work.

Privacy in Recommender Systems

Toch et al. [342] provide a survey of user attitudes towards privacy and personalization as well as technologies that can help reduce privacy risks. They identify three trend categories of personalization: social-based personalization, behavioral-based personalization and location-based personalization. The authors identify three steps in a personalization process. According to the diagram, the further one moves along these steps, the less capability users have to control their information. These steps are visualized on the vertical axis in figure 2.3. This collection also categorizes existing methods for addressing privacy along two horizontal axes, namely privacy-by-policy and privacy-by-architecture. While the former focuses on adopting and putting into action the so-called “notice and choice” principles of fair and sound information practices [14, 56, 57, 66, 131, 199, 251, 286], the latter addresses the creation of systems that minimize the aggregation and consumption of identifiable and traceable personal data [188, 189, 196, 238, 364]. Privacy techniques that focus on the user model creation step allow user data to be hidden from central services [56, 57, 286] by leveraging technologies such as distributed collaborative filtering [51,251], or to be customized by the user using configurable user modeling [364]. Techniques applied in the data collection phase block the system from rendering fine-grained profiles of users by tracking their behavior across their domain. Solutions such as client-based personalization provide privacy-by-architecture by not allowing systems to access user information directly. The adaptation phase and its respective privacy solutions are subject to research.

Need for Privacy-Preserving Trust Recommender Systems

Taking measures for preserving privacy during trust calculation and computation has been of great importance. An absence of privacy protection within the context of systems dealing with trust and reputation can ease attacks by malicious insiders, as they might infest the existing trust establishments or alter the trust computation results. In the context of recommender systems, Lam et al. [211] give an overview of privacy and security problems with recommenders. These problems are twofold: the personal information collected by recommenders raises the risk of unwanted exposure, and malicious users can bias or sabotage the recommendations that are provided to other users. The latter notion is recognized as an attack on recommender systems, namely the Shilling attack [54,254]. Attacks on recommenders remain a significant security hole in these systems [81, 271, 302]. O’Donovan and Smyth elaborate on the robustness of trust in recommenders and state that various attack sizes cause a prediction shift for a “pushed” item [271]. This is based on an adversarial model in which malicious users might find a way to penetrate a recommender system using a maximum rating for the pushed item. In such a situation, where we are estimating our trust values, attack profiles will reinforce each other's ratings. This is called the Reinforcement problem [271]. The authors conclude that trust models can not be used to increase recommendation accuracy, but they can be used to increase the overall robustness of social systems. Zhang [392] focuses on the same problem and executes average Shilling Attacks of various sizes on a trust-aware recommender system. He demonstrates that a trust recommender exhibits more stability than a traditional kNN-based recommender. Thus, the research gaps identified with respect to the work presented can be summarized as follows:

• There is a need for proposing architectures for enabling and sustaining privacy of trust-aware recommender systems. There has been little emphasis on investigating the notion of privacy surrounding the disclosure of individual ratings and the protection of trust computation in recommender systems.

• There is a need for more empirical and experimental evaluation of stated balances [196] between the perceived usefulness of a system (performance and adoption) and measurable and feasible privacy utilities.

Our respective contributions to this part of work can be found in paper 12.

Trust and Topic Models in Hybrid Recommender Systems

Topic Models and Hybrid Recommender Systems

While both content and collaborative filtering recommenders are dominant techniques for building recommendation systems on the social web [332], it is important that one can use social and semantic information for generating informed recommendations [208]. Such systems are considered hybrid recommender systems [52, 53]. Burke defines the term hybrid recommender system as any recommender system that combines multiple recommendation techniques together to produce its output. A new breed of such approaches is leveraging latent topic models [82,228,382,384]. Latent topic models have a wide range of applications, from intelligence and knowledge extraction from text for opinion mining [35] or review recommendation [204], to sentiment analysis on micro-blogging sites like Twitter [31, 79, 266]. Topic models are generative probabilistic models which utilize vocabulary distillations to spot topics within text corpora. The most widely utilized topic modeling techniques include Probabilistic Latent Semantic Analysis (PLSA) [166] and Latent Dirichlet Allocation (LDA) [36].

Applications of topic models and hybrid recommenders are presented in the literature [239, 330]. McCallum et al. [239] propose the Author Topic model (AT), as three Bayesian hierarchical models to deal with roles within email datasets. The Author Recipient Topic model (ART) is a directed graphical one which models social role as an explicit graphical model through a latent random variable. The argument on non-topic models being able to handle graph structure has led to an increasing level of work on embedding graph and network structure analysis into the very fabric of LDA models [73, 362, 391]. Wang et al. [362] propose a probabilistic framework for joint analysis of text and links between nodes (e.g., people) in a time-evolving social network. They show how their model is resilient against noisy links on an academic (co)authorship network.


Correlating Trust and Topic Models in Hybrid Recommendation Systems

As trust has been the sole focus of the artificial intelligence domain, multi- or even interdisciplinary models of trust are very recent [67]. Models of opinionated trust have been put forth in two different techniques. More recently, natural language processing techniques have been leveraged to summarize, integrate or recommend opinion summaries in the form of trustworthy topic sets. Golbeck and Hendler first set forth the concept of topical trust on the web [146], for applications in trust network building and inference [138] and social recommender systems [388]. Topical trust [146,195] originally sets forth the idea of using topic labels as edge labels on a social network, exemplifying the context or nature of a trustworthy relation. With the increasing popularity of tag-based systems on the web, tag models of trust have recently been proposed in the context of multimedia. Such trust metrics are mainly proposed for filtering noisy and unwanted content (here tags). This has led researchers to differentiate between content models and user models of trust [172]. For social scientists to be able to leverage topical models for network mining, new models and new metrics need to be proposed [224]. Cha and Cho [73] extend probabilistic topic models to analyze the relationship graph of popular social-network data, so they can group the edges and nodes in the graph based on their topical similarity. To do so, they first apply the Latent Dirichlet Allocation (LDA) model and its existing variants to the graph-labeling task. Several variants of LDA are proposed and tested along with their hypothesis.

Need for Studying Trust in the Context of Modern Hybrid Recommendation Models

The existing work focuses on two types of systems: First, content trust models leveraging trust graphs for improved accuracy of the recommendations generated [224,366]. Second, frameworks proposing trust metrics that can either propagate, aggregate [178] or rank [375] people and their respective resources [67,187] for improved recommendations. Weng et al. [375] propose a heuristic to measure the influence of individual Twitter users, taking both the topical similarity and the link structure into account. They utilize an LDA algorithm to distill and acquire topic sets from Twitter users. This is followed by constructing links between Twitter users. They show that through existing homophily in Twitter, a notion of reciprocity can be observed. Caverlee et al. [67] have proposed SocialTrust++, within which they develop and analyze algorithms for leveraging a community-based notion of trust. While they place much emphasis on a community model of trust, in order to model and mine implicit communities they emphasize the usefulness of probabilistic topic modeling techniques. They also report that by leveraging LDA-based retrieval, a community-oriented ranking model results in a significant improvement over other alternatives [187]. We summarize the gaps identified in the existing research related to this part of the work:


• There is a need for increased attention to topical/tag-based models of trust in the context of heterogeneous and mixed-mode content for web computing. Limited attention has been paid to exploiting graph and link structures among resources and people, especially in the context of on-line social media.

• There should be a possibility of leveraging collective feature attributes for user/resource interlinking and trustworthiness evaluation; While topical models can measure feature notions such as saliency, relevancy and polarity, there is also room for exploiting such notions to model distrust and mistrust links between users and their resources on-line.

For our respective contributions to this part of the work please refer to papers 13 and 14.

Chapter 3

Detailed Contributions

Modeling and Analyzing Ontology-Based Trust Networks

While social networks have become the most dominant forums on the web, attracting a large number of users with diverse backgrounds, trust networks formed within and across these networks create an extraordinary test-bed to study relation-dependent notions such as trust, reputation and belief. Successful adoption of trust networking across the e-commerce web and web applications such as recommender systems has helped the web crowd to realize the importance of networking.

Summary of Contributions

In order to successfully capture, model and present these networks, web applications and users need to understand and agree upon the meaning of trust. We therefore present semantics of trust in a fashion that captures the meaning of relationships among agents on a social network, which is the first aim and contribution of this work. To model the semantics of relationships forming the backbone of trust networks, the main components of relationships are represented and described using web ontologies [98]. The resulting models of such ontologies are thus referred to as ontology-based trust models and their generated instances are called ontology-based trust networks.

While most of the attention of researchers has been paid to modeling the semantics of trust and how to leverage these models for their respective applications, less attention has been paid to analyzing and studying the structure of these networks. With the increasing importance of computational social science [193, 215], more trust scientists are emphasizing the importance of analyzing and studying links and ties on a trust network, thus emphasizing the importance of trust network analysis [67,179,395]. Since ontology-based trust models and their resulting networks follow their own syntax, structure and semantics, a framework is needed to benchmark the ontological trust models. Thus, the secondary contribution of this work proposes a framework for benchmarking ontological trust models, both through quantitative and qualitative metrics. The qualitative aspect of such a framework relies on studying the lexicon and logic of the underlying vocabulary presented by the trust vocabulary, while the quantitative framework aims at benchmarking the structure of the network instances generated through the corresponding ontologies. We can summarize respective contributions by papers [100,101] corresponding to content of paper 5, as follows:

C1.1 Proposing a generic trust vocabulary for modeling interactions and co-operations of agents, applications, organizations and people on the social web and a functional ontology for documenting these interactions and modeling resulting trust networks.

C1.2 Introducing a benchmarking framework for qualitative and quantitativeanalytics of ontological trust models and their generative trust networks.

Contributions Statement

As the main contributor of papers [100,101] I have proposed a generic trust vocabulary, and introduced a benchmarking framework for qualitative and quantitative analytics of web ontological trust models.

Modeling, Discovering and Learning Trust-Aware User Profiles for Knowledge Platforms

User profiling remains the most dominant and pivotal methodology in web applications to collect and present personal usage data. While prying any sort of data from various sources and databases requires explicit agreement from the users' side, analyzing and processing their data is even more invasive to privacy. Increasing attention is paid to capturing trust while maintaining a decent privacy guarantee on the exposure and consumption of this data. This creates the possibility of proposing the concept of combining trust and privacy factors at the profile level with the usage data gathered.


We have introduced a user profile formalization capable of encapsulating structured knowledge of the collective behavior of a user across the system, including important notions of trust and privacy with respect to the context that the user is being profiled within [102]. By introducing this idea within the context of knowledge-intensive systems, methods are required to evaluate this approach. Our profiles are made up of semantic user profiles and weighted values for trust and privacy. To be able to analyze these profiles, fuzzy reasoning is adopted, where a unified fuzzy reasoning can learn and explain the raw values that can later be used for recommendation and adaptation. The developed solution was specialized to the context of the Smartmuseum project, but we believe that our solution can be extended and reused in similar domains and contexts. We summarize the contributions by papers [102,104] shaping content of papers 6 and 7, as follows:

C2.1 Formalization of trust-aware user profiles capable of storing knowledge of a system about the collective behavior of a user and incorporating trust and privacy with respect to the context that the user is being profiled within. This allows for effective expression and insertion of trust, privacy and ranking statements within the profiled items.

C2.2 Proposing a greedy heuristic for mining and normalizing weighted uncertain trust weights of semantic user profiles, where a unified fuzzy reasoning can mine, interpret and map the raw values into normalized values that can later be used for recommendation and adaptation. Using the proposed approach allows the Cold Start problem to be addressed to an extent where the system can be bootstrapped and function uniformly in the face of new users or sparse profiles that might hinder performance.


I am the main contributor of papers [102,104], contributing the novel formalization of trust-aware user profiles as well as the greedy heuristic for mining trust weights of user profiles, presented in 3.

Discovering and Aggregating Trust-Aware User Profiling

So far trust-aware user profiles [102,104] were modeled and described for centralized knowledge platforms, especially digital libraries. With the increasing importance of the social web, further models are needed. Such a proposal can allow us to present and capture generic characteristics of users in social contexts. Such a proposal also enables profile access from multiple knowledge platforms on the web. This is based on the concept of cross-system personalization, which represents the idea of modeling users and keeping them personalized across multiple knowledge platforms [243,262].


We emphasize the importance of using the concept of trust-aware user profile modeling over the web. To successfully leverage the currently proposed model [100], we need to align our model with a social user model [61,62]. To do so we have proposed an ontological model of the social user, composed of a generic user model component, which imports existing well-known user model structures and captures the basic concepts regarding the user, and a social model, which contains social dimensions. In such a social user model, trust, reputation and privacy are pivotal concepts gluing the whole ontological knowledge models together. Existing models of users on the social web fail to model important dimensions of social connectivity: privacy, trust and reputation.

The social web is possibly the largest repository for user created, tailored or maintained content [206]. Importing existing user data from the social web can be of benefit to both users and applications. On the application side, existing profiles can be populated with content that can hopefully address problems such as sparsity and Cold Start. As a matter of fact, we can benefit from an automated framework that can enable discovering [136], aggregating [11,253] and reusing user data from social web repositories for personalization services. We have proposed a framework [103] for harvesting and mining topic-based interest profiles from on-line social networks. This framework combines a web mining architecture and profile generation techniques, but we put more emphasis on the actual profile generation process. While the former part helps with harvesting the profiles from the network, the latter part learns groupings of profiles according to their shared interest topics via a combined clustering-through-classification scheme. For the clustering step we have used a kNN clustering approach, while for the classification tasks three classifiers (Bayesian, kNN and tree-based) are tested. To generate adaptive recommendation results with respect to the tasks of relevancy and accuracy, we use a probabilistic topic distribution that balances between both tasks. Thus, our contribution is proposing a semi-supervised profile importing architecture which can adaptively discover, acquire and learn topic-based user profiles to support the task of mining for personalization. We summarize the contributions shaping content of papers 8 and 9, as follows:

C3.1 Augmenting trust-aware user profile modeling for cross-domain personalization. We propose an ontological user model composed of a generic user model component, which imports existing well-known user model structures and captures the basic concepts regarding the user, and a social sub-model, which contains social dimensions of users. This model allows information about users existing in social data and meta-data to be imported into our proposed framework, and allows the overall model to be reused for social web applications in turn.

C3.2 Automated mechanisms for mining web content for predictive profile generation. This framework uses automated discovery techniques to gather and pre-process the data. Machine learning then allows the processed input to be rigorously analyzed in order to create effective predictive topic profiles that can be used for recommendation tasks.


As the main contributor of paper 8 and secondary contributor of paper 9, my contribution to the former has been the ontologies of trust and reputation, while my contributions to the latter have been the design and implementation of the framework described.


Ontologies and Management of Profiles in Trust Recommender Systems

Most of the existing frameworks for analyzing trust networks scrutinize the resulting qualities and quantities with respect to the trust metrics chosen by the system. Realizing how architectural components of a system, e.g. functional and non-functional ones, can be developed to take advantage of such user profiles becomes one of the aims of our work. While knowledge-based recommender systems, e.g. the Smartmuseum recommendation framework [309], mainly leverage digital library content and consumers, their domain of application becomes focal and thus limited to just digital libraries. Modeling ontology-based trust profiles for generic web architectures and services, including recommendation systems, becomes an attractive task.


The main contribution of this part is proposing an architecture for an ontology-based recommendation framework, allowing both content and profiles from the web to be imported, mined and used for generating recommendations of items and people of interest. The secondary contribution is the development of extended item and profile ontologies for recommender systems, which allow items and user profiles to be imported and modeled for the task of recommendation. An empirical evaluation demonstrates how the trust metric improves the trust network structure by generating connections to more trustworthy users.

In another contribution, we proposed a decentralized mechanism to augment recommender systems. By using a social network overlay and a gossip algorithm along with an augmented distance function, we cluster the users and shape a user’s neighborhood with its most similar neighbors, which allows us to find the most trustworthy users effectively. We showed that our decentralized approach achieves better accuracy than two popular centralized models while maintaining comparable item coverage. Also, the trust computation method in the context of the proposed decentralized approach performs better than using Pearson similarity and is comparable to the popular trust metrics [269]. We summarize the contributions with respect to papers 10 and 11, as follows:

C4.1 Proposing an architecture for an ontology-based recommendation framework: a generic recommendation framework that allows content and profiles from the web to be imported, mined and used for generating recommendations of items and people. Two components are in focus in this architecture, notably the semantic profile manager, which is capable of managing user models for both items and users. Developing ontologies for recommender systems: an extended item and profile domain ontology allows items and user profiles to be imported and modelled for any recommender system that takes advantage of ontological models in its architecture.


C4.2 Proposing metrics and automated management in trust-recommender systems: leveraging a social network overlay allows a trustworthy neighborhood to be found more effectively using epidemic heuristics for improved recommendation generation. The resulting gossip-based recommender system relies on epidemic network overlay algorithms to create and maintain a distributed network in which nodes can use local information to generate recommendations.
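The gossip-based neighborhood construction summarized in C4.2 can be illustrated with a minimal Python sketch. It is not the algorithm of papers 10 and 11: the function names (`pearson`, `gossip_round`), the view size `k`, the toy ratings, and the use of plain Pearson similarity in place of the augmented distance function and trust metric are all assumptions made for illustration.

```python
import math
import random

def pearson(a, b):
    """Pearson correlation over items rated by both users (0.0 if undefined)."""
    common = sorted(set(a) & set(b))
    if len(common) < 2:
        return 0.0
    xs = [a[i] for i in common]
    ys = [b[i] for i in common]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = math.sqrt(sum((x - mx) ** 2 for x in xs) * sum((y - my) ** 2 for y in ys))
    return num / den if den else 0.0

def gossip_round(views, ratings, k, rng):
    """One gossip round: every peer merges its view with the view of a randomly
    chosen neighbor and keeps only the k most similar peers (local information only)."""
    for peer in list(views):
        if not views[peer]:
            continue
        partner = rng.choice(sorted(views[peer]))
        candidates = (views[peer] | views[partner] | {partner}) - {peer}
        ranked = sorted(candidates, key=lambda p: (-pearson(ratings[peer], ratings[p]), p))
        views[peer] = set(ranked[:k])
    return views

# Hypothetical toy rating profiles (item -> rating) and initial random views.
ratings = {
    "ann": {"i1": 5, "i2": 4, "i3": 1},
    "bob": {"i1": 5, "i2": 5, "i3": 1},
    "cat": {"i1": 1, "i2": 2, "i3": 5},
    "dan": {"i1": 4, "i2": 5, "i3": 2},
}
views = {"ann": {"cat"}, "bob": {"ann"}, "cat": {"dan"}, "dan": {"bob"}}
rng = random.Random(0)
for _ in range(5):
    gossip_round(views, ratings, k=2, rng=rng)
```

Each peer only ever inspects its own view and that of one partner, which is the property that lets such an overlay be maintained without a central server.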


I am the secondary contributor to both papers 10 and 11. My contributions to paper 10 were the ontological framework for the recommendation system as well as social network analysis of the resulting trust networks. My interest in and contributions to paper 11 were analyzing automated management of profiles and networks with respect to variations of trust metrics. This covers a majority of the contributions presented.

Modeling and Evaluating Privacy in Trust-Based Recommendation Systems


Within this work we extend the architectural landscape of traditional collaborative filtering techniques and trust-aware recommenders to include the building blocks required for realizing a privacy-preserving trust-aware recommender system. As an example of such an architecture, we implement a framework for applying data perturbation techniques to user rating profiles. Thus, we introduce a private trust computation process and, accordingly, propose methods for producing private recommendations based on trust-based collaborative filtering recommender systems. We ground this framework on top of a trust recommender [388] and show how the overall trust computation can be augmented to accommodate private trust estimation and prediction generation. We design this framework with the protection and preservation of users' privacy in mind, while still providing accurate recommendations on masked data using trust-enabled collaborative filtering schemes. We conceptualize the balance between accuracy and privacy as a Pareto notion. We show that privacy and trust mechanisms, each with their respective configurations, jointly form the configurations of the overall framework. From the perspective of Pareto optimality, at least one joint setting of both configurations exists that, when utilized, maintains the privacy of user data while keeping accuracy at a decent level. We summarize the contributions of paper [99], which shapes the content of paper 12, as follows:

C5.1 Introduction of a framework for enabling a privacy-preserving trust recommendation system. This allows us to take measures for preserving privacy during trust calculation and computation.

C5.2 Analyzing the balance between accuracy and privacy in the privacy-by-architecture design of a trust recommender system. We have shown that privacy and trust mechanisms, each with their respective configurations, jointly form the configurations of the overall framework.
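The two ideas behind C5.1 and C5.2, masking rating profiles and choosing among joint settings by Pareto optimality, can be sketched as follows. This is a hedged illustration rather than the framework of paper [99]: uniform additive noise stands in for whichever perturbation technique is actually used, the names `perturb` and `pareto_front` are hypothetical, and the privacy and accuracy scores are made-up numbers, not experimental results.

```python
import random

def perturb(ratings, noise_scale, rng, lo=1.0, hi=5.0):
    """Mask each rating with zero-mean uniform noise (an assumed noise model),
    clipped to the rating scale [lo, hi]."""
    return {
        item: min(hi, max(lo, value + rng.uniform(-noise_scale, noise_scale)))
        for item, value in ratings.items()
    }

def pareto_front(configs):
    """Keep the configurations that no other configuration dominates,
    where higher is better for both privacy and accuracy."""
    front = []
    for c in configs:
        dominated = any(
            o["privacy"] >= c["privacy"] and o["accuracy"] >= c["accuracy"]
            and (o["privacy"] > c["privacy"] or o["accuracy"] > c["accuracy"])
            for o in configs
        )
        if not dominated:
            front.append(c)
    return front

# Hypothetical scores for joint privacy/trust settings (illustrative only).
configs = [
    {"name": "no-noise", "privacy": 0.0, "accuracy": 0.90},
    {"name": "low-noise", "privacy": 0.3, "accuracy": 0.88},
    {"name": "mid-noise", "privacy": 0.6, "accuracy": 0.80},
    {"name": "bad-mix", "privacy": 0.3, "accuracy": 0.70},
    {"name": "high-noise", "privacy": 0.9, "accuracy": 0.55},
]
front_names = [c["name"] for c in pareto_front(configs)]
masked = perturb({"i1": 5.0, "i2": 1.0}, noise_scale=1.0, rng=random.Random(1))
```

In this toy data the setting `bad-mix` is dropped because `low-noise` offers the same privacy with higher accuracy; every remaining setting is a defensible trade-off.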


I am the secondary contributor to paper [99]. My contributions to paper [99] have been a proposal of a privacy-by-architecture design of the trust recommender system as well as a Pareto optimization proposal for establishing a balance between privacy and trust. Thus, the contributions presented in list 3 summarize my respective contributions.

Modeling and Measuring Trust in Topical Recommender Systems

An increasing number of works focus on analyzing natural language content from social services for the benefit of users and services. While computational techniques are being proposed for analyzing spoken text on social networks, opinion mining techniques are increasingly attractive for analyzing networks of users who inform other like-minded users across social media. Topic modeling mechanisms [36] are increasingly attractive due to their success in mining diverse opinions. Thus, an increasing number of researchers are proposing their adoption within the social web domain. Due to their probabilistic nature, it is possible to build social networks out of the resulting mixture of topics and their associated distributions. While these networks have been limited to associating terms and authors, or communities and tags, modeling trust networks has not received significant attention. Moreover, building on the probabilistic learning approach, we propose a divergence metric as a distance measure between nodes on the network, which is novel. Since a topic model can capture diverse relations among users, it allows aspects like saliency, relevancy and even polarity to be measured amongst networked opinions.


To improve existing review recommendation techniques and at the same time improve the ranking used for evaluating the helpfulness of existing reviews, we propose a novel approach to model and rank reviews. The two main components of our system rely on Latent Dirichlet Allocation (LDA) to model the reviews and on Kullback-Leibler divergence to generate an adequate ranking. We make use of the star rating assigned to the product as an indicator of the polarity expressed in the review towards the latent topics. Our framework covers different ranking strategies based on users' needs, so that it can adapt to various user scenarios. We evaluated the system using manually annotated review data gathered from a popular review site [112].

Following the experiment with review mining and recommendations, we proposed applying opinion mining techniques to analyze networks of users discovering and connecting to other similar users across social media. In this work we propose a topic modeling framework within which a trend corpus can be mined and, by using a Latent Dirichlet Allocation (LDA) technique, both collective and individual models can be defined. The resulting models are eventually used to generate social networks which reflect the divergences of collective and individual opinions. We tested this hypothesis using a Twitter dataset. We summarize the contributions of papers [105,204], which shape the content of papers 13 and 14, as follows:

C6.1 Proposing a topic-based framework for review mining and summarization: in this framework we focused on proposing algorithms to model reviews using latent topics and star ratings, and to rank reviews so as to summarize all reviews for a product within the top-k results. In addition, a focus was also given to proposing methods and metrics for annotation of common features and aspects of review texts.

C6.2 Proposing a topic-based framework for social network mining and analysis of micro-bloggers, within which a trend corpus can be mined and, by using a probabilistic latent topic technique, both collective and individual models can be defined. The resulting models are eventually used to generate social networks which reflect the divergences of collective and individual opinions. Modeling a metric for measuring trust in a latent topical network: this metric allows group and individual links on a social graph to be leveled according to the divergence levels of the corresponding distributions between the opinions of individuals. Distances between the collective opinions of groups, or of individuals, on a trending ground can be modeled through information divergence.
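The role of information divergence in C6.1 and C6.2 can be made concrete with a small Python sketch. It assumes that each user (or review) has already been reduced to a topic mixture, for instance by LDA; the mixtures below are invented, and the symmetrized form, the threshold, and the function names are illustrative assumptions rather than the metric defined in papers [105,204].

```python
import math

def kl_divergence(p, q, eps=1e-12):
    """Kullback-Leibler divergence D(p || q) in nats for two discrete
    distributions over the same topics; q is floored at eps to avoid log(0)."""
    return sum(pi * math.log(pi / max(qi, eps)) for pi, qi in zip(p, q) if pi > 0)

def symmetric_kl(p, q):
    """Average of both directions, so the distance does not depend on order."""
    return 0.5 * (kl_divergence(p, q) + kl_divergence(q, p))

def divergence_edges(topic_mixtures, threshold):
    """Link every pair of users whose topic mixtures diverge by less than threshold."""
    users = sorted(topic_mixtures)
    edges = []
    for i, u in enumerate(users):
        for v in users[i + 1:]:
            d = symmetric_kl(topic_mixtures[u], topic_mixtures[v])
            if d < threshold:
                edges.append((u, v, d))
    return edges

# Hypothetical per-user topic mixtures over three latent topics.
topic_mixtures = {
    "u1": [0.7, 0.2, 0.1],
    "u2": [0.6, 0.3, 0.1],
    "u3": [0.1, 0.1, 0.8],
}
edges = divergence_edges(topic_mixtures, threshold=0.1)
```

With these numbers only `u1` and `u2` are linked, since their opinions are spread over the topics in nearly the same way, while `u3` concentrates on a different topic.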



I am a secondary contributor to paper [204] and a main contributor to paper [105]. My contributions to paper [204] have been the proposal of an annotation mechanism for automated feature detection and extraction from reviews, as well as understanding the usage of latent topic models for web content mining. My contributions to paper [105] have been proposing the architecture of the framework and the trust metric, and analyzing the resulting generated trust networks. Thus, the summary of my respective contributions is presented in list 3.

Chapter 4

Discussions and Conclusions

Discussions

Here we summarize the contributions of the work and discuss them with respect to our research questions. The list of contributions is as follows:

• C1: Modeling and Analyzing Ontology-Based Trust Networks

We proposed and developed reasoning frameworks capable of modeling and, most importantly, capturing trust networks of interactions and cooperations of agents, applications, organizations and people on the social web.

• C2: Modeling and Learning Trust-Aware User Profiles

We proposed and developed profile modeling and learning techniques capable of encapsulating a structured knowledge representation of a system with respect to the collective behavior of a user across the system, user attributes encompassing the individual and collective knowledge of the system about the user, as well as capturing and storing notions of trust and privacy.

• C3: Discovering and Aggregating Trust-Aware User Profiling

We propose a semi-supervised profile management architecture which can adaptively discover, acquire and mine topic-based user profiles. This architecture encompasses both user modeling and profile mining, within which a generic user model is proposed for modeling social web users for the task of cross-system personalization, as well as techniques for clustering and classification of social web profiles for recommendation tasks.

• C4: Architectures and Analytics of Decentralized Trust-Based Recommender Systems

We have developed an architecture for an ontology-based recommendation framework, a generic recommendation framework that allows content and profiles from the web to be imported, mined and used for generating recommendations of items and people of interest. We also proposed metrics and management for decentralized trust-recommender systems; leveraging a social network overlay allows a trustworthy neighborhood to be found more effectively using epidemic heuristics for improved recommendation generation.

• C5: Modeling and Evaluating Privacy in Trust-Based Recommendation Systems

We develop a privacy-by-architecture framework for enabling a privacy-preserving trust recommendation system. This allows for taking measures for preserving privacy during trust calculation and computation.

• C6: Modeling and Measuring Trust in Hybrid Recommender Systems

We propose and develop a topic-based framework for review mining and summarization; in this framework we have focused on proposing algorithms to model reviews using latent topics and star ratings, and to rank reviews. In another work we developed a topic-based framework for social network mining and analysis of micro-bloggers, within which a trend corpus can be mined and, by using a probabilistic latent topic technique, both collective and individual models can be defined.

Following the summarized list of all contributions in this section, we analyze how the respective contributions 3 of this work map onto the proposed research questions 1. Table 4.1 visualizes the mapping between the proposed questions and the respective contributions.

As we can see, the contributions give answers to all questions stated. Two contributions (C2 and C4) provide answers to at least two questions. This can be explained by the sensitivity of the aim as well as by the amount of contributions. To justify the former, the tasks of modeling, learning and populating the profiles are more time-consuming, thus the resulting contributions become larger in content and in number of publications. In the case of proposing recommenders, this can be explained by the fact that recommender systems are full-fledged software components, and with the availability of sufficient and sound data, experiments can be tailored, which in turn can result in a variety of publications and results. To justify the latter, contribution C2, for instance, has been a large aim of the Smartmuseum project which spanned beyond the boundary of the project in terms of resources and research, thus resulting in several contributions that could give answers to several questions that this research was striving to answer.

Table 4.1: Correlating research questions and contributions

| Approach | Questions | Contributions |
|---|---|---|
| C1. Modeling and Analyzing Ontology-Based Trust Networks | Q1. What language and methods shall be used to model notions of trust for various tasks of web computing? | C1.1. Generic trust vocabulary and ontology, C1.2. Benchmarking framework for analyzing of trust models and their trust networks. |
| C2. Modeling, Discovering and Learning Trust-Aware User Profiles | Q1. With increasing importance of trust computing, which languages and methods shall be used to model notions of trust in user profiles?, Q2. How can we manage trust enabled user profiles for web computing? | C2.1. Ontologies of trust-aware user profiles in Knowledge Applications, C2.2. Greedy heuristic for mining and normalizing weighted uncertain trust weights of user profiles |
| C3. Discovering and Aggregating Trust-Aware User Profiling | Q3. What are effective techniques to discover, aggregate and mine trust-based profiles?, Q3. How can we maximize the impact of trust-based user profiles in the context of information access, retrieval and personalization on the web? | C3.1. Ontologies for trust-aware user profiles for cross-domain personalization, C3.2. Semi-supervised profile aggregation and mining architecture |
| C4. Architectures and Analytics of Decentralized Trust-Based Recommender Systems | Q3. What are effective techniques to discover, aggregate and mine trust-based profiles?, Q3. How can we maximize the impact of trust-based user profiles in the context of information access, retrieval and personalization on the web?, Q4. How can we correlate notions of trust and privacy in an effective manner and exploit this correlation to benefit the applications and systems implementing these crucial concepts? | C4.1. Architecture for ontology-based trust recommendation framework, C4.2. Decentralized trust-recommender systems architectures |
| C5. Modeling and Evaluating Privacy in Trust-Based Recommendation Systems | Q4. How can we correlate notions of trust and privacy in an effective manner and exploit this correlation to benefit the applications and systems implementing these crucial concepts? | C5.1. Privacy-by-architecture framework for privacy-preserving trust recommendation system, C5.2. Analyzing trade-off between accuracy and privacy in privacy-by-architecture design of a trust recommender system. |
| C6. Modeling and Measuring Trust in Hybrid Recommender Systems | Q3. How can we maximize the impact of trust-based user profiles in the context of information access, retrieval and personalization on the web?, Q5. How can modern web applications be designed to incorporate trust metrics and trust-embedded user profiles in their very fabric? | C6.1. Topic-based framework for review mining and summarization, recommendation, C6.2. Topic-based framework for social network mining and analysis |

Leaving aside the frequency of the mapping between questions and respective contributions, to understand where this research has placed less focus we need to observe the questions that appear less frequently. As can be observed, this dissertation has made several contributions to questions Q3 and Q4 1, which respectively aim to answer:

Which web applications can benefit from trust-based user profiles? How can we maximize the impact of this technology in the context of information access and retrieval, e.g. recommendation and personalization systems, on the web? And while personalization systems evolve and expand onto various components of web information systems, how can modern web retrieval and personalization systems be designed to incorporate trustworthiness and trustworthiness metrics in their very fabric?

Meanwhile, questions Q2 and Q5 1, which seek to answer the following, seem to have received less attention:

How can modern web retrieval and personalization systems be designed to incorporate trustworthiness and trustworthiness metrics in their very fabric? How does one also manage such profiles for web computing? What are effective techniques to discover, aggregate and mine such profiles?

Finally, having studied the current importance and impact of some of the proposed contributions, we believe that the following questions demand more attention:

Is there a coherent niche between surrounding notions of trust such as context, reputation, interest and privacy? How can we correlate these notions in an effective manner to benefit the applications and systems leveraging such notions?

Conclusions

Over the course of this dissertation you were provided with a collection of manuscripts summarizing the research behind establishing the notion of trust-based user profiling.

Within the course of this dissertation we have introduced and elaborated on the notion of trust-based user profiling: embedding trustworthiness metrics and mechanisms in web profiles allows information systems to consume and understand such statements and preferences in order to improve interaction and communication among individuals and the system, which in turn boosts system performance at various stages. To approach the formalization of profiles, we started by evaluating existing semantics and vocabularies for modeling trust on the web, which in turn allowed us to present and reason upon the generated trust networks. While formalizing such profiles on the one hand, another challenge is realizing important and closely related notions such as the privacy preferences of users. Thus, such profiles are designed to incorporate the preferences of users, allowing target systems to understand the privacy concerns of users during their interaction as well. Since the aim of profiling is understanding, analyzing and improving user interactive systems, effort was invested across multiple works and projects to incorporate trust profiles within information retrieval, access and personalization systems. A majority of the contributions of this work had an impact on profiling and recommendation systems in digital libraries, i.e. the EU FP7 Smartmuseum project. Highlighted contributions start from the modeling of adaptive user profiles incorporating users' taste, trust and privacy preferences. This in turn led to the proposal of several ontologies for the user and item domains, which were leveraged to improve indexing and retrieval of items and profiles across the platform. In order to address the important obstacles of sparsity and uncertainty of profiles, which hinder any profile processing system, frameworks for data mining and machine learning of profile contents from social networks were proposed. Results of mining and populating data from the social web together with profiles were shown to increase the accuracy of the intelligent suggestions made by the system to improve navigation of users in on-line and off-line museum interfaces.

With the ever increasing amount and variety of data on the web, mechanisms and techniques are needed to mine and utilize such novel content. This in turn motivated us to take the notion of trust-based profiles beyond the boundaries of digital libraries onto the social web domain by augmenting the discovery and recommendation mechanisms of popular social recommender systems, e.g. collaborative filtering. This has led us to propose several trust-based recommendation techniques and frameworks capable of mining implicit and explicit trust across ratings networks taken from the social and opinion web. We researched ontological issues and the management of profiles. The resulting recommendation techniques have been shown to increase the accuracy of profiles by incorporating knowledge of items and users and diffusing it along the trust network, while leveraging automated distributed management of profiles. We showed that the coverage of the system can be increased effectively, surpassing comparable state-of-the-art techniques. In both cases, trust has been shown to clearly elevate the accuracy of the suggestions predicted by the system. To assure the overall privacy of such value-laden systems, privacy was given direct focus: architectures and metrics were proposed, and it was shown that a balance between accuracy and perturbation techniques can maintain accurate output. Finally, focusing on hybrid models of web contents and recommendations brought us to study the impact of trust in the context of topic-driven recommendation in social and opinion media, which in turn helped us to show that leveraging content-driven and tie-strength networks can improve system accuracy for several computing tasks.

Future Directions

Taking into account the study of the mapping between the resulting contributions of this thesis and the questions we aimed at answering, we believe the following future contributions can be of interest:


• Proposing research and development of other information retrieval and personalization systems that can incorporate trustworthiness.

• Proposing effective management frameworks for trust profiles and, in general, how trust management and its antecedents can be applied to trust profiling notions.

• Proposing and analyzing empirical and experimental studies of the correlation between trust and closely related notions such as privacy and risk.

• Introducing more examples of research and development on applications that can leverage the synergy of trustworthy computing notions.

Part II

Included Papers


Chapter 5

Methods and Metrics for Modeling Ontology-Based Trust

(original version) N. Dokoohaki and M. Matskin, Structural Determination of Ontology-Driven Trust Networks in Semantic Social Institutions and Ecosystems, International Conference on Mobile Ubiquitous Computing, Systems, Services and Technologies (UBICOMM '07), IEEE Computer Society, pp. 263-268, Nov. 2007.

(extended version) N. Dokoohaki and M. Matskin, Effective Design of Trust Ontologies for Improvement in the Structure of Socio-Semantic Trust Networks, International Journal On Advances in Intelligent Systems, vol. 1, no. 1942 - 2679, pp. 23-42, 2008.


Effective Design of Trust Ontologies for Improvement in the Structure of Socio-Semantic Trust Networks

Nima Dokoohaki1, Mihhail Matskin2

1Software and Computer Systems (SCS), Information and Communications Technology (ICT), KTH - Royal Institute of Technology, Forum 120, 16440 Kista, [email protected]
2Software and Computer Systems (SCS), Information and Communications Technology (ICT), KTH - Royal Institute of Technology, Forum 120, 16440 Kista, [email protected]

Abstract

Social ecosystems are growing across the web, and social trust networks formed within these systems create an extraordinary test-bed to study relation-dependent notions such as trust, reputation and belief. In order to capture, model and represent the semantics of trust relationships forming the trust networks, the main components of relationships are represented and described using ontologies. This paper investigates how effective design of trust ontologies can improve the structure of trust networks created and implemented within semantic web-driven social institutions and systems. Based on the context of our research, we present a trust ontology that captures the semantics of the structure of trust networks based on the context of social institutions and ecosystems on the semantic web.


54 PAPER A

5.1 Introduction

The semantic web is described as a web of knowledge having properties such as heterogeneity, openness and ubiquity. In such an environment, where everyone has the ability to contribute, the trustworthiness of these people and their contributions is of great importance and value. As stressed, trust plays a crucial role in bringing the semantic web to its full potential.

A trust network can be seen as a structure capturing metadata on a web of individuals with annotations about their trustworthiness. Considering a social network as our context, a trust network can be seen as an overlay above the social network that carries trust annotations of the metadata based on the social network, such as user profiles and information. Social networks are gaining increasing popularity on the web, while the semantic web and its related technologies are trying to bring social networks to their next level. Social networks are using semantic web technologies to merge and integrate social networking user profiles and information. Such efforts are paving the path toward semantic web-driven social ecosystems. Merging and integrating social networking data and information can be of business value and use to web service consumers as well as to web service providers of social systems and networks. Ontologies, at the core of semantic web-driven technologies, lead the evolution of social systems on the web. Describing trust relations and their sub-components using ontologies creates a methodology and mechanism for efficiently designing and engineering trust networks.
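The idea of a trust network as an overlay above a social network can be pictured with a small Python data structure. This is only a sketch under stated assumptions: the class name `TrustOverlay`, the directed-edge representation, and the numeric trust range [0, 1] are hypothetical choices for illustration and are not part of the ontology proposed in this paper.

```python
from dataclasses import dataclass, field

@dataclass
class TrustOverlay:
    """Trust annotations layered over an existing (directed) social graph."""
    social_edges: set                          # (source, target) links of the social network
    trust: dict = field(default_factory=dict)  # (source, target) -> trust value in [0, 1]

    def annotate(self, source, target, value):
        """Attach a trust value to a link that already exists in the social network."""
        if (source, target) not in self.social_edges:
            raise KeyError("trust can only annotate an existing social link")
        if not 0.0 <= value <= 1.0:
            raise ValueError("trust value must lie in [0, 1]")
        self.trust[(source, target)] = value

    def trusted_by(self, source, minimum=0.5):
        """Targets that `source` trusts at least at the given level."""
        return sorted(t for (s, t), v in self.trust.items() if s == source and v >= minimum)

# Hypothetical users and links.
overlay = TrustOverlay(social_edges={("alice", "bob"), ("alice", "carol")})
overlay.annotate("alice", "bob", 0.9)
overlay.annotate("alice", "carol", 0.2)
```

Keeping the annotations in a separate mapping mirrors the overlay view: the social network itself is left unchanged, and trust is metadata about its links.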

"Structure of a given system is the way by which their components interconnect with no changes in their organization" according to [230]. Determining the structure of a society of agents on a trust network structure within a semantic social system can help us determine the organizational structure of a system. Having this capability, we can determine certain factors of an organization such as flexibility, change capacity, etc. In this paper we investigate how effective design of trust ontologies can improve the structure of trust networks created and implemented within semantic web-based social systems. To address the efficient design of trust networks on semantic web-driven social systems, we have engineered and analyzed a trust ontology [100]. Our trust ontology is based on the main concept of Relationship, which models the main element of trust networks, and the two concepts of Main Properties and AuxiliaryProperties, which model properties of relationships.

In order to effectively design an ontology for trust, we have introduced a framework for comparing and evaluating trust ontologies. As an experiment, several ontologies of trust have been evaluated according to our framework. To explain the process of engineering the ontology itself, all phases and steps taken while building our proposed trust ontology are described in detail. As an experiment, we have studied the structure of the trust network to describe how a trust ontology can serve as the framework for engineering efficient and scalable trust networks. The same experiment data have been used to create networks for other structurally similar works, to gain deeper knowledge of the network structure with respect to ontology design disciplines. The contents of this paper are organized as follows: following the background study and discussion of related research in section 5.2, the state of the art in trust ontologies is presented in section 5.3, our trust ontology is introduced in section 5.4, and in section 5.5 trust network analysis is presented and discussed. Finally, we conclude in section 5.7 and discuss future research in section 5.8.

5.2 Background

Within the context of social semantic systems, there has been an extensive amount of effort, based on both academic and practical approaches, to design and engineer trust networks, but none of the existing works in the field were designed bearing structural and design issues in mind. In this section we introduce the technologies that we have incorporated and considered in our approach.

We divide the foundation of our work into two main topics, namely semantic social networks and trust. In this section we also give a detailed and thorough overview of each field. Each overview is divided into subsections where each of the substantial topics is further studied and discussed.

Socio-Semantic Ecosystems Overview

In 1967, Stanley Milgram introduced the "Small World Hypothesis" [250], which was published by American Sociologist. Social networks became popular in the 1990s. A social network is generically defined as a set of people gathered together through connections or links, according to [108]. The web has become a ground for bringing the notion of a society of people to life. A web-driven social network needs to be accessible using a web browser, and within this network people should be able to explicitly (or implicitly) state their connections and their links to individuals or groups of individuals, according to [152]. Web-based social networks continue to evolve; what is most important today is that connections on these networks are no longer one-dimensional, and different aspects of relationships, such as trust, can be modeled and stated. In 2005, according to [152], there were 115,000,000 accounts within social networks scattered across about 18 online networking communities. It is important to consider that not every account corresponds to a distinct individual. Many people have multiple memberships across multiple networks at the same time. The size of social networks will continue to grow as people realize the "hidden" values of social networking day by day [217]. This growth in the size of web-grounded social networks will not stop and, as many have predicted [72, 217], the so-called "email scenario" will take place, where the number of advertisements and SPAM messages will increase so drastically that at some point in time these networks will literally collapse. There is a strong and growing demand for fusion of the data from different social networks on the web. Many are interested in sharing their profiles, while others are interested in merging their data from multiple networks.

Two main reasons can be stated and discussed here: First and foremost, a great amount of this data, which is scattered throughout all these sites, is not shareable and is inaccessible from other networks. Second, as stated, many users have different accounts across different networks, and if their data were merged, many of these accounts might become a single account.

In addition to individuals and users on the web, social networks have become the target of businesses and industries. There are many businesses and enterprises which sell packages of social networking capable software to their users. So the value of social networking now extends beyond the borders of individuals and businesses.

Vision of semantic web-driven social institutions

Social metadata fusion, in the form of sharing or integration, brings business value to entities living within such ecosystems. The vision of the "Semantic Social Network (SSN)" [108] describes the fusion and integration of social data across social networks located on a web of semantics. This vision rests on two important dimensions. First, semantic descriptions of social data about people, publicly available on the web and expressed in a formal metadata language such as XML or RDF, with explicitly described links to other people on the same or different networks. Second, semantic references to those descriptions, described and stored in a formal metadata language such as RDF or XML [108]. There have been several attempts to bring this vision to life. One of the most important and influential is the FOAF (friend-of-a-friend) project [44].

FOAF and SIOC: bringing the vision into life

The FOAF project creates an RDF vocabulary for describing people and the relationships between them. In this way it can be used as the "glue" between the semantic web and social ecosystems, according to [46]. As described, current Web communities are distributed all around the web with no links between them, according to [42]. In order to bring semantics to online communities, SIOC [29] (Semantically-Interlinked Online Communities) tries to create this "glue" through the SIOC ontology [29, 38]. SIOC aims to enable the integration of Web community information and makes it possible to describe and present the social web of data using RDF. We can think of FOAF as an enabler for describing a semantic web of individuals, while SIOC enables describing a semantic web of communities of individuals. SIOC utilizes the FOAF vocabulary for expressing personal profile and social networking information [42].

Modeling social networks on semantic web

Social Network Analysis (SNA) [319,367] is the science of studying and analyzing a networked setting, and it has been applied to networks in health, innovation and other domains. Network analysis provides the theoretical as well as the practical background for studying how participation in a network affects, for instance, an individual's or a group's behavior. Ontologies can be used to model and capture the structure and formal semantics of social networks. Wennerberg [377] describes how the structure of a network can be modeled using a semantic web ontology. Ontologies model, present and document the concepts and properties of a certain domain. With the social nature of networks as the domain of study, ontologies can capture the concepts of relationships and individuals together with their respective properties. Inference mechanisms give ontologies the ability to infer new information using rules, which can be of great importance in a social context.

Several existing efforts on modeling social networks on the semantic web can be mentioned here. Cantador et al. [58] model a social semantic network by using an ontology as the basis for clustering the user profiles in a social networking community. The ontology represents the domain of users' cognitive patterns, such as interests and preferences. The resulting ontological instances take the shape of a semantic network of interrelated domain concepts and user profiles. A similar effort [8] uses an ontology at the core of a semantic web-enabled application. This ontology generates a social network of users and their interests. The generated ontological networks are used to detect and filter Conflict of Interest (COI) relationships in an academic context comprising authors and reviewers of papers. In a similar effort in the same context, Mika [249] uses ontologies within a semantic web-driven application system, Flink [249], for modeling, capturing and visualizing the social network of researchers.

Trust overview

Being the key to any interaction in human societies, trust has been studied in many fields of research and science, such as sociology and psychology as well as, of course, computer science. Because of its importance and significance, trust has been taken up as a field of research in, for example, decentralized access control, public key certification, reputation systems for peer-to-peer networks, and mobile ad-hoc networks. Although a variety of definitions of trust exist, there is no agreement on a generic definition. Researchers have mostly defined trust according to the context and orientation of the paper they were writing or the experiments they were conducting. As a matter of fact, most of these definitions are specific to the context of the work being done.

The lack of consensus on a generic definition of trust highlights the importance of a definition that is context-neutral and general enough to be applied to different fields of research and different contexts. Trust is a complex issue, relating to the fairness, straightforwardness, honesty and sincerity of a person or of the service this person might offer. Grandison [151] defines trust in the following manner:

"Trust is the firm belief in the competence of an entity to act dependably, securely, and reliably within a specified context". "Distrust may be a useful concept to specify as a means of revoking previously agreed trust or for environments when entities are trusted, by default, and it is necessary to identify some entities which are not trusted", according to [151]. Distrust is defined as "the lack of firm belief in the competence of an entity to act dependably, securely and reliably within a specified context" [151,152].

Trust components, properties and sources

Trust is presented in the form of a relationship between two parties. These parties, often individuals or agents representing those individuals, are the trustor or source, defined as the entity that seeks trust (or trust-related operations such as evaluation) in another entity, and the trustee or sink, the entity that is trusted or whose trustworthiness is to be evaluated. Trust is seen as having a purpose or a context. For instance, Alice trusts Bob as a doctor, but she might not trust Bob as a car mechanic, adapted from [1,151]. In addition, a trust relation might have a trust metric, quantitative or qualitative, characterizing the degree to which the trustor trusts the trustee. This quality or quantity represents the intensity and level of trust, and it can be evaluated by an algorithm or mechanism that derives trust according to the metric. For instance, Alice might trust Bob as a doctor very much, while she only moderately trusts Martin as a doctor, adapted from [151] [1]. So far we have characterized trust as a computational value carried by a relationship, described inside a specific context, measured by a metric and evaluated by a mechanism.

Some important properties of trust are stated and discussed in [151] [1], for instance subjectivity (two people may judge the same entity's trustworthiness differently) and transitivity (if trust is transitive, then when Alice trusts Bob and Bob trusts Cherry, Alice will trust Cherry, adapted from [151] [1]). One of the most important subjects of discussion on the properties and components of trust is the distinction between trust in performance and trust in recommendation [151] [1]. First, there is a difference between trusting an entity to perform an action (trust in performance) and trusting an entity to recommend other entities to perform that action (trust in recommendation). This is the distinction between Cherry trusting Bob as a dentist and Cherry trusting Bob to recommend a good dentist, according to [151] [1]. Another difference is based on the existence of recommenders: trust that the trustor observes directly from the trustee differs from trust that is conveyed and inferred from the recommenders' trust. This is the difference between Cherry trusting Bob as a dentist as a result of her own direct observation and evaluation of Bob, and Cherry trusting Bob as a dentist because she trusts Shawn as a recommender of good doctors and Shawn trusts (and perhaps recommends) Bob as a good dentist, adapted from [151] [1].

In the observations made by [169], a set of sources of trust is identified, in both atomic (direct trust) and compound (social trust) forms. Trust is the experience gained from an interaction between two individuals, so the actual experience is the source of trust. When the source is the experience between two particular individuals, this type of trust can be referred to as inter-individual trust, or what is commonly called direct trust, according to [169]. Now consider a set of individuals across a web or network. If we represent trust within this society of nodes, we are dealing with a new type of trust, originating from the experiences gathered by a group of nodes or individuals. Its source is trust propagation in social settings or networks. This type is called relational trust, social-network-driven trust or, simply, social trust, according to [169].
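The components named above (trustor, trustee, context, metric, and the performance/recommendation distinction) can be pictured as a small record type. The following Python sketch is illustrative only: the class name `TrustStatement`, the `kind` field, the `lookup` helper and the 0-to-1 scale are assumptions made for this example and are not taken from [151] or [1].

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrustStatement:
    trustor: str   # source: the entity that places trust
    trustee: str   # sink: the entity being trusted
    context: str   # purpose of the trust, e.g. "doctor"
    value: float   # trust metric; a 0..1 scale is assumed here
    kind: str = "performance"  # or "recommendation"


def lookup(statements, trustor, trustee, context, kind="performance"):
    """Return the stated trust value, or None when nothing was stated."""
    for s in statements:
        if (s.trustor, s.trustee, s.context, s.kind) == (trustor, trustee, context, kind):
            return s.value
    return None


# Hypothetical values mirroring the Alice/Bob/Martin/Shawn examples in the text.
statements = [
    TrustStatement("Alice", "Bob", "doctor", 0.9),
    TrustStatement("Alice", "Martin", "doctor", 0.5),
    TrustStatement("Cherry", "Shawn", "doctor", 0.8, kind="recommendation"),
]
```

Because the context is part of the key, asking for Alice's trust in Bob as a car mechanic returns `None` rather than reusing her trust in him as a doctor, which is the context sensitivity described above.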

Trust computation in Web of Trust

Most of the models proposed for modeling trust on the semantic web focus on probabilistic views of trust. They model trust using probabilities assigned as labels to the edges of a network according to a specified trust metric. To derive and infer trust, the edges are traversed, the probabilistic trust values along them are gathered and, using the adopted mechanism, the trust value along the trust path is computed. This setting is referred to as a Web of Trust. There are two reasons why the web of trust is a candidate for adoption in trust computation scenarios on the semantic web. First, both systems are open. Second, trust is considered transitive in both settings. The Web of Trust was introduced in the context of security and privacy systems, for instance PGP [404]. In this setting everyone can sign each other's keys and act as certificate holder or certificate authority. Openness creates the need for metrics, and the need for metrics in turn establishes the relativity and computability of trust. The need for scalable trust metrics has been discussed and studied extensively [180] [18]. When metrics are applied, all links can carry them and trust can be inferred [147]. Under the assumption of trust transitivity, and by enforcing metrics, pathways of trust can be formed and the web of trust can be crawled and walked [63]. As stated, the semantic web is a similar scenario, in which each agent forms a node on a network and is connected to other nodes (agents), and these links form a web of trust. To allow everyone, represented by an agent, to evaluate the statements of others in this open and heterogeneous environment, mechanisms and algorithms are developed or adopted that infer and evaluate trust in others using the trust-metric-labeled links of the trust network.
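As a rough illustration of inferring trust along a path, the sketch below labels each edge with a value between 0 and 1 and combines the values along a path by multiplication, keeping the best path. Multiplication and best-path selection are assumptions chosen for this example; they are one simple mechanism among many and not the specific mechanism of any system cited here. The function name `infer_trust` and the sample network are hypothetical.

```python
def infer_trust(web, source, sink, visited=None):
    """Best product of edge values over all simple paths from source to sink.

    `web` maps each node to a dict of {neighbour: trust value in 0..1}.
    Returns 0.0 when no trust path exists.
    """
    if source == sink:
        return 1.0
    visited = (visited or set()) | {source}
    best = 0.0
    for neighbour, value in web.get(source, {}).items():
        if neighbour in visited:
            continue  # avoid walking in circles
        best = max(best, value * infer_trust(web, neighbour, sink, visited))
    return best


# Hypothetical web of trust: Alice has no direct edge to Cherry.
web = {
    "Alice": {"Bob": 0.9, "Dave": 0.5},
    "Bob": {"Cherry": 0.8},
    "Dave": {"Cherry": 0.9},
}

alice_trusts_cherry = infer_trust(web, "Alice", "Cherry")
```

Here the path through Bob (0.9 times 0.8) outweighs the path through Dave (0.5 times 0.9), so the inferred value comes from the Alice, Bob, Cherry pathway. The exhaustive search is exponential in the worst case and is only meant to make the idea of walking trust pathways concrete.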

Trust networks

Work in this field focuses mostly on the mathematical notion and presentation of networks, while the amount of practical work is limited. Most of it does not consider the design of larger infrastructures and ecosystems. Trust networks are described as weighted graph structures with directed edges. The edges in the generated graphs represent connections and relationships between individuals. Watts introduces the properties of a small world network [369]. He describes a model called the β-model [369] for modeling, constructing and generating the structure of social systems. Many social systems have used this model within their infrastructure [90] [123] [261] [369] [368] [147]. Golbeck has carried out extensive research on trust networks on the semantic web [147] [146] [139] [140]. She has constructed an ontology of trust, combining RDF and the FOAF vocabulary to describe the relationships that make up trust networks, and has created applications on the resulting networks of trust. These applications range from email filtering, TrustMail [147] [146], to web-based recommendation systems, FilmTrust [140]. Brondsema and Schamp [46] have created a system called Konfidi [202] that combines a trust network with the PGP Web-of-Trust (WOT). The system implements a metric and a mechanism for inferring trust on the networks formed. The generated network creates trust pathways between email sender and receiver that can be crawled, and trust values are inferred using the trust mechanism and metric [46].

5.3 Evaluating Trust Ontologies

This section gives an overview of some of the most important and influential works in modeling and designing trust ontologies. After a state-of-the-art overview of the observed ontologies, a framework for comparing and evaluating trust ontologies is introduced and the studied approaches are compared accordingly.

State-of-art in trust ontologies

As introduced earlier, Friend-Of-A-Friend (FOAF) [44] provides a vocabulary and introduces an ontology for describing a web of connected individuals. This ontology can serve as a tool to model, and eventually create, a network of users by describing personal information about each person (realizing the node itself) and personal information about the set of users whom that person knows (realizing the neighbors on the network). Nodes on such a network are identified by their email address, which serves as their unique identifier.
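To make the node-and-neighbors description concrete, the following sketch emits a small FOAF profile in Turtle syntax using plain string formatting. The terms `foaf:Person`, `foaf:name`, `foaf:mbox` and `foaf:knows` belong to the FOAF vocabulary; the helper name `foaf_person`, the blank-node layout and the example.org addresses are assumptions made for this illustration.

```python
FOAF = "http://xmlns.com/foaf/0.1/"


def foaf_person(name, mbox, knows=()):
    """Return a Turtle snippet for one person and the people they know."""
    parts = [
        "a foaf:Person",
        f'foaf:name "{name}"',
        f"foaf:mbox <mailto:{mbox}>",
    ]
    # Each neighbor is described only by the mailbox that identifies it.
    for friend_mbox in knows:
        parts.append(
            f"foaf:knows [ a foaf:Person ; foaf:mbox <mailto:{friend_mbox}> ]"
        )
    body = " ;\n    ".join(parts)
    return f"@prefix foaf: <{FOAF}> .\n\n[] {body} .\n"


profile = foaf_person("Alice", "alice@example.org", knows=["bob@example.org"])
```

The person is written as a blank node, so the mailbox is the only thing that identifies her across documents, matching the role of the email address described above.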

Golbeck’s trust ontology

Jennifer Golbeck [147] introduces an ontology that extends FOAF through foaf:Person, giving users the possibility to state and represent their trust in individuals they know. The metric used to express trust is a value on the scalar range 0-9, in which each value represents a trust level. These levels are defined as properties with foaf:Person as their domain. They correspond to: Distrusts absolutely, Distrusts highly, Distrusts moderately, Distrusts slightly, Trusts neutrally, Trusts slightly, Trusts moderately, Trusts highly, Trusts absolutely, according to [147]. Context is introduced as a property of trust. Trust is context-sensitive, so its meaning and semantics can change depending on the context. This notion is represented in the ontology as general trust versus specific or topical trust, according to [147]. For instance, Alice might trust Bob greatly on driving cars but might distrust Bob totally on repairing cars, adapted from [147]. To depict general trust within Golbeck's trust ontology, trust ratings (in the form of trustsHighly or trustsModerately) are described as properties that relate one instance of the person class to another. To describe specific and topical trust, another set of properties is introduced. These properties correspond to the nine levels above but represent trust regarding a specific topic (for instance "distrustsAbsolutelyRe", "trustsModeratelyRe", etc.), expressing the level of trust regarding a certain topic such as driving or dishwashing. The range of these properties is "trustsRegarding", which is defined to combine a person and a topic of trust. The "trustsRegarding" class has two properties: "trustsPerson", presenting the person being trusted (the trustee), and "trustsOnSubject", presenting the subject on which trust is stated, according to [147]. This makes it possible to query for trust in a person on a specific subject. It also makes it possible to infer trust on the resulting trust network by crawling along the edges that are connected through the given topic and eventually deriving the trust value.
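The general versus topical pattern described above can be sketched with plain subject-predicate-object triples. The property names `trustsPerson` and `trustsOnSubject` and the "Re" suffix come from the text; the camel-case spellings of the nine level names, the intermediate node naming and the helper functions are assumptions for illustration and may differ from the published ontology.

```python
# Level names derived from the nine levels listed in the text (assumed spelling).
LEVELS = [
    "distrustsAbsolutely", "distrustsHighly", "distrustsModerately",
    "distrustsSlightly", "trustsNeutrally", "trustsSlightly",
    "trustsModerately", "trustsHighly", "trustsAbsolutely",
]


def general_trust(source, level, target):
    """General trust: one property directly between two persons."""
    assert level in LEVELS
    return [(source, level, target)]


def topical_trust(source, level, target, subject):
    """Topical trust: a '...Re' property pointing at a node that
    combines the trusted person and the subject of trust."""
    assert level in LEVELS
    node = f"_:re_{source}_{target}_{subject}"  # hypothetical node id
    return [
        (source, level + "Re", node),
        (node, "trustsPerson", target),
        (node, "trustsOnSubject", subject),
    ]


def query(triples, source, target, subject):
    """Return the level at which source trusts target on subject, if stated."""
    persons = {s for s, p, o in triples if p == "trustsPerson" and o == target}
    subjects = {s for s, p, o in triples if p == "trustsOnSubject" and o == subject}
    nodes = persons & subjects
    for s, p, o in triples:
        if s == source and o in nodes and p.endswith("Re"):
            return p[:-2]  # strip the "Re" suffix
    return None


triples = (
    topical_trust("Alice", "trustsHighly", "Bob", "driving")
    + topical_trust("Alice", "distrustsAbsolutely", "Bob", "repairing")
)
```

Querying these triples for Alice, Bob and "driving" yields `trustsHighly`, while the same pair on "repairing" yields `distrustsAbsolutely`, mirroring the driving versus repairing example.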

Toivonen and Denker’s Message and Context Ontology

Toivonen and Denker [344] study trust in the context of communication and messaging. They state that many factors can have an immense impact on the honesty and trustworthiness of the messages we send and receive. The context-sensitivity of trust is recognized and taken into account in their work. The work focuses on the drastic changes that issues such as reputation, credibility, reliability, trustworthiness and honesty can cause, and on how they affect the process of establishing and grounding trust, according to [344]. As a result, a set of ontologies has been defined to capture context-sensitive messaging and trust. One ontology is developed to capture and denote the role of context-related properties and information. It captures the domain of message communication and exchange and describes how context information is attached to messages. This ontology is constructed mainly to show how trust is related to messages and communication. It is important to note that it extends the topical trust ontology of Golbeck [147], introduced earlier, and relates the notion of trust to the communication and messaging context. The basic idea behind this extension is that "The topic of a message can have impact on its trust level" [147]. This trust ontology can therefore be seen as an extension of the topical trust ontology that captures how trust can be attached to messages exchanged in a communication environment. This concept is modeled and presented using the trustsRegarding property. Links and connections between persons are modeled by the Trusts property. Sub-properties of these two relationships conform to the trust levels of Golbeck's ontology [147]. To model the relation of trust to context, the ctxTRUSTS property is used. In a simple communication setting we see the sender, the receiver and the communication network mediating between them. The messages exchanged between the parties always have contexts attached to them, which in turn allows ctxTRUSTS properties to be computed through the Trusts and trustsRegarding properties, according to [344] [167].

Proof Markup Language’s Trust Ontology

Inference Web [334] at Stanford University has built a semantic web-enabled knowledge platform and infrastructure. The platform is designed to help users on the network exploit semantic web technologies in order to give trust ratings to, and get them from, resources on the web. This process is referred to as justification of resources. To this end, a language called PML is used. PML [370] (Proof Markup Language) contains a term set for encoding justifications and is designed to work in a question answering fashion [241]. PML is designed to help software agents filter resources on the web of semantics by proof-checking them and justifying their credibility on behalf of the users. The PML ontology contains three sub-ontologies: a provenance ontology, a justification ontology and, most importantly, a trust ontology, which captures honesty and trustworthiness statements pertaining to resources. The trust ontology [370] is one of the most important components of the PML ontology, and we briefly describe its structure. The approach models the closely related notions of trust and belief and how they affect the credibility of resources on the web. Because their semantics are close, the two notions are presented side by side in this ontology. The ontology structure presents the trust and belief relations between a source and a sink (both realized as agents) with respect to information from the document source under investigation by the respective agents. The belief relation shows the belief of an agent about the source. A specific belief has a status (e.g. believes, disbelieves, ignorant). The trust relation shows an agent's overall beliefs about information from the specified source. The metric defined for trust and belief is probabilistic, and for both elements a value in the range 0 to 1 is used.


Konfidi’s trust ontology

With respect to the metrics used for presenting computational trust values and modeling the mathematical notion of trust, there are two approaches: trust metrics with discrete values and trust metrics with continuous values. Brondsema and Schamp [46] model and represent trust and distrust in a similar fashion using continuous values. A continuous range of values allows easier propagation of trust values along the edges of the network using inference mechanisms. They represent the relationship as the class and main concept of the ontology. Each relation is directed from source (truster) to sink (trustee). Properties of relations are wrapped under the concept of a trust item. The most important feature of this work is that, like Jennifer Golbeck's ontology [147], it incorporates the notion of "Topical trust". This is used as an attribute that allows different features and properties of a relationship to be stated. Trust topics and trust values are stated as properties of the trust relationship. To describe trust relationships, the ontology is presented using RDF, which in turn eases extending the FOAF vocabulary and profiles. Because relationships can be described using the FOAF vocabulary and ontology, trust relationships can be described using the trust ontology and RDF properties. The other technology that has been integrated is WOT [272] [43] (web-of-trust), which is used to describe web-of-trust resources such as key fingerprints, signatures and signing capabilities, and identity assurance [46] [43]. The ontology's RDF schema consists of 2 classes (concepts) and 5 properties (attributes). As mentioned, the primary concept is the Relationship between two people. As in most trust ontologies, two properties are required for every Relationship and form its endpoints: truster and trusted. Using the FOAF vocabulary, both truster and trusted have foaf:Person objects as their targets. Using the WOT vocabulary, FOAF-defined Persons should also contain at least one wot:fingerprint property specifying the PGP web-of-trust fingerprint of a public key held by the individual the Person refers to. This property serves two purposes: it assures the identity of the people at both ends of the relationship, and it means that if one of them does not hold any key, the system can skip instantiating a relationship between them.
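The structure described above (a Relationship with truster and trusted endpoints, trust items carrying a topic and a continuous rating, and a fingerprint requirement) can be sketched as follows. This is a minimal illustration, not Konfidi's implementation: the class and function names, the 0-to-1 rating scale and the placeholder fingerprints are assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class Person:
    name: str
    fingerprints: list = field(default_factory=list)  # wot:fingerprint values


@dataclass
class Item:
    topic: str     # what the trust is about
    rating: float  # continuous trust value; a 0..1 scale is assumed


@dataclass
class Relationship:
    truster: Person  # source of the relationship
    trusted: Person  # sink of the relationship
    items: list = field(default_factory=list)


def make_relationship(truster, trusted, items):
    """Return a Relationship, or None when either party holds no key."""
    if not truster.fingerprints or not trusted.fingerprints:
        return None  # identity cannot be assured, so nothing is instantiated
    return Relationship(truster, trusted, list(items))


# Placeholder fingerprints, not real keys.
alice = Person("Alice", ["AAAA 0000"])
bob = Person("Bob", ["BBBB 1111"])
carol = Person("Carol")  # holds no key

rel = make_relationship(alice, bob, [Item("cooking", 0.95)])
skipped = make_relationship(alice, carol, [Item("cooking", 0.5)])
```

Keeping the topic and rating on a separate item lets one relationship carry several topical trust statements between the same two people.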

Comparison and analysis

In this section we compare some of the most important of the aforementioned ontologies. We point out what they have in common and address their strong and weak points. Table 5.1 compares the ontologies reviewed so far based on their components.

Table 5.1: Comparison among trust ontologies based on ontology component structure

| Trust Ontologies | Concept(s) | Relationship(s) | Instance(s) | Axiom(s) |
|---|---|---|---|---|
| Golbeck | Topical trust, Agent, Person | trustRegarding (between agent and Topical trust) | trust0...trust10 (range of trust metric), trustSubject, trustValue, trustedAgent, (subproperty of trustedAgent), trustRegarding | "A Person or Agent (e.g. Alice) trustsHighlyRe (trust10) trustRegarding a trustedPerson or trustedAgent (e.g. Bob) On trustSubject (e.g. Driving)" |
| Toivonen/Denker | Person, Topic, Receiver, Message | Trusts (between Persons), ctxTRUSTS (between receiver and message), trustsRegarding (between Person and Topic) | trustRegarding, reTopic, (trustsAbsolutelyRe ... distrustsAbsolutelyRe), ctxTRUSTS, (ctxtrustsAbsolutely ... ctxdistrustsAbsolutely), trustsRegarding, Trusts, rePerson, (trustsAbsolutely ... distrustsAbsolutely) | Multiple axioms are inferable, for instance: 1) Stating topical trust: "A Person (Alice) trustsAbsolutelyRe trustsRegarding (relationship) the Topic (Driving)", 2) Stating trust between two persons: "a Person (Alice) trusts another Person (Bob) trustsAbsolutely" |
| PML | Belief, Element, Trust, Element, FloatMetric | Belief Relation (using hasBelievedInformation and hasBelievingAgent between Agent, information and source), Trust Relation (using hasTrustee and hasTrustor between Agent, information and source) | Agent, Source, Information, hasBelievedInformation, hasBelievingAgent, hasTrustee, hasTrustor, hasFloatValue | Two kinds of axioms regarding the trust and belief of an agent in information from a source can be inferred, for instance, stating trust: "FloatTrust, hasTrustee and hasTrustor (agent: user's browser) And hasFloatValue with FloatMetric (0.55)." |
| Konfidi | Relationship, Item | About (between Item and Relationship) | About, Truster, Trusted, Rating, Topic | Trust relationships can be stated like the following axiom: "A (trust) Relationship between truster (Alice) and trusted (Bob) exists, which is about trust topic (Cooking) with trust rating (0.95)." |

To analyze the study further, let us consider a set of analysis subjects that affect the comparison between ontologies. Depending on the context and the subject of the study, certain approaches are used and implemented. If the subject of study is ontologies for knowledge management, it is preferable to use an algorithm to compare ontologies, since such ontologies may be heavy and may contain a large number of concepts and properties. As a matter of fact, we can use the weight of an ontology as a basis of comparison. Since all trust ontologies convey the same meaning, namely the representation and modeling of trust relationships, context is an important issue, and we can compare trust ontologies according to the context in which they have been modeled. Since a model should also facilitate the inference and computation of trust, inference is another important topic to consider when analyzing trust ontologies. Trust ontologies are used to generate trust networks and drive the automation of trust network generation, inference and maintenance, so we can also compare ontologies based on ease of implementation. Ontologies should allow trust statements to be expressed. As axioms represent the trust expressions and statements in the social community of trust, we can also consider the semantic expressivity of the axioms inferred from the respective trust ontologies. The semantics of trust should be easy to understand and should allow inference and justification. The more a trust ontology incorporates technologies and vocabularies that are expressive and well referenced, the easier it is to implement. Importing technologies and vocabularies makes ontologies rich. As a matter of fact, we can also base our judgment on the number of technologies used in an ontology.

Weight

Considering the size of the ontologies, Konfidi is the lightest, with only two main concepts, 5 properties and a single relationship. PML has 5 main concepts, 2 types of relationship and 8 instances, which places the PML trust ontology second. Golbeck's ontology has one main concept (topical trust), two derived concepts (person and agent), 16 properties and one relationship, which places it third. The trust ontology of Denker/Toivonen has 4 main concepts and 3 types of relationships, making it the heaviest ontological representation of all. The large number of properties in the Golbeck and Denker/Toivonen ontologies is caused by the trust metric used: had a probabilistic approach been chosen instead of the discrete scale from 0 to 10, these ontologies would be much lighter, bringing the total number of elements to 11 in Golbeck and to 14 in Denker/Toivonen and placing them first and second. We can conclude that the choice of trust metric, and of the approach toward the computational aspect of trust measurement, can affect the size of an ontology drastically.

Context / Domain Dependence

As described, context is one of the most important subjects to consider when building a trust model for a domain of study. We also have to consider that there are main elements that affect the construction of trust ontologies and can alter their structure. We want to consider the construction of an ontology based on the main axes of trust, the semantic web and social networks. Different choices for these axes and elements can produce a drastically different ontology with a different set of components. For instance, for trust in service-oriented environments, we have to treat trust as a notion close to security rather than to belief and judgment. In that context trust is closer to reputation, while in the context of the semantic web and semantic web-driven social communities it is closer to belief and justification. Context therefore has a considerable impact on the constituent elements of trust ontologies. Among the ontologies considered, Denker/Toivonen is the most context-dependent, as the context of the trust study is communication and message exchange. Looking at the trust concepts incorporated into this ontology, we find that the notion of a trust relationship is tangled up with communication concepts (Communication network, Message), which makes it completely dependent on the communication context, although the rest of the trust components are very well engineered. Since the trust ontology of PML is one axis of a triangle formed by the provenance, justification and trust ontologies, all three are incorporated and imported into each other to take advantage of the technical facilities for ontology description and consumption.

This feature makes the trust axis of the PML ontology dependent on the other two ontologies, so incorporating it demands incorporating them as well. At the same time, this ontology is dedicated to evaluating and expressing the trust and belief of an agent in a piece of information taken from a source of information on the web. This makes it hard to express and conclude the trust between a set of persons, since the other party must be described as an agent as well, but it makes it easy to derive and justify the statements of a person and to state the belief and trust in a statement made by a person (for example for algorithms, in order to test the inference capability of trust on a social network ontology, according to [46]). In general, the approach that PML follows is "Trust for Question Answering" [386]. As a result, the PML trust ontology seems less context-dependent than Denker/Toivonen and more customizable to the general need for modeling trust. Konfidi makes the representation of trust in the context of social semantic networks fairly easy and straightforward, and at the same time it is extensible and useful for different contexts and future needs. It is important to consider that trust inference capability is an important factor that affects the implementation aspects of trust representation. Using Konfidi's ontology, you can make a statement of topical trust between any set of resources or nodes (described by URI) on a semantic social network. Golbeck's ontology seems the most essential and fundamental work on describing and stating trust using ontological modeling and representation for consumption on the semantic web. Both Konfidi's and Golbeck's ontologies are among the most context- and domain-independent ontologies, which makes them easy to customize and implement in other domains of interest that demand modeling of trust. We can state that the more components an ontology has that directly express trust relationships, and the fewer components and properties it has that relate to other domains, the more context-independent it will be.

Inference Capability

One of the most important issues while considering capturing of a domain insidethe structure of ontology, is the reasoning based on that ontology. Considering thesubject of discussion, it should be possible to infer trust values easily using thecorresponding trust ontology. As described, choice of trust metric plays a crucialrole in the design and composition of ontology. Given a set of entities (for instancetwo persons located on a network), ontology should facilitate the inference of com-putational trust value for the given entities. There are certain factors that affectthe efficient inference based on ontology such as the complexity and size of trustnetwork generated. The lesser trust network generated is complex the lesser theinference mechanism implemented needs to be complex. Golbeck’s ontology wasused for generating a network of semantic data, and was also used within a seman-tic web social network.

Research has shown great inference capability for this ontology [147] [146] [139][140].Golbeck has studied the inference mechanisms and has created and imple-mented inference algorithm to study the trust inference based upon her trust on-tology on two sets of trust networks, one a website for movie ratings and recom-mendations [145] and the other for spam filtering [146]. This makes Golbeck’s trustontology the only ontology widely used, implemented and inferred upon. Konfidiis also tested against network of semantic data, and has shown good performance.Konfidi uses trust strategies to implement different sorts of inference mechanismsand The inference capability of PML is implemented and has proven to be veryeffective as it is designated toward automatic resource evaluation.

Semantic Expressivity

Axioms that are inferred from trust ontologies express the semantics of trust. The clearer and more expressive these axioms become, the more easily they describe the semantics of trust within the implemented and stated context. Golbeck’s and Konfidi’s respective ontologies state the semantic trust relationships in a way that is very expressive and easy to understand; for example, using Konfidi: "Relationship between truster (Alice) and trusted (Bob) exists, which is about trust topic (Cooking) with trust rating (.95)", and using Golbeck’s ontology: "A Person (Alice) trustsHighlyRe trustRegarding a (Bob) on trustSubject (Driving)", adopted respectively from [147] [46]. Denker/Toivonen use Golbeck’s approach, but the axioms generated are less expressive, as multiple contexts are taken under consideration and the final derived axioms should carry the notions of context, trust and communication. Considering all intermediary relationships, for example, a trust relationship between person and topic could be described as: "a Person (Alice) rePerson trustsAbsolutelyRe (trust metric) trustsRegarding (relationship) reTopic Topic (Driving)", adopted from [147], which shows less expressivity than the previous axioms. As described in the table, PML has the least expressivity of all, but this is traded off against the inference capability of the ontology, as the inference is meant to be consumed by software agents. There seems to be a tradeoff between the expressivity and the inference capabilities of ontologies: the more consumable an ontology becomes by software agents, the less expressive the inference products become.

Size of Trust Networks

We discussed that the trust network should be automatically generated during run-time so we can analyze and evaluate it, and finally infer and compute the trust values based on the generated network. As the size of the corresponding network grows, crawling and walking the trust paths becomes harder. So, it is important to ensure that the network generated is analyzable and inferable. This has to do directly with the structure based on which trust concepts and properties are presented and described. For example, Konfidi describes the topic and rating as extra edges on the network tree. The more topics we incorporate, the larger the depth of the generated network becomes, so in order to increase efficiency, the authors of Konfidi’s trust ontology state that the extra information attached to edges could be saved separately, according to [46].

As the semantic concept of trust relationship has been described very efficiently (using a small set of necessary elements, e.g. only one main concept), the networks generated are very well-formed. It is logical to state that an efficient design of the ontology directly results in an efficient design of the networks generated and used. As our ontology is introduced in the next section, we use the network size perspective to analyze the networks generated using our own ontology.

Vocabularies Incorporated

As mentioned before, Golbeck’s trust ontology was indeed a milestone in the work on representing trust and belief in statements made in a semantic web-driven community and society. She not only introduced a method of representing trust on the semantic web and in semantic web-powered societies, but also introduced the notion of topical trust and subjective trust. By enabling subjective trust we can state and represent how a sink and a source trust each other based on a specific subject, and then measure this trust in subject and topic according to a specific trust metric. Most other works within trust representation on the semantic web and semantic web-driven social networks base their trust model either completely or partially on Golbeck’s trust ontology. Denker and Toivonen incorporate subjective and topical trust into their ontology as well. They also use the trust range of Golbeck for contextual trust and personal trust representation. Konfidi also incorporates topical trust. Although not standardized, topical or subjective trust is a requirement for any kind of model capturing trust relationships. All of the studied ontologies take advantage of the friend-of-a-friend (foaf) vocabulary.

Golbeck and Konfidi use the foaf vocabulary to describe the two sides of a trust relationship. PML uses it to describe the agent that assesses and evaluates the information. Among the studied ontologies, Konfidi incorporates and integrates the largest number of vocabularies and technologies. In addition to the foaf and topical trust vocabularies, Konfidi also incorporates the relationship vocabulary [91] and the WOT (web of trust) vocabulary [272]. Using the relationship vocabulary leaves space for adding other new features of trust relationships when needed, such as the date of initiation of the trust relationship, terms of the relationship, etc. Integrating different vocabularies enriches the structure of the ontology, reduces the number of ontology components and eases inference based upon the respective ontology. Considering standardized vocabularies and ontologies not only reduces the number of elements, but also eases future adoption of new properties of implemented vocabulary-driven features.

5.4 Engineering and Construction of Trust Ontology

As in all engineering sciences, in order to engineer an artifact, an iterative process should be considered where each step extends the previous step in the loop to construct the artifact under focus. Ontology engineering and learning is a semi-automatic process consisting of six main interrelated phases, according to [40] [89]. These phases are: domain understanding, data understanding, task definition, ontology learning, ontology evaluation, and refinement with human in the loop. We use this approach in order to construct and build our trust ontology. We can state that our experience can serve not only as a methodology and mechanism for ontology construction but also, considering the domain of our problem, as a guide to engineering and construction of trust representations and protocols using ontologies.

Determining the Domain and Scope

Considering the domain of the problem, we are engineering an ontology which serves as the representational structure of the relationship visualizing trust and trustworthiness of a set of individuals based on a social network. We can state that this ontology revolves around four main axes: Trust, Relation, Social network and Semantic Web. So, we can state that the domain of our ontology is the representation of trust within a social network based on the semantic web.

Understanding and Learning the Data

The domain and scope of an ontology create a boundary that captures the data relevant to the ontology under consideration. Since our ontology serves as a representational model, the focus will be put on the data that are represented, and that is trust relationships. As relationships are compound data made of a couple of atomic subcomponents, the atoms of relationships will form the data of our ontological domain. Relationships are described between entities, and these entities are individuals on the social network, connected together by trust relationships. We consider persons on the social network, so data about people will serve as our data. Data about people on the social semantic network are described within FOAF files, which are described using RDF. At the same time, Web-of-trust also provides data about the identity of people on our network as well as the availability of links between these people on the network of relationships. As such relationships should describe trust as well, properties of trust are also among the pieces of information that are useful to create data for the ontology, such as the measurement and metric value used to describe the value of trustworthiness. In order to be able to describe the subject of the trustworthiness evaluation, we also need a subject list, so available subjects and topics can be mentioned as available data in the domain. The data needed within this ontology is information that composes trust relationships and the properties that relationships have. The main data would be people’s relationships and the properties describing them and their relationships; here, as mentioned, FOAF profile metadata of people can compose such relationships.

Defining Tasks

Available data describe not only information about the people that make up the atoms of relationship molecules but also the properties of relationships. When the domain is specified and the data available are recognized and learned, the usages and functionalities of the ontology being constructed are specified. Taking into consideration the domain and scope of the ontology, which is the representation of trust relationships, and the data available, which are information about the people creating the relationships, we can state that the task of such an ontology is clearly describing and representing trust relationships. Every relationship has a set of main properties, which describe the nature and purpose of the relationship. These properties specify the details of the trust relation. Each trust relationship has a topic or subject (topical or subjective trust). In order to make trust computable, there should be a value on every existing edge of the network. This value represents the trust metric used for the representation of the trust relationship. So, we can consider Value also as a main property of the relationship. Now that we have learned the main elements of the ontology, it appears that most trust ontologies share the components described so far. Relationships described using ontologies have a set of auxiliary properties as well. Using this component we can put more details on the relationship and give it more weight. It is important to realize that only properties of less importance than the main properties are described as auxiliary properties. Using a separate element for auxiliary properties leaves space for future extensions that need to be added to the network. Trust relationships are context-sensitive. Context describes whether a relationship is described inside a personal network or a business network. By using context, we can make networks of different types: simple networks and hybrid networks. For instance, simple networks are either a personal network (such as Orkut [276]) or a business network (such as LinkedIn [220]). We can have a relationship in the context of a personal network. We can also have a simple trust relationship in the context of a business network or perhaps a business environment. When a source from a personal network is connected to a sink from a business network, we have connected two networks of simple type, creating a hybrid network. In this way, the context type, as an auxiliary element, gives more details about the type of network in which the relationship is described.

Ontology Learning

Using the knowledge acquisition and learning capabilities of our construction and development environment, we are able to learn the ontology. As the relationship represents the connection between entities on the network, the relationship itself serves as the main component and concept of this ontology. We can think of a relationship as a composite object made of subcomponents that reside within the relationship and describe its properties. Each relationship describes an edge on the network; this edge exists between a set of nodes. These nodes represent the starts and ends of directed edges (of relationships). hasTruster and hasTrustee respectively represent these two important properties of a relationship on a network. So far we have learned the elements of a relationship on a social network. Considering the reason upon which a relationship can be established, we have also incorporated a Goal property describing why the respective relationship was formed. For instance, on a social network the goal for establishing a relationship is usually friendship, while on a business network it is seeking business partnership. The most important subsidiary and optional property that we have considered in our ontology is having a recommender as the initiator of the establishment of the trust relationship. hasRecommender is an auxiliary property describing a person on the network that has recommended the trustee, or the sink of the relationship, to the truster. In other words, we have described the notion of "trust in recommendation" in order to shape and form a relationship, initiated from the truster and ending at the trustee, based on the guarantee of the trust recommender. Using such a property we can create networks of different strengths; we can have networks of weak links and strong links.

Figure 5.1: Structure of our trust ontology: 3 main concepts of the trust ontology as well as two edges connecting them together [98,100].

A strong link is a relationship based upon the recommendation of an entity. The more recommenders a relationship has, the stronger this relationship becomes. A weak link is a relationship that has no recommenders. When speaking in terms of trust in the context of information systems, one way of achieving trust is using a recommender. Considering the transitivity property of trust, trust in a recommender is used by certificate authorities to achieve trust in a third party. We can take advantage of this property in semantic web-driven social networks to create strong paths, in order to use them as the paths for aggregating and computing trust values along the network. By specifying auxiliary properties we follow two important goals: adding more details about relations to the ontology, giving more meaning and detail to the specification of relationships; and leaving space for adding more elements describing other aspects of relationships that may be needed in the future.
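The distinction between weak and strong links can be sketched in a few lines of Python. This is only an illustration, not part of the ontology itself: the `Relationship` class and the `link_strength` and `is_strong` functions are hypothetical names, and using the number of distinct recommenders as the strength measure is an assumption drawn from the statement that more recommenders make a relationship stronger.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Relationship:
    """A directed trust edge; the class and field names are illustrative only."""
    truster: str
    trustee: str
    recommenders: List[str] = field(default_factory=list)


def link_strength(rel: Relationship) -> int:
    """Assumed strength measure: the number of distinct recommenders."""
    return len(set(rel.recommenders))


def is_strong(rel: Relationship) -> bool:
    """A link with at least one recommender is strong; otherwise it is weak."""
    return link_strength(rel) > 0


weak = Relationship("Alice", "Bob")
strong = Relationship("Alice", "Carol", recommenders=["Bob", "Dave"])
print(is_strong(weak), is_strong(strong), link_strength(strong))
```

Counting distinct recommenders is one simple choice; a weighting by the trust placed in each recommender would be another, and the text does not prescribe either.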


Ontology Evaluation

As ontology development is a semi-automatic approach and demands involvement of both humans and machines, in this phase as well as the previous phase we take advantage of an automated tool in order to build and evaluate our ontology. In this phase we build and evaluate the ontology learned in the previous phase. By evaluating the ontology, we estimate the quality of the modeled solution to the tasks defined in previous sections. It is worth mentioning that in most phases of ontology engineering the role of the human does not completely fade, and the human will participate in almost all phases of ontology development. In order to model and describe the elements and components of the ontology we use the Protege ontology editor and knowledge acquisition system (footnote 1). Figure 5.1 visualizes the structure of our trust ontology.

As shown, our ontology has 3 main concepts or classes that capture the structure of the trust relationships on the networks. Relationship is the main element and concept of our ontology. MainProperties and AuxiliaryProperties are the other main components of our ontology. We have two associations that connect MainProperties and AuxiliaryProperties to Relationship. These associations are hasMainProperties and hasAuxiliaryProperties. A Relationship always has a source and a sink, which we have described here as truster and trustee. Both hasTruster and hasTrustee are defined on the range of foaf:Agent, which enables us to describe relationships in the context of semantic social ecosystems. This agent can be a person, an organization or just a software agent. Each Relationship has to have a truster, a trustee and at least one main property. Without these elements, a relationship is partial, and partial relations are undefined in our ontology. In order to ensure that at least these elements are present, we have put restrictions on the ontology subcomponents. A restriction defines a blank node with restrictions. It refers to the property that is constrained and defines the restriction itself.

1 Protege, http://protege.stanford.edu/

Cardinality constraints define how many times a property can be used on an instance of a class. We have minimum, maximum and exact cardinalities. We have used two exact cardinalities on hasTrustee and hasTruster, in order to state that a relationship has exactly one truster and one trustee. We have also used a minimum cardinality for hasMainProperties to make sure that each relationship has at least a topic and a value; since we can have more than one topic to base the relation upon, we have used minimum cardinality (at least) rather than exact cardinality. The MainProperties element has two main properties: Subject and Value. We have described these two properties using data type properties in OWL (Web Ontology Language). Subject takes a string value. It is recommended that subject taxonomies or topic ontologies be defined, so we can use a common namespace for describing topics and subjects. Each relationship can have multiple main properties, which means it can be about different topics and subjects, but each main property has to have one and only one topic and only one value. For instance, in the relationship between Alice and Bob, Alice can completely trust Bob on Driving (Subject="Driving", Value="0.95"), and can also completely distrust Bob on Cooking (Subject="Cooking", Value="0.10"). This constitutes two distinct main properties in the relationship between Alice and Bob. But we cannot have multiple subjects and values in the MainProperties of Alice and Bob on Cooking, for example. In order to enforce this, we have put restrictions on both the Value and Subject properties. By using an exact cardinality restriction we have enforced having exactly one subject and exactly one value for each item of trust within a relationship. Finally, the AuxiliaryProperties concept has 5 properties and also leaves space for more properties whenever needed. AuxiliaryProperties has one object property and 4 data type properties. The object property is hasRecommender, which describes the strength of the relationship and is defined on the range of foaf:Agent; it lets us state which node on the network is the recommender for the establishment of this relationship. ContextType is defined as a string data type property that states the context of the trust network the relationship is based on. The Goal of the relationship is also defined using a string data type property. DateBegin and DateEnd are described using a Date data type property. We do not need restrictions on any property of the AuxiliaryProperties concept.
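The cardinality restrictions described above can be mirrored in a small Python check. This is only a sketch of the constraints, not the OWL encoding used in the ontology: the class names follow the ontology concepts, while the `validate` function, the lowercase field names and the check that a value lies in [0, 1] are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class MainProperties:
    """One item of trust: exactly one subject and exactly one value."""
    subject: str
    value: float


@dataclass
class TrustRelationship:
    """Mirrors the Relationship concept; a missing truster or trustee is None."""
    truster: Optional[str]
    trustee: Optional[str]
    main_properties: List[MainProperties] = field(default_factory=list)


def validate(rel: TrustRelationship) -> List[str]:
    """Return the violated constraints; an empty list means the relationship is complete."""
    problems = []
    if not rel.truster:
        problems.append("exactly one truster is required")
    if not rel.trustee:
        problems.append("exactly one trustee is required")
    if len(rel.main_properties) < 1:
        problems.append("at least one MainProperties item is required")
    for item in rel.main_properties:
        # Assumption: values are probabilities, as in the Discussion section.
        if not 0.0 <= item.value <= 1.0:
            problems.append(f"value for {item.subject!r} is outside [0, 1]")
    return problems


alice_bob = TrustRelationship(
    "Alice", "Bob",
    [MainProperties("Driving", 0.95), MainProperties("Cooking", 0.10)],
)
partial = TrustRelationship("Alice", "Bob")
print(validate(alice_bob))
print(validate(partial))
```

Because each `MainProperties` instance holds a single subject and a single value by construction, the exact cardinality on Subject and Value is enforced by the data structure, and only the minimum cardinality on hasMainProperties needs an explicit check.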

Discussion

As modeling trust is the main target of our work, a brief discussion of the notion of trust and how we have modeled it in our approach seems necessary. As discussed, trust is a context-sensitive issue. While considering the context of the trust ontology and trust analysis, we realize that this context is a multi-dimensional entity composed of two main dimensions: semantic web and social networks. Trust in the domain of social semantic networks has three relatively close notions: belief, provenance and justification. Some of these notions have a meaning very close to, and sometimes overlapping with, trust. Among the mentioned notions, belief seems to be very close to trust. It seems that belief and trust go hand in hand. Discussion on modeling belief has a long background. The work on belief goes back to Willard Van Orman Quine’s "web of belief" [295]. A notion reminiscent of the web of trust is created by [97] and woven into the semantic web. They define the web of belief as follows: "by cognitively viewing knowledge as individuals’ rational beliefs about the world, individuals share knowledge and form a distributed knowledge network, which is called the web of belief, where rational belief links individuals with world facts and trust interlinks individuals as external information sources." [97].

In our work, we have only considered modeling trust and distrust. Modeling the other notions described takes a great deal of effort, as each one of these notions demands its own properties and eventually its own ontology. As we have generalized the notion of trust relationship in our approach to Relationship, we have provided enough space for future extension. We can build a belief ontology that can be imported into our trust ontology, and certain elements of these ontologies can be shared and consumed whenever needed. Aside from this possibility, there is a need for future research on defining the nature, usage and representation of belief and judgment in semantic social networks. Using our ontology, we can describe trust in other people on the network regarding a certain topic. Taking into account the discussions in the previous section, what we are describing here is trust in performance. When we state that "Alice trusts Bob regarding Driving", this means that "Alice trusts in the eventuality of the performance of Bob to some extent, when the act of driving is performed". Trust in performance means that the truster states trust in the performance of an act by the trustee, when this act is performed. This trust uses a probabilistic approach to describe trust relationships, so we can say how much someone trusts another on a range between 0 and 1. For example, as shown previously, we can state that Alice trusts Bob completely regarding a topic. This amount of trust is mapped to a floating point value between 0 and 1, so we can state that the range of 0.9 to 0.99 shows that you completely trust the person you are expressing trustworthiness about. Considering the discrete range of Golbeck’s ontology, which runs from trust0 to trust10, we realize that we have an implicit mapping from a range of discrete values to a range of continuous values. The choice of trust topic is also a candidate for improvement in future work. As we stated, we have modeled specific trust and have deliberately eliminated the notion of general trust. It is important to point out that a relationship should have at least one topic. One of the important notions to discuss here is distrust. For instance, "Alice distrusts Bob regarding babysitting to some extent (0.65)" can also be stated using our ontology as "Alice trusts Bob regarding babysitting to some (complementary) extent (0.35)", adopted from [147] [46].
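The two mappings mentioned above, the complement between distrust and trust and the correspondence between a discrete trust0 to trust10 scale and the interval [0, 1], can be sketched as follows. The function names are hypothetical, and the linear mapping of a discrete level to level/10 is an assumption; the text only states that the mapping is implicit.

```python
import math


def distrust_to_trust(distrust: float) -> float:
    """Express a distrust statement as the complementary trust value."""
    if not 0.0 <= distrust <= 1.0:
        raise ValueError("distrust must lie in [0, 1]")
    return 1.0 - distrust


def discrete_level_to_value(level: int) -> float:
    """Map a discrete level 0..10 to [0, 1]; the linear scale is an assumption."""
    if level not in range(11):
        raise ValueError("level must be an integer between 0 and 10")
    return level / 10.0


# "Alice distrusts Bob regarding babysitting (0.65)" read as complementary trust.
babysitting_trust = distrust_to_trust(0.65)
print(math.isclose(babysitting_trust, 0.35))
print(discrete_level_to_value(10))
```

The comparison uses `math.isclose` because the complement is computed in floating point arithmetic.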

As is clear, we have modeled distrust implicitly. We have assumed that there is a tradeoff between trust and distrust on the same topic. We can also model feelings using our trust model. If we take all of the evaluation values for a relation and average them, we can derive the amount of feeling between the truster and the trustee. We can derive negative or positive feelings. If there is a certain number of trust items (or MainProperties; subjects and values) for a relationship, for instance at least 3, we can consider taking the average of the values and deriving a general feeling of the truster for the trustee. For instance, if Alice has low trust values for Bob in all of the subjects in their relationship, then we can state that she has negative feelings for him, or vice versa. However, there are many other properties that affect the feelings of people for each other, and trust is only one of them. Therefore, more elements are needed to give us the ability to create feelings statements in our ontology. We want to be able to choose two nodes, a source or truster and a sink or trustee (trusted), gather trust values on a path between them on the network, and eventually compute a value representing the trust of the truster in the trustee. In order to address this problem, we have made sure that each relationship on the network has a value, and we have introduced recommenders. Our ontology ensures that if there is a relationship (a link on the network) between two nodes, then this link has a value, although this value does not reflect the general trust between truster and trustee. In addition to using recommenders, we can use our ontology to create a network of recommendation on the network of trust. We can use recommended links for our trust inference. As we described, recommendation can state the strength of an existing link, so we can use such a "recommended link" for our inference along the paths. Theoretically, such paths are stronger and can give better values than other paths that do not have recommenders. One of the main challenges in this context is dealing with distrust values when they are encountered on the network. Values of distrust drop the aggregated values along the paths on the network, and there is no established procedure or methodology for dealing with this problem.
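The idea of deriving a general feeling from the average of the trust items can be sketched as below. This is a rough illustration under stated assumptions: the function name `general_feeling` is hypothetical, the minimum of 3 items follows the example in the text, and the neutral threshold of 0.5 separating negative from positive feelings is an assumption that the text does not specify.

```python
from statistics import mean
from typing import Dict, Optional


def general_feeling(trust_items: Dict[str, float],
                    min_items: int = 3,
                    neutral: float = 0.5) -> Optional[str]:
    """Average the per-subject trust values of one relationship.

    Returns None when there are too few items to draw a conclusion.
    The neutral threshold is an assumed cut-off, not taken from the ontology.
    """
    if len(trust_items) < min_items:
        return None
    average = mean(trust_items.values())
    if average > neutral:
        return "positive"
    if average < neutral:
        return "negative"
    return "neutral"


alice_about_bob = {"Driving": 0.20, "Cooking": 0.10, "Babysitting": 0.35}
print(general_feeling(alice_about_bob))
print(general_feeling({"Driving": 0.95, "Cooking": 0.10}))
```

As the text notes, trust is only one of several factors behind feelings, so such an average should be read as a coarse indicator at best.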

5.5 Trust Network Analysis

We begin by analyzing a network of small size. This gives us the ability to easily visualize and realize the structure of the modeled relationships. Then we move to networks of larger size, where we introduce two types of trust network structures: hybrid and meshed networks.

A Small Size Network

Let us begin with the smallest possible network size: a network of two people with a single relationship, containing a main property and an auxiliary property. Let us consider modeling the following relational semantics for this atomic network:

Alice trusts Bob in driving a lot.

Using our OWL trust schema and ontology, this network will be presented in RDF format as follows:

<foaf:Person rdf:ID="Alice"/>
<foaf:Person rdf:ID="Bob"/>
<Relationship rdf:ID="Relationship_Alice_Bob">
  <hasTrustee rdf:resource="#Bob"/>
  <hasTruster rdf:resource="#Alice"/>
  <hasMainProperties>
    <MainProperties rdf:ID="MainProperties_Alice_Bob">
      <Subject rdf:datatype="&xsd;string">Driving</Subject>
      <Value rdf:datatype="&xsd;float">0.95</Value>
    </MainProperties>
  </hasMainProperties>
</Relationship>
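A listing of this shape can also be produced programmatically. The following is a minimal sketch using Python's standard `xml.etree.ElementTree` module; the `TRUST` namespace URI is a placeholder (the actual namespace of the ontology is not given here), and the helper names `q` and `build_relationship` are hypothetical.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
FOAF = "http://xmlns.com/foaf/0.1/"
XSD = "http://www.w3.org/2001/XMLSchema#"
TRUST = "http://example.org/trust#"  # placeholder namespace, assumed for illustration

ET.register_namespace("rdf", RDF)
ET.register_namespace("foaf", FOAF)
ET.register_namespace("trust", TRUST)


def q(namespace: str, name: str) -> str:
    """Build a qualified name in ElementTree's {namespace}name notation."""
    return f"{{{namespace}}}{name}"


def build_relationship(truster: str, trustee: str, subject: str, value: float) -> ET.Element:
    """Create an RDF/XML tree for two persons and one trust relationship."""
    root = ET.Element(q(RDF, "RDF"))
    for person in (truster, trustee):
        ET.SubElement(root, q(FOAF, "Person"), {q(RDF, "ID"): person})
    rel = ET.SubElement(root, q(TRUST, "Relationship"),
                        {q(RDF, "ID"): f"Relationship_{truster}_{trustee}"})
    ET.SubElement(rel, q(TRUST, "hasTrustee"), {q(RDF, "resource"): f"#{trustee}"})
    ET.SubElement(rel, q(TRUST, "hasTruster"), {q(RDF, "resource"): f"#{truster}"})
    has_main = ET.SubElement(rel, q(TRUST, "hasMainProperties"))
    main = ET.SubElement(has_main, q(TRUST, "MainProperties"),
                         {q(RDF, "ID"): f"MainProperties_{truster}_{trustee}"})
    subject_el = ET.SubElement(main, q(TRUST, "Subject"), {q(RDF, "datatype"): XSD + "string"})
    subject_el.text = subject
    value_el = ET.SubElement(main, q(TRUST, "Value"), {q(RDF, "datatype"): XSD + "float"})
    value_el.text = str(value)
    return root


document = build_relationship("Alice", "Bob", "Driving", 0.95)
xml_text = ET.tostring(document, encoding="unicode")
print(xml_text)
```

The output wraps the elements in an `rdf:RDF` root and writes full datatype URIs instead of the `&xsd;` entity, which avoids the need for a DTD declaration.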


Figure 5.2: Hybrid network. Two connected networks of different contexts: a personal and a business network. The hybrid network contains 8 people and 12 relations. 8 links are local interconnections, and 4 links act as glue connecting the two networks (foreign).

Hybrid Trust Networks

Here, we consider 2 groups of people, representing two networks of different contexts. Each group of four people is interrelated and interlinked, forming a simple network. At the same time, some of these people are connected outside of their own local network to the other, foreign network. These relations work as glue connecting networks of different contexts, creating hybrid networks. In the hybrid network depicted in Figure 5.2, people located on one network shape a personal context and their goals are more or less establishing friendship relations, while people on the other network are members of a business network; their goals are establishing business partnerships and relationships, and they could be colleagues in an office environment. It is also possible to think of the business network as a business-value adding network, or a service-oriented environment. In that case, the four latter members can be software agents, which can also be described using our ontology.

In order to consider the structure and size of the network generated, a circular representation of the network is given in Figure 5.3.
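The split between local and foreign links in a hybrid network can be illustrated with a short Python sketch. The node names and the edge list below are invented solely so that the counts match the 8 people, 8 local links and 4 foreign links mentioned for the hybrid network; the actual edges of the studied network are not reproduced here, and `classify_links` is a hypothetical helper.

```python
from typing import Dict, List, Tuple

Edge = Tuple[str, str]  # (truster, trustee)


def classify_links(context_of: Dict[str, str], edges: List[Edge]) -> Dict[str, List[Edge]]:
    """Label each link local (same context) or foreign (crossing contexts)."""
    result: Dict[str, List[Edge]] = {"local": [], "foreign": []}
    for truster, trustee in edges:
        kind = "local" if context_of[truster] == context_of[trustee] else "foreign"
        result[kind].append((truster, trustee))
    return result


# Invented example: four people in a personal context, four in a business context.
context_of = {name: "personal" for name in "ABCD"}
context_of.update({name: "business" for name in "EFGH"})
edges = [
    ("A", "B"), ("B", "C"), ("C", "D"), ("D", "A"),  # personal network
    ("E", "F"), ("F", "G"), ("G", "H"), ("H", "E"),  # business network
    ("A", "E"), ("B", "F"), ("G", "C"), ("H", "D"),  # glue between the two
]
links = classify_links(context_of, edges)
print(len(links["local"]), len(links["foreign"]))
```

In terms of the ontology, the context of each node would come from the ContextType auxiliary property of its relationships; here it is stored in a plain dictionary for brevity.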


Figure 5.3: A circular representation of the hybrid network subject to study. (Network contains 48 hybrid nodes and 92 edges)

Figure 5.3 visualizes the RDF trust network depicted in Figure 5.2. Figure 5.3 is visualized using Welkin (footnote 2).

2 SIMILE Welkin. http://simile.mit.edu/welkin/

Meshed Trust Networks

The motivation for studying larger networks of trust was to consider real-world scenarios of network formation. Such networks are complex, combined networks of different sizes and different contexts. We call these networks meshed networks. Meshed networks are networks where every node is connected to all other nodes on the network.

As such an assumption is unrealistic, and only a subset of nodes is fully connected to all other nodes, we consider partial and fully connected meshed networks. Taking the idea from networking topologies, partial meshed networks are trust networks where each node is connected at least to the subset of nodes it exchanges data with. On the other hand, a fully connected meshed network is a trust network where each node is connected to every other node. In the former, inferring trust values between a pair of nodes on the network seems difficult, but finding a path between a set of nodes on the network is guaranteed. Using our ontology, recommendations can find efficient paths on the network. Figure 5.4 depicts a partial meshed network of people from different contexts and perhaps with different goals, which can be thought of as two hybrid networks integrated and merged together. Figure 5.5 is a visualization of the RDF network for the trust network depicted in Figure 5.4.

Figure 5.4: A partial meshed network made up of two connected hybrid networks. This network contains 16 people and 26 relations.
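The difference between fully connected and partial meshed networks, and the search for a path between two nodes, can be sketched as follows. The functions `is_fully_meshed` and `find_path` are hypothetical helpers; treating relationships as directed edges (truster to trustee) and using breadth-first search to find a path with the fewest hops are assumptions, since no specific path-finding procedure is prescribed in the text.

```python
from collections import deque
from typing import Dict, Iterable, List, Optional, Tuple

Edge = Tuple[str, str]  # (truster, trustee)


def is_fully_meshed(nodes: Iterable[str], edges: Iterable[Edge]) -> bool:
    """True if every ordered pair of distinct nodes is linked by a directed edge."""
    node_list = list(nodes)
    edge_set = set(edges)
    return all((a, b) in edge_set for a in node_list for b in node_list if a != b)


def find_path(edges: Iterable[Edge], source: str, sink: str) -> Optional[List[str]]:
    """Breadth-first search for a path with the fewest hops from source to sink."""
    adjacency: Dict[str, List[str]] = {}
    for truster, trustee in edges:
        adjacency.setdefault(truster, []).append(trustee)
    queue = deque([[source]])
    visited = {source}
    while queue:
        path = queue.popleft()
        if path[-1] == sink:
            return path
        for neighbour in adjacency.get(path[-1], []):
            if neighbour not in visited:
                visited.add(neighbour)
                queue.append(path + [neighbour])
    return None  # no trust path exists in this direction


nodes = ["A", "B", "C"]
full = [(a, b) for a in nodes for b in nodes if a != b]
partial = [("A", "B"), ("B", "C")]
print(is_fully_meshed(nodes, full), is_fully_meshed(nodes, partial))
print(find_path(partial, "A", "C"), find_path(partial, "C", "A"))
```

In the partial example a path from A to C exists through B, while no path leads back from C to A, which illustrates why the direction of trust relationships matters when walking the network.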

5.6 Structural Comparison

In order to emphasize the importance of the structural determination of trust networks, in this section we compare the structure of the trust networks generated based on three different ontologies: our ontology, Golbeck’s and Konfidi’s. In the last subsection we discuss the results of the comparison in detail. For the sake of comparison, we have divided the experiment datasets into two sizes: small networks and large networks.


Figure 5.5: A circular representation of the partial meshed network. (Network contains 98 nodes and 198 edges)

Trust Networks of Small Size

Based on our structural point of view, Tables 5.2 and 5.3 list the number of nodes and edges on the compared networks.

As is clear, the networks generated using Golbeck’s ontology generally have considerably fewer nodes and edges than the networks generated using our ontology and Konfidi’s. At the same time, in both cases our network has fewer nodes and edges than Konfidi’s network, although the difference is small.

Trust Networks of Large Size

We described and defined hybrid and meshed networks. At the same time, we modeled these networks using datasets that to some extent reflect the structure of such networks. The same datasets were also injected into the structure of the two other tested ontologies to examine the structure of the resulting trust networks. Based on our structural point of view, Tables 5.4 and 5.5 list the number of nodes and edges on the networks.


Table 5.2: Networks of 4 people and 4 relationships. (Increase in size)

Trust Networks    Golbeck    Ours    Konfidi
Nodes             15         20      22
Edges             28         34      37
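
To make the size differences easier to read, the counts reported above can be expressed relative to the smallest network. The following is only an illustrative calculation on the reported values; the choice of Golbeck’s network as baseline and the function name relative_size are assumptions made for this sketch.

```python
# Node and edge counts copied from the table above.
sizes = {
    "Golbeck": {"nodes": 15, "edges": 28},
    "Ours": {"nodes": 20, "edges": 34},
    "Konfidi": {"nodes": 22, "edges": 37},
}


def relative_size(sizes, baseline):
    """Return each network's node and edge counts as a ratio of the baseline's."""
    base = sizes[baseline]
    return {
        name: {kind: count / base[kind] for kind, count in counts.items()}
        for name, counts in sizes.items()
    }


ratios = relative_size(sizes, "Golbeck")
for name, r in ratios.items():
    print(f"{name:8s} nodes x{r['nodes']:.2f}  edges x{r['edges']:.2f}")
```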

Table 5.3: Networks of 4 people and 6 relationships. (Increase in depth)


Table 5.4: Hybrid Network (network of 8 people and 12 relationships).


Table 5.5: Meshed network (Networks of 16 people and 26 relationships).


Table 5.4 shows the number of nodes and edges on the networks representing the hybrid network. The network generated using Golbeck’s ontology has fewer nodes and edges than both ours and Konfidi’s, while the network generated using our ontology has fewer edges and nodes than Konfidi’s. Table 5.5 shows the number of nodes and edges on the networks representing meshed networks. Again, Golbeck’s network has fewer nodes and edges than our network and Konfidi’s network. Our network has a greater number of nodes than both Golbeck’s and Konfidi’s networks, but fewer edges than Konfidi’s.

Trust networks of larger size

We continued our study by modeling and presenting trust networks of larger sizes. We expanded our sample partial meshed network of 16 people and 26 relationships, randomly increasing the number of people in the network and their corresponding relationships. The structure of the resulting networks was studied from the perspective of the number of edges and nodes, the same structural perspective used for the comparison between networks of small and large size. The number of people and their corresponding relationships were sampled and plotted at each increase to reflect the progress of expansion across the network structure. These data were generated using all three ontologies being evaluated. Figures 5.6, 5.7 and 5.8 depict the effect of this steady increase in the size of the larger trust networks from a structural point of view.

Detailed analysis of structural comparisons

In this section we further analyze the results of our experiment and comparisons. As shown in Tables 5.4 and 5.5, trust networks modeled, described and presented using our ontology and the others are compared based on the number of nodes and edges (structural perspective). The comparison shows that in networks of small size our ontology shows average performance in comparison to the other ontologies, meaning that the trust networks generated have average sizes. But as the size of the networks increases, certain aspects of our trust network size increase more than in the other compared networks, showing less efficient performance. This decrease in efficiency is also depicted for networks of larger size in Figure 5.10. There is a set of reasons for this, which we state here.

Clearly, the main reason for the size increase in networks is the number of elements incorporated within the structure of the ontology. Golbeck’s ontology uses only one main element, Konfidi uses two main elements, while our ontology uses three main concepts. Golbeck’s trust schema has a very efficient design. This design has certain aspects that reduce the size of the networks described using that ontology: defining levels of trust (trust0...trust10) and trustRegarding on the range of foaf:agent lets you describe trust directly as properties of agents on the trust network. Such efficiency in design lets you describe relations very easily with fewer elements, as seen in the results. Konfidi’s trust ontology has more or less the same structure as our ontology. Our ontology has one more element than Konfidi’s; however, we have seen that networks of smaller size generated using our ontology have less complex structures than the ones generated using Konfidi’s ontology (Figure 5.11).

Figure 5.10 visualizes the structure of the networks generated using our ontology. The emphasis in the visualization was put on the gravity of the instances on the network toward the main elements they originate from. An efficient structure will depict the overall organization of the ecosystem and its sub-ecosystems. Our network shows better clustering of elements than the two other samples.

The second reason is the efficient design of the ontology. Golbeck’s ontology is indeed a milestone in the work on trust in the semantic web, from different perspectives.

The third reason is the AuxiliaryProperties element of our ontology. As we incorporated an extensibility element for describing secondary and optional properties, we incorporate extra nodes and, more importantly, extra edges into the network. In most of the test data for the comparison section, we have auxiliary property elements with at least one sub-element filled. For instance, when describing hybrid networks, all relationships have AuxiliaryProperties with a ContextType property of either simple social network, simple business network, or hybrid network. It should be mentioned here that none of the other compared ontologies has any element for describing extra properties: extending Golbeck’s trust ontology seems to be very hard and needs drastic changes because of its architecture, and Konfidi does not have any elements for describing extra properties. Taking this into account, if we eliminate the AuxiliaryProperties element, then the size of our network becomes even more efficient than that of both other ontologies in certain situations.

5.7 Conclusion

We analyzed the modeling and representation of trust relationships across networks within semantic web-driven ecosystems. In order to capture, model and represent the semantics of trust relationships within the semantic web, the main components of relationships are represented and described using ontologies. To analyze the methodologies and mechanisms used to describe trust relations, we studied a set of trust ontologies, especially Jennifer Golbeck’s and Konfidi’s trust ontologies, which share the same context as our research. Finally, we engineered and analyzed a trust ontology based on the context of our research, social networks and the semantic web. We constructed a trust ontology in which the relationship is the focus, as the ontology captures the semantics of trust relationships, and two other elements state the properties of trust relationships.

In comparison to previous works, our work introduces certain new features to trust ontologies in this context. Using our AuxiliaryProperties, we give relationships more weight and meaning. We have introduced the hasRecommender property, which can determine the strength of the links on a social network and can be used for finding a suitable inference path on the network. We claimed that determining the structure of trust networks is possible by efficiently designing and engineering the trust ontologies that such networks are based upon. We examined this claim by using the same datasets on both our ontology and two other ontologies, and the results of our experiment support it. Despite having more elements than the other ontologies, networks generated based on our ontology show average size and structure. Our trust networks also show a far more manageable structure and architecture as the size increases, in comparison with the two other ontologies. In conclusion, we can state that ontologies are very promising technologies. Utilizing ontologies in modeling and representing trust in semantic web-enabled social systems seems to be a highly efficient methodology and mechanism.


5.8 Future Work

The study of social phenomena within computer science, and especially the semantic web, demands more attention. I believe that through a liaison between the social sciences and computer science more fruitful results can be achieved, which can help bring social ecosystems to life on the web. The number of vocabularies used to describe the elements of ontologies should increase. There is a vocabulary to express relationships [91], but there is no standard vocabulary to express, for instance, the common subjects and topics of a relationship, although a vocabulary for this purpose could easily be described. The application domain is very limited, and current applications are confined to spam filtering and user rating systems across web sites on the Internet. One of the most important directions for future work in this field is therefore spotting further applications for social trust, where trust relationships can be modeled and expressed using ontologies.


Figure 5.6: Increasing the size of Golbeck’s trust networks. The diagram depicts the increase in range of nodes and edges, starting from a network of 20 people and 18 relations, ending at a network of 108 people and 104 relations.

Figure 5.7: Increasing the size of Konfidi’s trust networks. The diagram depicts the increase in range of nodes and edges, starting from a network of 28 people and 32 relations, ending at a network of 96 people and 66 relations.

Figure 5.8: Increasing the size of our trust networks. The diagram depicts the increase in range of nodes and edges, starting from a network of 24 people and 20 relations, ending at a network of 112 people and 64 relations.


Figure 5.9: A clustered visualization of the structure of a meshed trust network based on Jennifer Golbeck’s ontology. This network contains 49 nodes and 132 edges.

Figure 5.10: A clustered visualization of the structure of a meshed trust network based on our trust ontology. This network contains 98 nodes and 198 edges.

Figure 5.11: A clustered visualization of the structure of a meshed trust network based on Konfidi’s trust ontology. This network contains 86 nodes and 211 edges.

Chapter 6

Trust-Aware User Profiling: Modeling and Learning

N. Dokoohaki and M. Matskin, Personalizing Human Interaction through Hybrid Ontological Profiling: Cultural Heritage Case Study, 1st International Workshop on Semantic Web Applications and Human Aspects (SWAHA), collocated with the 3rd Asian Semantic Web Conference 2008 (ASWC ’08), 2008, pp. 133-140.


Personalizing Human Interaction through Hybrid Ontological Profiling: Cultural Heritage Case Study

Nima Dokoohaki, Mihhail Matskin


Abstract

In this paper we present a novel user profile formalization, which allows describing user attributes as well as the history of user access for a personalized, adaptive and interactive experience. While we believe that our approach is applicable to different semantic applications, we illustrate our solution in the context of visits to online and onsite museums and exhibits. We argue that a generic structure will allow incorporation of multiple dimensions of user attributes and characteristics, as well as different abstraction levels for profile formalization and presentation. In order to construct such profile structures we extend and enrich existing metadata vocabularies for cultural heritage to contain keywords pertaining to usage attributes and user-related keywords. By extending metadata vocabularies we allow improved matchmaking between extended user profile contents and cultural heritage contents. This extension creates the possibility of further personalization of access to cultural heritage available through online and onsite digital libraries.¹

¹ This work has been done within the FP7-216923 EU IST funded SMARTMUSEUM project.


90 PAPER B1

6.1 Introduction

Lessons learned from the adoption of Web technologies helped researchers realize the importance of human factors. From the user perspective, the Semantic Web faces many challenges in successful deployment [157]. The interactivity and usability of technologies depend on infrastructure, and the infrastructure enabling the Semantic Web continues to be developed and utilized. The heterogeneity of resources and users across the Internet and intranets hinders the mass adoption of these tools. Considering the increasing proliferation of Semantic Web-driven technologies and tools, personalization techniques could build a bridge between users and the Semantic Web [106]. In short, personalization is customizing the information content or adapting the visualized experience of the system to the user’s preferences and interests. Personalization and personalized systems came to life as a result of research into this rapidly growing problem across the Internet and intranets worldwide. By constructing personalized information systems which wrap existing legacy systems, their databases and their contents, we can facilitate access to existing information content decentralized across different databases through personalized information retrieval. In order to personalize the process of information retrieval, a personalized system creates a model of usage behavior on behalf of the user and further develops this model by creating a profile of the user which documents the history and attributes of usage. Through this observation, the system understands the preferences of the users and can thus tailor the information to their needs.

In order to construct a correct model of the user, we need to understand from which dimensions and perspectives users will observe and experience the system. A profile that documents the experience and behavior of the user will incorporate certain attributes which pertain to these dimensions and perspectives. In order to point out the features of such a profile structure and format, we take a generic approach which allows incorporation of all sorts of user attributes as well as documentation of the history of the user’s experience. We incorporate an ontological description of user data in order to ease the interoperability of content when shared across multiple systems, as well as to add meaning and semantics to the concepts of the usage domain. Along with the content, we also describe how this format and structure allows contextual information to be documented and presented.

While we consider our approach to personalization general enough to be used in different semantic web applications, our case study mostly concerns systems providing access to cultural heritage. In this context we extend cultural heritage

The overall objective of the project is to develop a platform for innovative services enhancing on-site personalized access to digital cultural heritage through adaptive and privacy preserving user profiling. Using on-site knowledge databases, global digital libraries and visitors’ experiential knowledge, the platform makes possible the creation of innovative multilingual services for increasing interaction between visitors and cultural heritage objects in a future smart museum environment, taking full benefit of digitized cultural information.


metadata keyword sets to include attributes pertaining to user behavior. By including usage attributes in the vocabularies along with existing metadata, we make it possible to extend queries with a subset of these keywords that are already documented in the ontological user profile. When querying digital cultural heritage content, these keywords are matched against museum digital metadata content and allow personalized matchmaking of items and user profiles.

The rest of the paper is organized as follows: Section 6.2 presents the background; Section 6.3 introduces the profile structure; Section 6.4 introduces the extension of a sample metadata vocabulary with user attributes and describes the process of matchmaking of profiles and extended metadata. We conclude in Section 6.5.

6.2 Background

Nora Koch [200] describes a user profile as a simple user model. A user profile is a collection of personal information. The information is stored without adding further description or interpretation. It is comparable to the getter and setter mechanism of classes in object-oriented programming, where different parameters are set or retrieved. User profiles represent cognitive skills, intellectual abilities, intentions, learning styles, preferences and interactions with the system. These properties are stored after assigning values to them. These values may be final or may change over time. Depending on the content and the amount of information about the user stored in the user profile, a user can be modeled. Thus, the user profile is used to retrieve the information needed to build up a model of the user. Koch also describes a user model as the representation of the system’s beliefs about the user. The "real world" user is perceived by the system through the human-computer interface.

According to Wahlster and Kobsa [359], information about the user is usually collected in a so-called user model and administrated by a user modeling system. Among the fundamental concepts they define (in the context of a dialog system) is the user model: "A user model is a knowledge source in a system which contains explicit assumptions on all aspects of the user that may be relevant to the behavior of the system. These assumptions must be separable by the system from the rest of the system’s knowledge."

User profiling is either knowledge-based or behavior-based according to Middleton et al. [247]. Knowledge-based approaches engineer static models of users and dynamically match users to the closest model. Questionnaires and interviews are often employed to obtain this user knowledge. Behavior-based approaches use the user’s behavior as a model, commonly using machine-learning techniques to discover useful patterns in the behavior. Behavioral logging is employed to obtain the data from which to extract patterns, according to Wahlster et al. [359]. Fröschl [125] states that the difference between user profiling and user modeling lies in their different levels of sophistication. He states that, in general, profiles contain "raw material" gathered and acquired from a user, while processing such data builds up a model of the user, creating a sophisticated perception of the user.

Hybrid modeling and profiling have been discussed extensively [48, 387]. Hybrid user modeling can be defined as combining user attributes and content attributes to improve the personalization effect. Profiles created and composed by matching extended user profiles and items are referred to as hybrid profiles. Hybrid approaches to user modeling and profiling focus either on combining strategies for profiling [289] or on combining user models [32]. We have introduced a hybrid model which is expressive enough to allow different sorts of information about the user (such as attributes and preferences, observational data and user context) to be documented and presented. We have mostly considered the combination of usage attributes and content attributes for personalization of access.

6.3 Profile Structure

In this section, we define and introduce a structure that could be used for saving and retrieving different types of information that document both behavior and knowledge aspects of the user. We define a user profile as a structured collection of personal information about the user that has certain perspectives or dimensions covering different aspects of the personal attributes of the user. Profile content documents the personal information about the user as well as the history and evidence of the experience of the user who is being profiled. A profile has depth (hierarchy) as well as length (flat structure), allowing us to create a level measure for the details incorporated into the profile.

Profile Segments

In addition to user details, we need a structure which can record the history of user experience as well as the weight of the information presented. Since privacy is a crucial aspect of user profiling, we would also like to incorporate security information describing the privacy of the profiled information as well as trusted arguments pertaining to profiles. In order to incorporate such information, along with the weight of the information and the security credentials protecting and enriching it, we divide the structure into different sections. We refer to each section of a record as a segment. The first segment of profiled materials is the context segment. We assume that all surrounding facts can be considered context and that the information contained within them can be considered contextual information. In order to make the format more generic, we can avoid inserting values directly into the context segment and instead give a reference to existing contextual information. For instance, we can define a context ontology that documents contextual concepts and becomes populated when contextual information is available. Instead of using solely attribute and value pairs, we use RDF (Resource Description Framework) triples of predicate, subject and object to describe


Figure 6.1: Structure of User Profiles. Each profile could be composed of multiple records. The structure of a single record is depicted on the left side, while a hierarchical high-level presentation of a user’s profile is depicted on the right side.

information contained in the profile. This segment will contain the actual material which is profiled.

For instance, "interest of user in science of art" can be described as tuple:

(hasInterest, user,"Science_of_art")

Here the subject is used to describe the attribute, the object is used to describe the value, and the predicate describes the semantics of the relation between concepts. Using RDF triples eases the later extraction and mapping of RDF data to lower-level formats such as XML or higher-level formats such as OWL. Each profile record contains a weight segment which specifies the weight of the information being profiled. For instance, if the profiled information is about the user’s cognitive patterns such as interests, then the weight describes the intensity of the user’s interest in the object atom specified in the RDF triple, for instance how much the user is interested in a certain artifact or artist.

In addition to the profiled information and its designated weight, we incorporate a segment for security credentials. These credentials can have different semantics in different cases and scenarios. In general, however, trust describes the trust, belief and confidence of the user towards the piece of information profiled, while privacy describes the privacy of the piece of information recorded. For instance, privacy could take values in the range [−1, 1], where positive values describe the user’s consent to sharing, while negative values describe the user’s reluctance to disclose the recorded content to the outside world. Using such weighted information, the user can specify which atomic pieces of information he/she would or would not like to disclose to the outside world. Trust can be interpreted differently depending on the scenario. For instance, trust in an artwork could document the originality of the work being experienced by the user, while trust in profiled information documents the trust of the user in the information piece documented and profiled, or just the trust of the user in the system. Such an atomic representation of trust could also be used for describing trust in individuals when describing relationships between users.
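
The record structure described in this section (a profiled statement plus weight, trust, privacy and a context reference) can be sketched as a small data structure. This is a minimal illustration under stated assumptions, not the serialization used in the project: the class and field names are hypothetical, the [0, 1] ranges for weight and trust are assumed (only the privacy range [−1, 1] is given in the text), and the hasAge record is invented for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProfileRecord:
    # Statement in the order used in the text: (predicate, subject, object).
    predicate: str
    subject: str
    obj: str
    weight: float    # intensity of the profiled item, assumed to lie in [0, 1]
    trust: float     # user's confidence in the statement, assumed in [0, 1]
    privacy: float   # in [-1, 1]: positive = consent to share, negative = withhold
    context_ref: str = ""  # reference into a context ontology, not an inline value

    def __post_init__(self):
        if not -1.0 <= self.privacy <= 1.0:
            raise ValueError("privacy must lie in [-1, 1]")


def shareable(records):
    """Keep only the records whose privacy value signals consent to disclosure."""
    return [r for r in records if r.privacy > 0]


profile = [
    ProfileRecord("hasInterest", "user", "Science_of_art",
                  weight=0.8, trust=0.9, privacy=0.6),
    ProfileRecord("hasAge", "user", "34",
                  weight=1.0, trust=1.0, privacy=-0.7),
]

for record in shareable(profile):
    print(record.predicate, record.subject, record.obj)
```

Treating a privacy value of exactly zero as "do not share" is a design choice of this sketch; the text only distinguishes positive from negative values.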

6.4 Extending Metadata with Human User Metadata

In order to extend the existing keywords used for describing cultural heritage to include user attributes, we first need to understand and identify the attributes that document user behavior.

Constructing a Smart User Model Ontology

We have identified four major categories of attributes: attributes that document the user’s demographics, such as age, languages and gender; attributes that document preferences of the user, such as system, device and personal preferences; attributes that document abilities and disabilities of the user, such as hearing abilities or walking disabilities; and attributes that document social aspects of the user, such as role and relations. In addition to user attributes, we have identified and modeled attributes that document the context of use [262], as context plays an important role in personalizing the user’s experience. Attributes considered for documenting and presenting context are, for instance, location, environmental attributes such as humidity, and visit attributes such as the goal of the visit or the companion for the visit. The attributes mentioned so far pertain to individual usage rather than group usage. We have also identified and documented group attributes, such as grouping users based on age, knowledge or culture, which we consider for targeting groups of users. Tours are group activities, and we have also introduced them in our attribute sets through attributes such as tour name, fee and size. In order to categorize these attributes, we have created super categories and refer to each super category as a perspective. For creating this structure, we have partially used the ontological structure and organization of GUMO [160]


Figure 6.2: Attribute Categories. Identified user attributes are categorized into super and subcategories that allow development and design of an ontological classification.

(General User Model Ontology) and UserML [159] to describe the attributes and perspectives which pertain to users.
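
The categorization into perspectives, categories and attributes can be pictured as a small hierarchy. The sketch below uses the attribute names mentioned in this section; the grouping into three perspectives ("individual", "context", "group") and the dotted keyword format are assumptions made for illustration and do not reproduce the actual ontology.

```python
# Perspectives (super categories) mapped to categories and example attributes.
# Attribute names come from the text; the nesting itself is illustrative.
PERSPECTIVES = {
    "individual": {
        "demographics": ["age", "languages", "gender"],
        "preferences": ["system", "device", "personal"],
        "abilities": ["hearing", "walking"],
        "social": ["role", "relations"],
    },
    "context": {
        "location": ["location"],
        "environment": ["humidity"],
        "visit": ["goal", "companion"],
    },
    "group": {
        "target_group": ["age", "knowledge", "culture"],
        "tour": ["name", "fee", "size"],
    },
}


def qualified_keywords(perspectives):
    """Flatten the hierarchy into 'perspective.category.attribute' keywords."""
    return [
        f"{perspective}.{category}.{attribute}"
        for perspective, categories in perspectives.items()
        for category, attributes in categories.items()
        for attribute in attributes
    ]


keywords = qualified_keywords(PERSPECTIVES)
print(len(keywords), keywords[:3])
```

Qualifying each attribute by its perspective keeps the individual "age" attribute distinct from the "age" criterion used for target groups.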

Extending Schemas with Human Usage Keywords

Figure 6.3 is an example of how cultural heritage metadata can be extended using the attributes identified previously. In order to describe metadata keywords describing the attributes of visual artworks, such as place of creation, artist name and material of the artwork, we have used the Getty vocabularies [132, 134, 358]. ULAN [134] has been used for documenting the artist name and related artists, AAT [132] for documenting the material of the artwork, and TGN [133] for documenting the place where the artwork was created. We have utilized VRA Core [358] concepts and properties to describe the dimensions of a work of art as well as


Figure 6.3: Extended CH Metadata with Human Keywords. The schema has been extended to include new concepts describing the human usage domain. When these concepts become instantiated, they extend the artworks with individual, group and contextual attributes.

relationships between the concepts, such as the relation between an artwork’s creator and the artwork itself.

The concepts of our user model ontology have been defined in the namespace SUM (smart user model), while the edges are defined in the common namespace SM (smart museum). We have extended the cultural heritage ontological schema with group attributes distinguishing user groups based on age and knowledge, allowing us to describe the recommended target group for which an artwork could be useful. The extension with contextual attributes has allowed us to state the recommended companion for the visit. We have also used a visitor typology that allows us to state the type of visitors [327] (greedy, busy and selective). In order to recommend to the user an existing tour that includes the artwork, we have extended the concept set with a tour attribute, tourname. When the schema becomes instantiated and populated with values, then in addition to instances that document the artwork’s name and properties we have instances that document the human user side that interacts with this artwork. For instance, in our example we have Venere as a visual artwork painted by Sandro Botticelli in Florence using tempera on canvas. In addition to this legacy metadata, we have added that this artwork suits selective and greedy visitors, who look for lots of artistic and cultural details. The work is suitable for audiences with different knowledge backgrounds. It suits adult and teenage target groups best, which is also reflected in the recommendation to visit with a parent. The work is included in the Virtual Uffizi tour, which can be recommended to the user if interested.

Matching Profiles against Extended Personalized Metadata

In order to further facilitate personalized access to digital content, a filtering methodology should be implemented. We have considered item-user filtering for access personalization.

By taking advantage of extending user profiles with user model keywords and attributes, we can allow item-user matchmaking to be implemented. This process involves expanding the query with additional human user keywords, which describe the profile of the user. Since a similar subset of these keywords is used to extend the schema for describing the artworks, the query is personalized according to the user’s profile. In effect, slices of the user profile are used to expand the semantic query to digital content. The results of the query represent the matching of the user’s partial interests in digital cultural heritage and provide the user with more personalized access to digital cultural content. In this process, instances of the cultural metadata schema are matched against the instances of the user’s profile records.
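
The query expansion step can be sketched with the Venere example from the previous subsection. This is an illustration under stated assumptions rather than the matchmaking used in the platform: artworks are plain dictionaries instead of RDF instances, key names such as visitor_type and target_group are hypothetical stand-ins for the schema concepts, and a strict "all keywords must agree" rule replaces whatever ranking a real system would apply.

```python
# Extended metadata for one artwork, following the Venere example in the text.
ARTWORKS = [
    {
        "title": "Venere",
        "creator": "Sandro Botticelli",
        "place": "Florence",
        "material": "Tempera on Canvas",
        "human": {
            "visitor_type": {"selective", "greedy"},
            "target_group": {"adult", "teenager"},
            "companion": {"parent"},
            "tourname": {"Virtual Uffizi"},
        },
    },
]


def expand_query(query, profile_slice):
    """Attach user-profile keywords to a content query (query expansion)."""
    return {"content": dict(query), "human": dict(profile_slice)}


def matches(artwork, expanded):
    """An artwork matches when every content field and human keyword agrees."""
    content_ok = all(
        artwork.get(key) == value for key, value in expanded["content"].items()
    )
    human_ok = all(
        value in artwork["human"].get(key, set())
        for key, value in expanded["human"].items()
    )
    return content_ok and human_ok


def personalized_search(query, profile_slice, artworks=ARTWORKS):
    """Return the titles of artworks that satisfy the expanded query."""
    expanded = expand_query(query, profile_slice)
    return [a["title"] for a in artworks if matches(a, expanded)]


print(personalized_search({"creator": "Sandro Botticelli"},
                          {"visitor_type": "greedy"}))
```

The same content query issued with a profile slice of {"visitor_type": "busy"} returns no result, because the artwork is not annotated for busy visitors.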

6.5 Conclusion

We have presented a novel ontological user profile structure and formalization that allows documentation and presentation of user information. This structure can record and present the weight dimensions and context dimensions for user profiles on an item-wise basis. We presented this profile in ontological format. A user model hierarchy of keywords or attributes was developed to allow expansion of legacy cultural heritage metadata for personalized access to cultural content. As future work, we intend to create algorithms for self-adaptive profile management based on our profile, as well as to develop and implement services allowing onsite and extra-site systems to utilize our profiles for providing users with recommendations and personalized information.

Chapter 7

Trust-Aware User Profiling: Modeling and Learning

N. Dokoohaki and M. Matskin, Reasoning about Weighted Semantic User Profiles through Collective Confidence Analysis: A Fuzzy Evaluation, Atlantic Web Intelligence Conference (AWIC ’10), in Advances in Intelligent Web Mastering 2, vol. 67, no. 5, V. Snášel, P. S. Szczepaniak, A. Abraham, and J. Kacprzyk, Eds. Springer Berlin Heidelberg, 2010, pp. 71-81.


Reasoning about Weighted Semantic User Profiles through Collective Confidence Analysis: A Fuzzy Evaluation

Nima Dokoohaki, Mihhail Matskin


Abstract

User profiles are widely utilized to alleviate the increasing problem of so-called information overload. Many important issues of the Semantic Web, like trust, privacy, matching and ranking, have a certain degree of vagueness and involve truth degrees that one needs to present and reason about. In this context, profiles tend to be useful and allow incorporation of these uncertain attributes in the form of weights into profiled materials. In order to interpret and reason about these uncertain values, we have constructed a fuzzy confidence model, through which these values can be collectively analyzed and interpreted as the collective experience confidence of users. We analyze this model within a scenario comprising weighted user profiles of a semantically enabled cultural heritage knowledge platform. Initial simulation results have shown the benefits of our mechanism for alleviating the problem of sparse and empty profiles.


102 PAPER B2

7.1 Introduction

The increasing overload of information scattered across heterogeneous information ecosystems and landscapes has increased the importance of user profiling. Profiling is seen as a facilitator and enabler of personalization, a methodology for filtering information on the user's behalf. As a result, profiles are increasingly implemented and utilized to allow intelligent information systems to disseminate selected and filtered information to individual users or groups of users, based on the personal information gathered and stored in their respective profiles. Reasoning about uncertain knowledge is increasingly important. There has been a strong emphasis on the problem of reasoning in the face of uncertainty in the Semantic Web [170]. Fuzzy logic [194] has become an important focus area for the Semantic Web research community [311]. While considerable attention has been given to representing fuzzy ontological concepts [311] and reasoning about them [218], how to process and infer uncertain degrees and truth ranges remains of interest in many fields. Within the profiling domain, concepts such as trust, privacy and ranking carry vague and uncertain semantics. While ontological fuzzy languages can be used to represent these concepts, our interest lies in analyzing and processing the fuzzy degrees of each of these notions.

We have proposed a profile format [102] in which trust, privacy and rank are treated as weights on the items the user has visited and are stored in profiled records. We record values for each of these three weights, creating a multi-weighted profile structure. We use RDF as the language for representing profiled information. Since the profile both reflects the interests of the user and stores the user's previous experiences, we obtain a hybrid notion of user profile. Confidence is defined as the state of being certain. As the certainty of an experience is affected by situation-dependent measures of usage, we can consider these weights as parameters affecting usage confidence. As a result, we can take each weight triple and process it to model and evaluate the confidence of the user during the profiled experience. To this end we take a fuzzy approach, in which we process the three weight values of each profiled record and infer confidence values. We demonstrate our model in the context of the SMARTMUSEUM scenario [292], a physical exhibition of art in which users interact with a personalized ubiquitous knowledge platform that uses profiling to provide users with the information services of their choice.

The paper is organized as follows: following a background study in section 7.2, our framework is presented in section 7.3 and a simulation of the framework in section 7.4; we conclude and outline future work in section 7.5.


7.2 Background

User profiling has its roots in human studies. According to Koch [200], a user profile is a gathering of raw personal material about the user. User profiles gather and present cognitive skills, abilities, preferences and interaction histories with the system [128]. According to Gauch et al. [128], user profiling is either knowledge-based or behavior-based. Knowledge-based approaches construct static models of users and match users to the closest model. Behavior-based methods take the user's behavior as the modeling base, commonly utilizing machine-learning techniques [37] to discover useful patterns in that behavior. According to Kobsa [198], behavioral gathering and logging is used to obtain the data necessary to detect and extract usage patterns. Personalization systems are based on user profiles [128]. One category of personalization techniques is based on the cognitive patterns (such as interests, preferences, likes, dislikes, and goals) a user has. These methods are known as filtering and recommendation techniques [198]. According to Weibelzahl [373], they filter resources based on features (mostly metadata) extracted and gathered from a resource, or according to the ratings (generally weights) of a user with a similar profile. Ontologies, at the heart of Semantic Web technologies, are used to formalize domain concepts, which allows describing constraints for the generation or selection of resource contents belonging to the domain the user is interested in. They are also used to formalize the user model or profile ontology, which helps decide which resources are to be adapted (for instance, shown or not shown) to the user. According to Dolog et al. [106, 107], ontologies together with reasoning create a formalization that boosts personalization decision-making mechanisms. Ontological user profiles are becoming widely adopted. For instance, within the domain of digital cultural heritage, the CHIP project is a significant stakeholder.
Considerable research attention has been paid to semantically formalizing the user domain [365], as well as to personalization of information retrieval. Hybrid ontological user models are used to learn, gather, store and use personal user data, according to which semantically enriched artworks are recommended to users during both on-line and on-site visits to an exhibition. We have considered utilizing hybrid user models [102], which incorporate a semantic representation of personal information about users together with notions of trust, privacy and ranking, in the form of weight descriptors, for the items the user is interested in. Fuzzy logic has been considered as a means for mining, learning and improving user profiles [232, 259]. Fuzzy notions of trust [116, 201, 260, 316], privacy [393] and ranking [299] have been proposed. In the context of e-commerce multi-agent settings, Schmidt et al. [316] have proposed a fuzzy framework for evaluating and inferring the trustworthiness of agents' opinions. Agents state their evaluations of a particular agent being evaluated (the trustee) with respect to the agent initiating the transaction (the truster). We have adopted this framework for our problem: for privacy and ranking we follow the privacy approach proposed by Zhang et al. [393], while for trust evaluation we follow the approach proposed by Schmidt et al. [316]. In addition, uncertain notions of confidence modeling have been proposed [201]. In the context of PGP key-chaining, Kohlas et al. [201] propose a naive approach to confidence evaluation based on uncertain evidence. Considered a notion close to trust and belief, confidence is modeled as an important element in the fuzzy inference mechanism.

7.3 Fuzzy Confidence Framework

In this section we present our approach for modeling and evaluating overall confidence from the three weight descriptors of trust, privacy and rank assigned to semantic user profiles. We refer to the inferred values as the overall confidence of the users. Before presenting the process, we describe the representation format for the profiled records containing values of trust, privacy and rank, and the motivation for using them. We limit the application of the profiles to the scenario to which we apply our framework.

Presenting Profiled Weight Descriptors

In addition to capturing interest (the traditional approach in profiling), we assign extra weights capturing trust, privacy and rank to user and customer profiles. As an example, in the SMARTMUSEUM [292] case, the three weight descriptors (privacy, trust and rank) are gathered in the form of sensor data: either given directly by users through the mobile devices they carry during the exhibition, or gathered unobtrusively, without their direct involvement, from environmental sensors such as RFID tags, GPS location services and wireless networks. These weights represent the perception of users with respect to their experience, in our case an exhibition visit to art and cultural artifacts [102, 292]. Rank or score represents the amount of interest a user has in his/her visit. Privacy represents the secrecy of users with respect to the disclosure of their personal information. Trust describes the users' self-assurance in their experience. As a result, the user is able to tell the system how secret and how cool his/her current experience is. The main motivation for attaching these weights to profiled items is that with them we can alter, and perhaps improve, the behavior of the system. If the services provided by the platform are seen as the system's behavior sets, these weights could alter those behaviors to the extent that the system provides better services. Raw values of privacy, trust and ranking are gathered during the exhibition visit from the interactive interfaces implemented on smart handheld devices. The software interface depicts the three values as a scale, which the user can change in the preferences section. During the visit, experience data (mainly items visited and weight values assigned) are retrieved from the handheld devices by SMARTMUSEUM servers and stored in user profiles. The structure of the profiles [37] has been specified to be flexible and generic enough to accommodate ontological (RDF triple) data about visited artifacts, the context of the visit and the weight descriptors. As an example, the following profiled slice:


<http://www.smartmuseum.eu/ns/context/weather#rainy,visited,St.JeromeWriting,atDate 20081210,0.8,0.6,0.5>

conveys the following semantics:

In rainy weather (context), on a certain date (20081210), an anonymous user (subject) visited (predicate) the artwork Saint Jerome Writing (object) and liked it very much (rank value = 0.80); the user moderately trusts his/her own experience (trust value = 0.60) and has average secrecy (privacy value = 0.50).
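The record layout described above can be sketched in code. The following is a minimal illustration only: the class name `ProfileSlice`, its field names, and the choice to treat an empty weight field as "not provided" are assumptions made for this sketch and are not part of the SMARTMUSEUM profile specification.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ProfileSlice:
    """One weighted profile record (field names are illustrative)."""
    context: str
    predicate: str
    item: str
    date: str
    rank: Optional[float]
    trust: Optional[float]
    privacy: Optional[float]


def _weight(text: str) -> Optional[float]:
    """Parse a weight in [0, 1]; an empty field means 'not provided' (assumption)."""
    text = text.strip()
    if not text:
        return None
    value = float(text)
    if not 0.0 <= value <= 1.0:
        raise ValueError(f"weight out of range [0, 1]: {value}")
    return value


def parse_slice(raw: str) -> ProfileSlice:
    """Split a '<context,predicate,item,atDate D,rank,trust,privacy>' record."""
    fields = [f.strip() for f in raw.strip().strip("<>").split(",")]
    if len(fields) != 7:
        raise ValueError(f"expected 7 fields, got {len(fields)}")
    context, predicate, item, date_field, rank, trust, privacy = fields
    date = date_field.split()[-1]  # drop the 'atDate' label, keep the date
    return ProfileSlice(context, predicate, item, date,
                        _weight(rank), _weight(trust), _weight(privacy))


example = parse_slice(
    "<http://www.smartmuseum.eu/ns/context/weather#rainy,visited,"
    "St.JeromeWriting,atDate 20081210,0.8,0.6,0.5>"
)
print(example.rank, example.trust, example.privacy)
```

Keeping missing weights as `None` rather than zero lets later steps distinguish "no input" from "lowest possible value", which matters for the sparse profiles discussed in the simulation.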

Fuzzy Confidence Modeling Process

In order to evaluate the overall confidence of a user, we extract the weight values (privacy, trust and rank) from the user profiles described previously and process them. The process consists of two main phases, pre-processing and post-processing. In the pre-processing phase, the first step applies weighting methodologies to the raw values; this step includes fuzzification of each weight descriptor. We take a different approach for each weight descriptor, depending on its usage and semantics. The second step defines membership functions: for each fuzzy weight input we define a membership function that translates the linguistic fuzzy rules and axioms into fuzzy numbers and values, as members of fuzzy sets. The final step applies the fuzzy rules, which are defined and embedded in the fuzzy rule base, to the fuzzy weighted sets. In the post-processing phase, the first step feeds input values to the membership functions, creating fuzzy sets as a result. The second step applies a defuzzification methodology to these fuzzy sets. In the following sections, we describe each step in more detail.

Fuzzification Phase

In this phase, each weight value is taken separately and converted into a fuzzy value, which is afterwards fed into the fuzzy inference engine.

Weight Fuzzification (Secrecy, Self-Reliance and Opinion Weighting)

As stated, privacy in our model gives users the ability to specify the secrecy of their experience. This allows the system to treat their personal data according to their choice. In a similar fashion, Zhang et al. [393] introduce an uncertain privacy model in which privacy is defined as a role that allows the user to manage the disclosure of personal information to persons and technologies with respect to their privacy preferences. This motivates us to adopt this approach for our own problem. With respect to self-reliance, trust in our approach allows users to state whether they can rely on their own experience. To model this form of uncertain trust, we have adopted the approach proposed by Schmidt et al. [316]. The agent-oriented approach undertaken allows modeling trust as a weighted factor. In the case of rating, we take the same approach as for privacy [393], with the major difference that the importance (sensitivity) of an item is directly related to the resulting weighted rating and, most importantly, the raw rating has a direct effect on the resulting weighted rating value. This means, for instance, that the more important an item is, the higher the user's rating is. As the focus of this work is on confidence values, we refer the reader to [316, 393] for a detailed description of the formulas used to calculate the weighted values.

Defining and applying membership functions and fuzzy rules

Before the fuzzy values are fed to the inference engine, membership functions must be formed and values grouped according to the degree of membership of each input parameter. Existing membership functions for fuzzy values include exponential, sigmoid, trapezoid, Gaussian, and bell-shaped functions [311]. We take the simple approach of a triangular shape [194, 311] for our three fuzzified sets (for trust, privacy and rank). Fuzzy rules [194] allow the combination and specification of the output model of the inference engine. We utilize fuzzy rules in our approach to characterize the confidence output model: we would like to describe the degree of a user's confidence with respect to the scores he/she has assigned as weights for trust, privacy and rank. An example of a rule in our confidence model could be:

If the (Fuzzy) Trust Value is High AND the (Fuzzy) Ranking Value is High AND the (Fuzzy) Privacy Value is High Then Confidence is High.

In this case the AND operator narrows down the output of the rule, as it represents a conjunction of membership functions. Finally, membership functions translate the fuzzy rules into fuzzy numbers, which are then used as input to the fuzzy expert system.
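A triangular membership function and an AND rule can be sketched as follows. This is an illustration under stated assumptions rather than the implementation used in the paper: the breakpoints of the low, average and high sets in `SETS` are hypothetical, and the conjunction is realized with the minimum operator, which is one common choice for fuzzy AND.

```python
def triangular(x: float, a: float, b: float, c: float) -> float:
    """Triangular membership: 0 at the feet a and c, 1 at the peak b."""
    if x <= a or x >= c:
        return 0.0
    if x == b:
        return 1.0
    if x < b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)


# Hypothetical fuzzy sets over weights in [0, 1], given as (a, b, c).
# The outer feet lie outside [0, 1] so that 0 is fully "low" and 1 fully "high".
SETS = {
    "low": (-0.5, 0.0, 0.5),
    "average": (0.0, 0.5, 1.0),
    "high": (0.5, 1.0, 1.5),
}


def rule_strength(label: str, trust: float, rank: float, privacy: float) -> float:
    """Firing strength of 'IF trust, rank AND privacy are all <label>'.

    AND is modeled as the minimum of the three membership degrees (assumption).
    """
    a, b, c = SETS[label]
    return min(triangular(v, a, b, c) for v in (trust, rank, privacy))


# Firing strengths for the weights of the example slice (trust 0.6, rank 0.8, privacy 0.5).
for label in SETS:
    print(label, rule_strength(label, trust=0.6, rank=0.8, privacy=0.5))
```

Because the minimum is used, a rule fires only as strongly as its weakest antecedent, which is the narrowing effect of AND described above.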

Defuzzification phase

Several approaches exist for defuzzification, including the center of gravity method, the center of area method, the mean of maxima method, the first of maximum method, the last of maximum method [194], the bisector of area method, and the root-sum-square method. Root-sum-square is taken here as the main method. Other methods could be considered and evaluated, but for the sake of simplicity we consider only this approach in this paper. In the defuzzification phase, the calculated membership function results are taken, grouped according to the fuzzy rules, raised to the power of two, and summed following the consequent side of each asserted fuzzy rule. Let FR be a fuzzy rule. Following this defuzzification approach, formula 7.1 is used to defuzzify the values:


$$FR_m = \sqrt{\sum FR_m^2} \tag{7.1}$$

Let m ∈ {−, 0, +} denote the labels used to group the rule sets. This allows us to distinguish between rules describing negative (low), neutral and positive (high) confidence outcomes. Now that we have defined the outcomes, we can apply weights and scale the weighted output. For instance, if positive confidence outcomes are favored in our approach, then more weight can be given to positive confidence than to negative or neutral confidence. Here W0 represents the neutral weight, W+ the positive weight and W− the negative weight. Adopted from Schmidt et al. [316], formula 7.2 allows us to create a weighted overall confidence output:

$$C(U_x) = \frac{FR_{-}W_{-} + FR_{0}W_{0} + FR_{+}W_{+}}{FR_{-} + FR_{0} + FR_{+}} \tag{7.2}$$
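The defuzzification step and formula 7.2 can be sketched together in a few lines. The sketch assumes that root-sum-square carries its usual meaning (the square root of the sum of squared rule strengths within each outcome group). The function names and the choice to return 0.0 when no rule fires at all are assumptions made for illustration; they are not taken from the paper or from Schmidt et al. [316].

```python
from math import sqrt
from typing import Iterable


def root_sum_square(strengths: Iterable[float]) -> float:
    """Combine the firing strengths of all rules sharing one outcome label."""
    return sqrt(sum(s * s for s in strengths))


def confidence(fr_neg: float, fr_neu: float, fr_pos: float,
               w_neg: float, w_neu: float, w_pos: float) -> float:
    """Weighted overall confidence C(Ux) in the shape of formula 7.2."""
    total = fr_neg + fr_neu + fr_pos
    if total == 0.0:
        # No rule fired (e.g. an empty profile slice); returning 0.0 is an assumption.
        return 0.0
    return (fr_neg * w_neg + fr_neu * w_neu + fr_pos * w_pos) / total


# One rule per outcome group, so each group strength is that rule's strength.
fr_neg = root_sum_square([0.0])
fr_neu = root_sum_square([0.4])
fr_pos = root_sum_square([0.0])

# Weight set reported for the simulation: W+ = 1, W0 = 0.05, W- = 0.5.
print(confidence(fr_neg, fr_neu, fr_pos, w_neg=0.5, w_neu=0.05, w_pos=1.0))
```

Since formula 7.2 is a weighted average of the three weights, the result always lies between the smallest and the largest weight whenever at least one rule fires.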

Where C represents the evaluated confidence and Ux represents the user being evaluated. We can derive a Collective Confidence Factor (CCF), in which we consider the confidence degrees of other users with respect to the same information item and compute the confidence value of a user with respect to an item while bearing in mind the overall derived confidence. Let us define a View as the inferred confidence of a user with respect to a certain item. If we consider a user's self-view as internal, we can define an Internal View, while other users' views can be seen as External Views. Under this assumption we can assign weights both to the average confidence of others and to the user's own confidence. Adapted from the OTV (Overall Trustworthiness Value) proposed by Schmidt et al. [316], we formulate CCF using formula 7.3:

$$CCF(U_x) = W_{IView}\,C(U_x) + W_{EView}\,\frac{\sum_{i=0}^{l} \frac{FR_{-}W_{-} + FR_{0}W_{0} + FR_{+}W_{+}}{FR_{-} + FR_{0} + FR_{+}}}{l} \tag{7.3}$$

Where CCF is the Collective Confidence Factor, Ux is the user being evaluated, C(Ux) is the confidence of the user with respect to the item being viewed, W_IView represents the internal view weight, W_EView represents the external view weight, and l represents the total number of users whose confidence values we have considered. At this stage we can scale the resulting values to a specific confidence scale and expand the range of the resulting CCF values.
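The Collective Confidence Factor can be sketched as a weighted combination of a user's own confidence and the average confidence of other users for the same item. The function below is an illustrative reading of formula 7.3, not the authors' code: the name `collective_confidence` is hypothetical, the external term is computed as the plain mean of the supplied values, and an empty list of other users is treated as contributing 0.0, which is an assumption.

```python
from typing import Sequence


def collective_confidence(own: float, others: Sequence[float],
                          w_internal: float = 0.5,
                          w_external: float = 0.5) -> float:
    """CCF: weighted sum of the internal view and the mean external view.

    own     -- C(Ux), the confidence of the user being evaluated
    others  -- confidence values of other users for the same item
    """
    # Mean confidence of the other users; 0.0 when nobody else rated the item.
    external = sum(others) / len(others) if others else 0.0
    return w_internal * own + w_external * external


# A user with an empty slice (confidence 0.0) still receives a non-zero CCF
# when other visitors of the same artifact have expressed confidence.
print(collective_confidence(0.0, [0.4, 0.6]))
```

With equal view weights, as used in the simulation, the CCF is simply the midpoint between a user's own confidence and the average confidence of the others, which is why collective values vary less across users than individual ones.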

7.4 Smartmuseum Simulation

In an experimental evaluation based on a SMARTMUSEUM setting, we simulated 100 weighted user profiles. Weight values are intentionally sparse: the weights assigned to profile slices contain blank values in order to reflect real-world scenarios in which users do not provide much input data to the system, or sensors are faulty. We considered the artifacts of two physical museums as the items being experienced by exhibitors. In order to apply our model and demonstrate it in the context of our scenario, we follow the steps described previously in section 7.3. In the first step we fuzzify all raw input values for the three weight descriptors at hand. Fig. 7.1 depicts an excerpt of raw (crisp) and fuzzified (weighted) trust values for 10 simulated users. Simulated raw values are intentionally sparse, as depicted by the broken lines in the left diagram.

Figure 7.1: Linear presentation of crisp trust values (left) and weighted trust values (right)

We consider three main qualifiers for the preferred outcome: high confidence, neutral confidence and low confidence:

If Trust, Privacy & Rank all High, Confidence High.
If Trust, Privacy & Rank all Average, Confidence Neutral.
If Trust, Privacy & Rank all Low, Confidence Low.


Now that the fuzzy sets are formed, we apply the defuzzification methodology. This would allow us to filter out negative results if negative values were present; in our scenario all input values are positive. As the fuzzy sets are grouped by preferred output, we can scale the output and gain more flexibility using weights. We can define a weight for each type of output, taken from the range [0, 1]. We have given the maximum weight to positive values, while neutral values are considered more important than low values. We are now able to evaluate the confidence. Fig. 7.2 depicts the resulting confidence evaluation for each user/item in our scenario.

Now that confidence values are derived, we can infer the CCF for the users evaluated so far. More flexibility can be gained by giving weights to the internal and external Views of the user information we have processed. Since we have no preference between the processed user's view (internal view, W_IView) and other users' views (external view, W_EView), we assign equal weight to both. Confidence values and Collective Confidence Factors are depicted in Fig. 7.2.

Results were generated for 100 user profiles. For confidence evaluation the weight set is W = [W+ = 1, W0 = 0.05, W− = 0.5], while for the Collective Confidence Factor the weight set is W = [W_IView = 0.5, W_EView = 0.5]. The horizontal axis represents users, while the vertical axis plots the distribution of confidence degrees. We tried to generate values that resemble real-world user input: in many cases one or two of the three values, and in one or two exceptional cases all three values, were left empty. This reflects the sparsity problem of training data for profiling (and personalization in general) services such as recommendation or matchmaking. This problem hinders the performance of personalization services by creating the well-known cold-start problem. Comparing the raw input values with the resulting confidence degrees, we find that the results are not uniform, which is justifiable given the different preferences and interests of users. In certain cases values have improved, while in many cases they have not changed. We observed that empty values in many situations did not change, mainly because of the naive rules considered, in which we weigh positive and neutral outcomes higher than low outcomes. A simple way to address empty or zero values would be to use an offset for trust fuzzification. Comparing pure confidence values with collective confidence factors, however, we find that considering collective opinions while evaluating the confidence of an individual user in a certain item can give improved results. As seen in Fig. 7.2, CCF values are more uniformly distributed over the diagram than pure confidence values. This uniformity comes from quantifying others' confidence while calculating one's own. A further reason is the flexibility gained by incorporating additional weights for the View of the user being evaluated and the collective Views of other users. As a result, we can use CCF values instead of classic, pure confidence values for boosting personalization services. All in all, we have managed to replace all empty values with a single value (although zero), and sparsity is at least alleviated in that respect.

7.5 Conclusion and Future Work

We have introduced a fuzzy approach to modeling and analyzing confidence based on the weights assigned to the profiled information of users stored in semantic profiles. In our approach, weights are processed by a fuzzy reasoner to create a weighted outcome based on factors affecting the context of calculation. We have tested our approach with simulation data from a real-world scenario, in which exhibitors of visual art experience the personalized services of distributed knowledge platforms. We have introduced a classic and a collective notion of confidence, whose values could be used to improve the quality of adaptive personalized services or to detect similar individual or group behavioral patterns. As future work, we will use the resulting confidence degrees to improve personalization services provided by the knowledge platform, such as recommendation and matchmaking. We would also like to see how the collective notion can be used to enable group-based services such as group recommendations.

7.6 Acknowledgements

This work has been done within the FP7-216923 EU IST funded SMARTMUSEUM project. The overall objective of the project is to develop a platform for innovative services enhancing on-site personalized access to digital cultural heritage through adaptive and privacy-preserving user profiling.


Figure 7.2: Stacked linear presentation of (top) confidence and (bottom) collective confidence

Chapter 8

Trust-Aware User Profiling: Discovery and Aggregation

F. Cena, N. Dokoohaki, and M. Matskin, Forging Trust and Privacy with User Modeling Frameworks: An Ontological Analysis, First International Conference on Social Eco-Informatics (SOTICS ’2011), 2011, pp. 43-48.


Forging Trust and Privacy with User Modeling Frameworks: An Ontological Analysis
Federica Cena1, Nima Dokoohaki2, Mihhail Matskin3

1 Department of Computer Science, University of Torino, Torino
2 Software and Computer Systems (SCS), Information and Communications Technology (ICT), KTH - Royal Institute of Technology, Forum 120, 164 40 Kista
3 Software and Computer Systems (SCS), Information and Communications Technology (ICT), KTH - Royal Institute of Technology, Forum 120, 164 40 Kista

Abstract

With the ever increasing importance of social networking sites and services, socially intelligent agents that are responsible for gathering, managing and maintaining knowledge about individual users are of increasing interest to both computing research communities and industry. For these agents to be able to fully capture and manage the knowledge about a user's interaction with these social sites and services, a social user model needs to be introduced. A social user model is defined as a generic user model (a model capable of capturing generic information related to a user) plus the social dimensions of users (models capturing social aspects of the user, such as activities and social contexts). While existing models capture a portion of this information, they fail to model and represent some of the most important dimensions of social connectivity: trust and privacy. To this end, in this paper we introduce an ontological model of the social user, composed of a generic user model component, which imports existing well-known user model structures, and a social model, which contains social dimensions; trust, reputation and privacy become the pivotal concepts gluing the ontological knowledge models together.


116 PAPERS (C1)

8.1 Introduction

Social intelligence, according to the original definition of Edward Thorndike, is "the ability to understand and manage men and women, [..], to act wisely in human relations" [20]. Some authors have restricted the definition to deal only with knowledge of social situations, where social intelligence is an aggregated measure of social awareness, social progressiveness and interest in new experiences. With the advent of the social web, users can also exploit their social intelligence in virtual environments, using available social networking sites and services to maintain contact with other people and to share content and experiences. We define "socially intelligent agents" as software agents that are responsible for gathering, managing and maintaining knowledge about individual users in the social web. With the ever increasing significance of social networking sites and services, socially intelligent agents are of increasing value to both computing research communities and industry. For these agents to be able to fully capture and manage the knowledge relating to a user's interaction with these social sites and services, a social user model needs to be defined and introduced.

A "social user model" is defined as a generic user model (with generic information about a user [50]) plus the social dimensions of users (social aspects of the user, such as social activities, relationships with other users, groups they belong to and social contexts). While existing models (see Section 8.2) capture a portion of this information, they fail to model some of the most important dimensions of social connectivity: privacy, trust and reputation. To this end, in this paper we introduce an ontological model of the social user, composed of a generic user model component, which imports existing well-known user model structures and captures the basic concepts regarding the user, and a social model, which contains social dimensions. In this model, trust, reputation and privacy become the pivotal concepts gluing the ontological knowledge models together. We adopt the definition of "privacy" given by Westin [378]: "the right of an individual to determine the amount of information available to others". Privacy is particularly relevant in adaptive systems, since they gather a lot of personal information to provide adaptive services, and in the social web, where users share a lot of data with other people. With respect to trust, we adopt Golbeck's definition [139]. According to this point of view, trust between two individuals exists if the truster executes an action on the understanding that the trustee's future actions will lead to a good outcome or utility for the truster. Since our view of trust is reputation-based, it allows us to profile the behavior of two individuals in a single relationship. To be able to port such a profile across several applications, we propose to model reputation separately, in order to profile the behavior or performance of an individual in several contexts.

Our ultimate goal is to propose a model that i) can be used as a reference for modeling users in the social web context, and ii) can be used directly by adaptive applications, for example through a mechanism that, given a user, populates the model on the fly from the user information available on the Web. In this paper, however, we focus on modeling privacy and reputation/trust in the social context, and thus describe in particular our models for these concepts.

The paper is structured as follows. In Section 8.2, we present existing approaches for modeling users in social web systems, focusing on how they deal with privacy, trust and reputation. Then, in Section 8.3, we describe our application scenario. Section 8.4 briefly describes our framework and all the components involved: user data, domain, context, actions, privacy and trust. Section 8.5 focuses on the privacy model, while Section 8.6 focuses on the trust and reputation model. Finally, Section 8.7 concludes the paper and presents possible future directions for this work.

8.2 User Modeling on Social Web: State of the Art

In the user modeling field, there have been several attempts to define a generic user model that contains the definition of user features and of his/her physical and social context, expressed in a Semantic Web language and made available to all user-adaptive systems via the Internet. A commonly accepted top-level ontology for user and context models is of great importance for the user modeling and context research community. Its major advantage is that it simplifies using and exchanging user model and context data between different user-adaptive systems. The best-known (and most adopted) models are the General User Model Ontology (GUMO) [161], the Unified User Context Model (UUCM) [244], and Friend of a Friend (FOAF) [45]. GUMO includes basic user dimensions, such as demographic data, user knowledge, emotional state and personality aspects, user skills, capabilities, user interests, preferences, user goals and plans, etc. Moreover, GUMO also models the environment by representing data such as location, time and device. However, the current version lacks modeling of social data, even though the authors have started to work on it [161].

UUCM models several features of the user and his/her situation: cognitive characteristics (area of interest, competence, preference), usage data (current task, task role, task history), social data (relationships the user is involved in), and environment data (device, current time, language, location). FOAF focuses more on social data than on user and usage data, since it mainly aims at describing the links between people and the things they create and do on the web. FOAF is weak in defining other user features, such as interests and preferences, knowledge and expertise. Only interests are represented, by means of the "interest" property, which represents an interest of a user by indicating a document whose topic(s) is of interest to him/her. Describing interests in full is a complex undertaking: FOAF provides no support for characterizing levels of interest.

A recent attempt to model users in the social web has been made by the Grapple project [2]. Within this project, the Grapple User Model Framework (GUMF) is defined for storing, retrieving and sharing information about users between components of the framework. In the framework, the Grapple User Modeling Ontology [293] is proposed in order to describe all possible statements about a user, together with concepts such as the creator of a statement, the rating of a statement, and temporal and spatial dimensions. Most of these existing UM frameworks fail to capture and represent privacy policies as well as users' trust statements. GUMO simply has the attribute gumo:privacy, which defines the default privacy status for each class of user dimensions. UUCM and FOAF do not explicitly model privacy. In GUMF, privacy is modeled only with a property (hasPrivacyPreference) that expresses the level of privacy concern of the user. However, privacy in user modeling is a crucial, multidimensional and complex aspect that cannot be expressed by means of a single property. Personalized interaction and user modeling have significant implications for privacy, because personal information about the user needs to be collected to perform personalization [364].

Moreover, the Social Web context is particularly challenging for privacy, since social applications gather a lot of data about the user and his/her activities. Thus, the concept of privacy should be decomposed into several dimensions. A first theoretical attempt to define all the privacy dimensions involved in the user modeling process has been made by the Unified Model for Privacy Preferences [186], a formal model which defines the main categories of information in the social web context. However, to the authors’ knowledge, there has been no attempt to integrate such a privacy model into a global user model.

At the same time, little attention has been paid to effective incorporation of trust and reputation into user models. Among adaptive Web applications, recommender systems have been quite successful in utilizing and leveraging social trust and reputation. Golbeck first introduced the notion of ontological modeling of trust in the semantic social Web [144,147]. Later on, Golbeck and Ziegler [401] pointed out the importance of profile similarity as a metric to infer reputation-based trust values in a social network, and they utilized the resulting trust values for improving word-of-mouth style recommendations. Following Golbeck’s ontology, functional models of social trust have been proposed. Dokoohaki and Matskin introduce a functional, yet very lightweight, ontology of trust [101]. The semantic model captures the semantics of the relationship concept, where the topic and metric of trust are documented under the MainProperties of the relationship concept, while the context of the relationship (e.g. date of relation initiation, its goal, etc.) is kept under the AuxiliaryProperties concept. This trust ontology was later used by Zarghami and Fazeli [388] as the main knowledge model of a trust-based recommendation system.

Ontologies of reputation have been proposed as well. Casare and Sichman [65] have introduced a functional ontology of reputation to model the reputation of intelligent agents. Since they utilize legal norms, they model social control mechanisms for software agents. As a result, such a model becomes suitable for utilization in the Social Web as well. Chang et al. [76] propose a basic reputation ontology and an advanced reputation ontology. They also distinguish between the entities for which reputation is modeled. Since the major focus is on e-commerce agencies, this model is not entirely suitable for modeling the reputation of social users. The main argument against both previous models is the lack of quantifiable semantics, leading to a lack of interoperability between them. Reputation interoperability can be enabled through utilizing semantic technologies [9]. Alnemr et al. [9] propose a functional reputation ontology that can serve as a vocabulary to be utilized in several applications. In this work, reputation is modeled as a complex object, the Reputation Object (RO). While RO captures the semantics of reputation assertions, ReputationValues represent the metric for reputation object instances, and the context of reputation is described using the Criteria concept, which documents the provenance of the facts surrounding these assertions, such as the algorithms used for gathering and computing the values.

Examples of adoption of reputation and trust in user models, as pointed out earlier, have been limited. The Grapple project [2] investigates capturing and utilizing reputation to model the trust between users, by allowing the users to rate each other’s opinions and statements, following the eBay model [305]. Adoption of such a plain model of reputation is neither successful nor sufficient in generic and unified models of users, for several reasons. First of all, rating is an implicit model of reputation, and representing it as a simple property-rating or a vector of ratings strips it of its original notion and postulation, according to Alnemr [9]. On the other hand, many systems, such as Epinions [236], are already using explicit trust statements to evaluate users. Second, since trust and reputation convey different semantics on the Social Web, frameworks for modeling users should be capable of describing trust and reputation separately. This difference becomes apparent when one introduces a trust model capable of describing the trusted peers of a user on a social network, e.g. Facebook or LinkedIn, as well as a reputation model capable of storing and presenting the reputation of the user across different on-line communities, such as the reputation of a user as a reviewer on Amazon, or the reputation of a user as a blogger in a blogging community such as Twitter.

8.3 Understanding Importance of Social User Models in Cross-Systems Personalization

The aim of this section is to better illustrate the advantages of bringing trust and privacy together to improve a system’s adaptation. We present a brief use case describing how our social user model can work in a social web environment. Tom has a strong interest in art and he loves dancing tango. He lives in Turin and he joins iCITY [61], a social community dealing with events and attractions in the city, in order to get suggestions about what to do in the city. Tom uses many of the most popular social sites, like del.icio.us, Flickr, Facebook and LinkedIn. All these social applications collect a lot of data about his current interests, preferences and activities, which they make available to other users and other applications. Thus, Tom wants to control the release of such information to other people: for example, he wants only friends who share his passion for tango to see the news about tango he posted on his Facebook wall. Furthermore, among such people, he wants only the people he trusts most, like his friend Jill, to see his score in the latest tango competition. Tom is planning a weekend in Florence, and he would like to visit the Institute and Museum of the History of Science in Florence. The Smartmuseum application [310] is available for this museum. Smartmuseum is able to collect all the information about Tom that the social web applications he interacted with have made available and, using it, to build a user model of Tom on the fly. This information can be used to initialize the adaptation process. This model also considers the preferences Tom declared about the release of information to other applications: his personal information can be delivered only to trusted applications, which are forbidden to use it for commercial purposes. In particular, the information iCITY maintains about the events Tom has seen, the tags he inserted and the topics he is interested in could be very useful for the museum system to quickly identify his focus of interest and offer him a personalized visit to the museum. Since iCITY agreed to that privacy policy, after the interaction Smartmuseum will send iCITY some novel information about Tom that can be used to update the current user model of the application.
This scenario can serve as a guideline for the re-use of user interaction data generated by one application in another across similar domains. In this way, we illustrated how three user modeling problems can be solved: (1) the cold-start problem in Smartmuseum, which can initialize the user model and start the recommendation from a point closer to the user’s interests; (2) maintaining an integrated user profile, which reflects a larger scope of user interests and activities; and (3) the release of information (to other applications and to other people), which takes the user’s preferences for privacy and trust into account. In this paper, we focus on this third advantage. In the current situation, this scenario is far from happening, due to the lack of integration among social applications and user data, and due to the lack of policies which integrate trust and privacy.

8.4 Our framework for user modeling in the social web

Since modeling users on the Social Web is a very complex task, an investment is needed to put these separate pieces together. At the same time, we also aim at bridging the gap left by previous work by considering privacy, reputation and trust, the most crucial concepts within the Social Web, as the key missing concepts and dimensions surrounding the notion of the user on the Social Web. To this end, we propose a user modeling framework within which any user model can be imported, extended with social dimensions, and enriched with privacy preferences, reputation and trust assertions. Our model of social web users contains the following models:

• User model, the description of user features according to existing de-facto standards such as GUMO [158] and UUCM [244].

• Domain models, specific to the domain, such as standard domain vocabularies like AAT [132] and ULAN [134] for artworks.

• Context model, which describes both the physical context (e.g., place, time, etc.) and the social context (e.g., relations with other users and roles).

• User Activities model, which describes the actions of the users (such as the ATOM model [197]).

• Social data model, which describes the social data: service data, disclosed data, incidental data, behavioral data, derived data (following the Schneier model [317]).

• Privacy model, which describes the main privacy concepts a user needs in order to specify his/her own privacy preferences and policies.

• Trust and Reputation model, which describes the main trust concepts between two individuals, as well as expressing reputation towards a single individual or a group of individuals.

All such models have been represented as OWL ontologies. In the following, we describe in more detail the Privacy model (see Section 8.5) and the Trust and Reputation model (see Section 8.6), since they are the main contributions of the paper.

8.5 Privacy model

According to Kim et al. [190], the most important piece of a privacy-respecting Semantic Web is a privacy ontology that enables agents to exchange privacy-related information using a common language. The privacy ontology should be able to clearly define the various dimensions of privacy (e.g. privacy of personal behavior vs. privacy of communications), and contain enough parameters and index terms to enable specification of a privacy policy in a standard machine-understandable format. It should be descriptive enough to specify the highest known standards of data protection and privacy. Following these suggestions, we have defined a lightweight privacy ontology in OWL 2 which describes the main concepts of privacy in a social context, and the relations among such concepts. We took inspiration from the Unified Model for Privacy Preferences [186] (see Section 8.2).

We also use some of the concepts of the OWL-S privacy ontology [181], a simple and easy-to-use ontology for expressing privacy policies as well as a protocol to support matching of such policies among Web Services. However, we developed our own ontology, since our point of view is that of the user in a social context, and not that of the provided services, as in that case. Our goal was to have a model that is platform independent and can be used in different contexts, able to cross the borders of social platforms (the so-called Walled Garden of the Social Web [186]), and expressed by means of a semantic web language to promote interoperability among applications. As we will see, some portion of the ontology has been imported from the OWL-S ontology, for re-usability purposes. We have defined the following main concepts¹:

• Who (the recipient of data): individuals (friends, family members, colleagues, companions, etc.); agents; organizations*, businesses*, government agencies*.

• What (the data that are the objects): user model**, context model, domain model (link to some domain ontology), social model***.

• When (retention time): week day (working days, weekend), day hours (morning, afternoon, evening).

• Where (place where the data are physically stored): address, location information (link to some geo ontology).

• Why (purpose for which the data are collected): to be processed (for adaptation purposes, for marketing purposes, for inference purposes, for data mining purposes), to be sold, to be transmitted.

• How (processing applied to the data): data protection techniques; privacy actions*, privacy policies*.

Figure 8.1 visualizes the privacy ontology, representing the taxonomy of the involved concepts.

This ontology model then allows privacy policies to be defined according to such information. A set of SWRL rules describing privacy policies can be defined for each specific user; in particular, "what" and "who" associations have been chosen as a first domain. An example rule is the following (we omitted prefixes to enhance readability): it can express the fact that a user can let his/her colleagues see where he/she is or access his/her calendar activities only between 8am and 5pm on weekdays, but not over the weekend.

Location(?x) ∧ Tasks(?y) ∧ Day(?v, Working_days) ∧ → can_be_disclosed(?c, Collegue)
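The intent of a rule like the one above can also be illustrated outside SWRL. The following Python sketch is only an illustration under stated assumptions: the names `PrivacyRule`, `DisclosureRequest` and the function `can_be_disclosed` are hypothetical and not part of the ontology, and the 8am to 5pm window is taken from the prose description rather than from the rule itself.

```python
from dataclasses import dataclass
from datetime import datetime, time

# Hypothetical, simplified stand-ins for the "what", "who" and "when"
# dimensions of the privacy ontology; none of these names come from it.
@dataclass(frozen=True)
class PrivacyRule:
    what: frozenset          # data categories covered, e.g. {"Location", "Tasks"}
    who: str                 # recipient group, e.g. "Colleague"
    start: time              # earliest time of day at which disclosure is allowed
    end: time                # latest time of day at which disclosure is allowed
    working_days_only: bool = True

@dataclass(frozen=True)
class DisclosureRequest:
    what: str                # requested data category
    who: str                 # group the requester belongs to
    when: datetime           # moment of the request

def can_be_disclosed(rule: PrivacyRule, req: DisclosureRequest) -> bool:
    """Return True only if the request satisfies every condition of the rule."""
    if req.what not in rule.what or req.who != rule.who:
        return False
    # datetime.weekday(): Monday is 0 and Sunday is 6.
    if rule.working_days_only and req.when.weekday() >= 5:
        return False
    return rule.start <= req.when.time() <= rule.end

rule = PrivacyRule(frozenset({"Location", "Tasks"}), "Colleague",
                   time(8, 0), time(17, 0))
# 4 March 2013 was a Monday, 9 March 2013 a Saturday.
weekday_req = DisclosureRequest("Location", "Colleague", datetime(2013, 3, 4, 10, 30))
weekend_req = DisclosureRequest("Location", "Colleague", datetime(2013, 3, 9, 10, 30))
print(can_be_disclosed(rule, weekday_req))  # True
print(can_be_disclosed(rule, weekend_req))  # False
```

Keeping the rule as plain data, separate from the check, loosely mirrors the split between ontology instances and rules; an actual deployment would evaluate the SWRL rules with a reasoner instead.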

The choices about privacy policies are largely subjective and cannot be defined a priori; they depend on the user’s preferences and situational conditions. Therefore, privacy policies are not a priority at this stage of the project, and they need further investigation.

¹ Note that dimensions marked with * are imported from the OWL-S ontology, those marked with ** from the GUMO ontology, and those marked with *** from the Grapple model.

[Figure: class diagram of the privacy ontology. Concepts shown: Privacy, Entity, Who, What (User Model, Context Model, Domain Model, Social Model), When (retention_time), Where (Address), Why (Adaptation, Marketing, Data mining, Selling), How (Privacy Action, Privacy Policies), Reputation Model:ReputationValue and Trust Model:TrustValue. Edge labels: rdfs:subClassOf, has_privacy, rdfs:domain, rdfs:range.]

Figure 8.1: The privacy ontology

8.6 Trust and reputation model

Artz and Gil [15] categorize the notion of trust in the computer science domain into three main categories: policy-based trust, reputation-based trust and general models of trust. While the Semantic Web has benefited from research in all three subcategories, it is well accepted that a Social Web model of trust is reputation-based. Golbeck first referred to such a model as a Web of Trust [147]. A Web of Trust is a directed-edge network between a group of entities (or resources), within which each link carries a trust value and, assuming transitivity of trust, reputation can be collected and inferred for each single individual across the network. Within the context of the Web of Trust [147], reputation can be defined as a measure of trust, with which individuals can gather and maintain the reputation of other individuals across the network. To express trust and reputation information, we have used ontologies allowing for the expression and quantification of trust for use in algorithms that make a trust decision about any two entities [101], e.g. Tom trusts Jill highly with respect to dancing. We propose a combined model of social trust and reputation, bearing in mind the details described previously. To model trust we adopt the concepts of Dokoohaki’s ontology [101], and for reputation we adopt the concepts of Alnemr’s ontology [9]. We fuse the two sub-ontologies together using a new concept, called Context, for modeling both trust relations and reputation concepts, through which contextual details of trust and reputation can be captured and stored. While ontologies of trust have allowed for expressing trust between two individuals, it is important to be able to express collective knowledge of trusted opinions about an entity as well. This form of reputation demands a model capable of documenting reputation assertions on their own, without pointing to the provenance of the assertion of trust [9], e.g. Jill is well-known for her skill in dancing. While the trust ontology enables us to model a trust network of social inter-relations, the extended ontology of reputation enables us to model assertions of reputation separately as well. This way we can fully capture the semantics of reputation-based trust on the social web. Following the previous discussion, we model trust and reputation using the concepts below:

• Trust (Main concept of trust): Abstract trust (relationship).

• Relationship (Connection between two trusting peers): Relationship is the most important concept of our trust model. A relationship always has a source and a sink, which we describe here as the truster and trustee entities. We use exact cardinalities on hasTrustee and hasTruster, in order to state that each relation has exactly one truster and one trustee.

• Entity(Truster) (Source of trusted relation): We distinguish between the source and target of trust, as a trust network is always a directed graph [147].

• Entity(Trustee) (Sink of trusted relation): The counterpart of Truster, i.e. the target or sink of the trust relationship. We need both entities to be able to determine the credibility of the statements issued.

• Trust Topic and Value (Main properties of trust): Every trust relation is established around a topic and is quantified using a metric. Following this assumption, we use the main properties concept to model the subject and value of trust. Restrictions allow us to assign a single value and a single subject to each relation subject to trustworthiness modeling.

• Context (Context of trust): Contextual properties of trust are realized using this concept. Defining context for trust relations allows us to specify functional or non-functional auxiliary properties of trust in our model. In the case of functional properties, for instance, the algorithms used to gather and compute the trust values can be presented: we might use spreading activation [402] for gathering trust values, or T-index [388] for computing them. Having context allows us to record the time, date or location at which such a relationship was established, or the type of social network in which the relation was created, such as business in the case of LinkedIn. We use this concept to merge the Trust model with the Reputation model by defining Context as a superclass of Criterion (see Figure 8.2).

• Reputation (Reputation assertion): A Reputation assertion about an entity.Using this concept we can assert and define reputation for any entity (person,organization, group). The model adopted here allows us to define completelymention the trust statements used to .

[Figure: class diagram of the trust and reputation ontologies. Concepts shown: Trust, Relationship, An Entity (Truster), An Entity (Trustee), TrustTopic, TrustValue, Context, Criterion, Reputation, ReputationValue, CurrentValue, HistoryList, PossibleValues, Collecting Algorithm, Computation Algorithm. Edge labels: has_trust, has_Truster, has_Trustee, has_subject, hasTrustvalue, auxiliary_properties, has_reputation, hasReputationValues, has_criteria, rdfs:subClassOf, isDefinedby, collected_by, calculated_by / inferred_by, hasRange, partOf. Cardinalities shown on edges: 1, *, 1..*.]

Figure 8.2: Trust and reputation ontologies

• ReputationValue (reputation metric): The reputation of an entity (truster) is quantified and stored using instances of this concept. We can use the current value to represent the current reputation score, while the collection of reputation values asserted so far can be stored in the history list. This allows us to gather and store all explicit (trust) or implicit (votes) statements about an entity. Gathering provenance about an entity’s reputation history allows us to later assess the credibility of statement issuers. The concept of PossibleValues allows us to define different ranges and values for reputation and store them together.

• Criterion (Context of reputation): Contextual properties of reputation are realized using this concept. Defining context for reputation assertions allows us to specify functional or non-functional auxiliary properties of reputation in our model. In the case of functional properties, the algorithms used to gather and compute the trust values can be presented. For instance, we might use a simple web crawler for gathering trust values, or we might utilize Sum or Bayesian functions for computing reputation scores [255]. Similar to trust, having a criterion allows us to record the time, date or location at which the reputation was asserted.
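The concepts listed above can be mirrored in a few plain data structures. The sketch below is a minimal illustration and not the OWL ontology itself: the class and field names are hypothetical, the [0, 1] trust scale is an assumption (the model deliberately leaves the metric open), and the exact cardinality of one truster and one trustee per relationship is approximated by making both fields mandatory.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Illustrative classes only; the actual model is expressed as OWL ontologies.
@dataclass(frozen=True)
class Entity:
    name: str

@dataclass
class Context:
    # Auxiliary properties, e.g. algorithms used, date, or type of network.
    properties: Dict[str, str] = field(default_factory=dict)

@dataclass
class Relationship:
    truster: Entity   # exactly one source of trust (required field)
    trustee: Entity   # exactly one sink of trust (required field)
    topic: str        # single subject of the relation
    value: float      # single trust value; the [0, 1] scale is an assumption
    context: Context = field(default_factory=Context)

    def __post_init__(self) -> None:
        if not 0.0 <= self.value <= 1.0:
            raise ValueError("trust value must lie in [0, 1]")

@dataclass
class ReputationValue:
    current: float                                       # CurrentValue
    history: List[float] = field(default_factory=list)   # HistoryList

    def assert_value(self, new_value: float) -> None:
        """Archive the current score in the history and store the new one."""
        self.history.append(self.current)
        self.current = new_value

tom, jill = Entity("Tom"), Entity("Jill")
rel = Relationship(tom, jill, topic="dancing", value=0.9,
                   context=Context({"network": "Facebook"}))
rep = ReputationValue(current=0.5)
rep.assert_value(0.8)
print(rel.truster.name, "->", rel.trustee.name, rel.topic, rel.value)
print(rep.current, rep.history)
```

The example reuses the running scenario (Tom trusts Jill with respect to dancing); the numeric values are made up for illustration.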

Figure 8.2 visualizes the trust and reputation ontologies, representing the taxonomy of the involved concepts.

We aimed at proposing an interoperable model for embedding trust and reputation into any user-centric adaptive system, as well as for sharing statements and assertions of trust and reputation across multiple systems. Thus, any model of trust and reputation designed for the social context should be capable of being aligned with our model. Taking this into account, we avoid choosing between metrics for either trust or reputation. It should also be mentioned that the choice of metric is heavily dependent on the application, user behaviour and the data at hand. As a result, choices of metrics or algorithms are not a priority at this stage, and we will investigate further which metrics or mechanisms best suit similar scenarios.
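To make the idea of a pluggable metric concrete, the sketch below shows two textbook reputation functions of the kind alluded to under the Criterion concept (Sum and Bayesian). It is an assumption-laden illustration: the function names and the `METRICS` registry are hypothetical, ratings are assumed to be +1 or -1 votes, and the Bayesian variant is the commonly used expected value of a Beta distribution, which may differ in detail from the functions discussed in [255].

```python
from typing import Callable, Dict, Sequence

def sum_reputation(ratings: Sequence[int]) -> float:
    """Simple summation: positive ratings minus negative ratings."""
    return float(sum(ratings))

def beta_reputation(ratings: Sequence[int]) -> float:
    """Expected value of a Beta(r + 1, s + 1) distribution,
    where r counts positive and s counts negative ratings."""
    r = sum(1 for x in ratings if x > 0)
    s = sum(1 for x in ratings if x < 0)
    return (r + 1) / (r + s + 2)

# The metric is looked up by name, loosely mirroring how a Criterion could
# record which computation algorithm produced a given reputation value.
METRICS: Dict[str, Callable[[Sequence[int]], float]] = {
    "sum": sum_reputation,
    "beta": beta_reputation,
}

ratings = [1, 1, 1, -1]   # +1 = positive vote, -1 = negative vote
for name, metric in METRICS.items():
    print(name, metric(ratings))
```

With no ratings at all, the Beta variant returns 0.5, a neutral prior, whereas the sum returns 0; differences of this kind are one reason the choice of metric is left to the application.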

8.7 Conclusion

In this paper, we have presented an approach for modeling the user in the Social Web. The goal of our research is to study how to put together all the standards and initiatives developed separately by different entities, in order to provide a complete model of a user who interacts within a social web context. In more detail, the main contribution of our work is to propose a model of the user in a social context:

• that can be used as a reference to model users in social web context;

• which contains explicit modeling of privacy and trust dimensions, which existing models usually do not consider all together;

• that can be directly used by socially intelligent agents and by adaptive systems, populating and consuming it using real user data.

In our future work, we plan to exploit the model in an existing social recommender system, and to evaluate its impact on recommendations and on final user satisfaction.

Chapter 9

Trust-Aware User Profiling: Discovery and Aggregation

(original version) N. Dokoohaki and M. Matskin, Quest: An Adaptive Framework for User Profile Acquisition from Social Communities of Interest, 2nd IEEE International Conference on Advances in Social Network Analysis and Mining (ASONAM ’10), vol. 0, pp. 360-364, 2010.

(extended version) N. Dokoohaki and M. Matskin, An Adaptive Framework for Discovery and Mining of User Profiles from Social Web-based Interest Communities, Chapter in The Influence of Technology on Social Network Analysis and Mining Book, T. Özyer, Ed. Springer Wien, 2012.


An Adaptive Framework for Discovery and Mining of User Profiles from Social Web-Based Interest Communities

Nima Dokoohaki, Mihhail Matskin


Abstract

Within this paper we introduce an adaptive framework for semi- to fully-automatic discovery, acquisition and mining of topic-style interest profiles from openly accessible social web communities. To do so, we build an adaptive taxonomy search tree from the target domain (the domain for which we are gathering and processing profiles), starting with generic concepts at the root and moving down to specific-level instances at the leaves. We then utilize one of the proposed Quest schemes to read the concept labels from the tree and crawl the source social network repositories for profiles containing matching and related topics. Using machine learning techniques, cached profiles are then mined in two consecutive steps, utilizing a clusterer and a classifier in order to assign and predict correct profiles to their corresponding clustered corpus, which are retrieved later on by an ontology-based recommender to suggest and recommend to the community members items of similar interest. Focusing on the increasingly important digital cultural heritage context, and using a set of profiles acquired from an openly accessible social network, we test the accuracy and adaptivity of the framework. We show that a tradeoff between the proposed schemes can lead to adaptive discovery of highly relevant profiles.


9.1 Introduction

As the web of interrelated content is gradually giving way to the web of interpersonal content, classical problems of information retrieval continue to persist. Much of this content lies at the heart of the social web. At the same time, publication means are becoming more and more easily available to both human and machine publishers. As these publication media increase their production pace day by day, it becomes harder for human readers to find and retrieve the exact content they are looking for.

The Adaptive Web and its myriad approaches, specifically recommendation techniques, have proven to be good candidates for dealing with retrieval of relevant information, by providing users with suggested content matching their taste. One infamous obstacle to the proper functioning of recommendation techniques is the sparsity of usage data. One might want to build a recommender on top of an already available content library to provide users with suggestions, while the lack of sufficient user data hinders the functionality of the system. To deal with this problem, one can propose discovering interested users, acquiring their explicit interests and processing their profiles to generate suggestions of items that they might be interested in buying or viewing. As many such users document and share their daily activities within social networks these days, the social web can be a possible repository in which to discover such users. At the same time, users utilize topic-style profiles to express their interests.

Individual profiles, however, might contain different and sometimes conflicting topics. Some social networking sites and services, such as LiveJournal [226], provide means for community formation, where individuals with the same interests gather to share information items. As a result, community profiles seem to be more suitable for our task, as they provide a rather focused set of topic terms compared to individual profiles. We therefore propose a semi- to fully-automatic framework mainly composed of a crawler and a learner, in which the crawler harvests the source repository for profiles and caches them. After data normalization and dataset preparation, the miner reads and analyzes the raw profiles and applies clustering to the gathered data to generate clusters of interrelated topics and their corresponding profiles.

To increase the accuracy of the overall learning, clusters are stored and fed into a classifier to evaluate and assess the correct assignment of clusters to topics and their corresponding profiles. The second learning process also helps create an initial probability distribution over the whole set of interrelated topics, which could be used later on by a semantically enhanced recommender or a simple topic recommender as an initial point for generating recommendations for the corresponding members of the communities. For the crawler to be able to discover relevant profiles, we need to build a taxonomy tree with leaves containing terms of the domain for which we are discovering profiles. For instance, if we’re looking for communities of art, then the crawler needs to formulate queries with terms around or related to the domain of art. Consequently, we use the tree to formulate the queries that are eventually used by the crawler to cache the discovered profiles. To formulate these queries intelligently and effectively, heuristics can be proposed according to which the queries are formulated.

To this end, we have defined three schemes, namely: depth-based, which allows discovering and crawling topics at a certain taxonomy tree depth at a time; n-split, which allows iterative discovery and crawling of all topics while at each iteration the gathered data is split n times; and finally greedy, which allows discovering and crawling the network for all topics and processing the cached data altogether. We study this framework in the context of discovering possibly interested communities of individuals from LiveJournal, an openly accessible social blogging journal and network. To focus the task of the framework on a certain domain, we hypothesize gathering and processing data for two on-line museums that are seeking communities and individuals of related interest to whom they can recommend information items pertaining to their artifacts. We evaluate the effectiveness of the framework from two perspectives: firstly, the accuracy of the mining steps, and secondly, the adaptivity of the outputs of the learner from the perspective of lexical relevance to the query. To accomplish the former, we study the accuracy of the miner’s clustering and classification performance with respect to each proposed scheme. To assess the latter, a basic lexical parser is used to evaluate the relevance of the top terms in each cluster group with respect to each scheme.
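The three schemes can be pictured as different ways of turning the taxonomy tree into batches of query topics. The sketch below is a hedged illustration rather than the framework's implementation: the toy art taxonomy, the nested-dictionary representation and the function names are assumptions, and n-split is interpreted here as dividing the full topic list into at most n batches, which is only one possible reading of the description above.

```python
from typing import Dict, List

# A toy taxonomy for an art domain; the labels are illustrative only.
Taxonomy = Dict[str, "Taxonomy"]
TREE: Taxonomy = {
    "art": {
        "painting": {"impressionism": {}, "cubism": {}},
        "sculpture": {"bronze": {}},
    }
}

def levels(tree: Taxonomy) -> List[List[str]]:
    """Return concept labels grouped by depth, root level first."""
    result: List[List[str]] = []
    frontier = [tree]
    while frontier:
        labels = [label for node in frontier for label in node]
        if not labels:
            break
        result.append(labels)
        frontier = [child for node in frontier for child in node.values()]
    return result

def depth_based(tree: Taxonomy, depth: int) -> List[str]:
    """Topics found at one taxonomy depth (0 is the root level)."""
    return levels(tree)[depth]

def greedy(tree: Taxonomy) -> List[str]:
    """All topics of the tree at once."""
    return [label for level in levels(tree) for label in level]

def n_split(tree: Taxonomy, n: int) -> List[List[str]]:
    """All topics, divided into at most n batches, one batch per iteration."""
    topics = greedy(tree)
    size = max(1, -(-len(topics) // n))  # ceiling division, at least 1
    return [topics[i:i + size] for i in range(0, len(topics), size)]

print(depth_based(TREE, 1))
print(n_split(TREE, 2))
print(greedy(TREE))
```

Each returned label (or batch of labels) would then be handed to the crawler as query terms; the crawler itself is outside the scope of this sketch.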

The rest of this manuscript is structured as follows: first a background overview is given, and then the approach is introduced, followed by an introduction of the three proposed mining schemes. This is followed by an introduction of the framework, which contains the description of the learners, while the last section describes the experiment with two museums, followed by evaluation results. Finally, a conclusion and future work section brings this manuscript to its end.

9.2 Related Work

User profiles play a crucial role in the context of adaptivity-enabled and personalization-empowered information systems. Availability of profiles is vital for these systems to function properly. We can see the problem from two perspectives: firstly, discovering the users and acquiring the knowledge pertaining to their profiles, and secondly, mining the gathered data to find useful patterns that can be consumed by a recommender for personalized recommendation generation. Focusing on the former, discovering and sharing interest profiles across websites has been the focus of many researchers. Ghosh and Dekhil [136] argue that profile construction and discovery on the web can be augmented to address the sparseness of the profile data, as well as to improve the content of the profiles. Teevan et al. [338] study heuristics for discovering and processing the prior interactions (profiles) of users for the task of search personalization on behalf of the users. This problem becomes more important when personalization is cross-platform and cross-domain [223]. To profile users across multiple domains, ontology-driven user profiles [129, 349] are utilized. As a result, many researchers have discussed the issue of discovering and retrieving profiles across multiple domains within the context of ontological user profiles [59,102,120,303,322,335]. Gauch et al. [127] give a complete overview of different models of discovery and retrieval for ontology-based user profiling. The problem with most of these models is that they are either focused on modeling the user profiles rather than discovering or harvesting them, or they are focused on the generic web context, while we are more interested in the social web context.

When harvesting and acquiring profiles becomes trivial, the problem of dealing with sparse data becomes the most important issue. To address this problem, researchers use different methodologies to gather, analyze and generate user profiles. This is usually done by applying machine learning techniques to web data [320]. Utilizing these techniques has been very appealing for personalization tasks [86, 111, 253]. For instance, techniques such as clustering of user transactions [252], data mining techniques such as clustering [267] and web usage mining techniques [329] were proposed to support personalization needs. This field is collectively referred to as web mining for personalization [111]. Mining web content for personalization has been attractive for addressing inherent problems of recommender systems [246]. More specifically, two types of recommenders have been dependent on a myriad of machine learning techniques for their functionality, namely content-based [284] and collaborative filtering recommenders [163]. Agent-assisted personalization has been a target domain for applied user profiling. Soltysiak and Crabtree [326] describe the architecture of an agent-based approach to user profiling and automatic generation of profiles using machine learning techniques. In a similar work, Billsus and Pazzani [283] utilize the same heuristics for learning Web user profiles. They use a naive Bayesian classifier for discovery and identification of interesting websites, supporting the personalization task of the users on whose behalf the profiles are created and maintained. With respect to the second problem mentioned, it seems that most of the literature at hand is focused on generating aggregate usage history from already cached and stored usage experience, while not much attention has been paid to the problem of formulating profiles when no experience data is available. That’s why discovery of users has become crucial to assist the task of profile generation [304].

If explicit interests of the users are already at hand, one can propose utilizing these interest descriptions to create possible user profiles. Since such data can nowadays be found in the social web context, the social web can serve as a novel or complementary source for discovery and harvesting of profiles of relevant interest.


Users often tend to express such interests in the form of a bag-of-words [360] or simply topics [21]. As a result, scientists are proposing new methodologies for generating user profiles from these topics and utilizing them to support recommender systems. These methods often utilize text classification schemes [320] or, more recently, topic modeling techniques [205]. With regard to data mining in social networks, the current literature points to applying statistical analysis for uncovering network structures [176] rather than content. The work most related to ours is presented by Liu and Maes [351]. Focusing on the social networking services context, Liu and Maes propose building models out of interests and tastes harvested from Orkut, and leveraging them to address the sparsity of recommendation software. Inspired by Liu and Maes, within this paper we have proposed Quest [103], a framework for harvesting and mining topic-based interest profiles from online social networks. This framework combines a web mining architecture with profile generation techniques, but we put more emphasis on the actual profile generation process. While the former part helps with harvesting the profiles from the network, the latter part helps with mining and learning groupings of profiles according to their shared interest topics. We use a two-step miner (similar to, e.g., the work of Soltysiak and Crabtree [326]): the first step clusters the profiles according to their common topics, while the second step assesses the assignment of profiles to their correct categories and clusters using a classifier. We focus our experiments on data acquired from the social blogging site LiveJournal [226].

9.3 Quest-Driven Social Web Mining Architecture

In this section we outline the architecture of the miner as well as the detailed process of the learning phase. Many social networking sites are paving the way for easy publication and sharing of users’ content within their boundaries. At the same time, ease of publication and open expression of interests [21] allow adaptive technologies to spot and attract customers to on-line libraries of materials or goods, which could eventually be purchased by interested customers. To exploit this feature, we design a framework (Fig. 9.1) on top of a source social network, which allows for a semi-automatic supervised approach utilizing a taxonomy-assisted crawler and a two-step learner to first gather, then process and finally generate what we refer to as initial user profiles.

Figure 9.1: Quest component architecture and knowledge flow

Through the knowledge flow created across this framework, we harvest the crawled interest topics from the source repositories and cache them. Then, following the normalization process, we prepare them to be fed into the mining process. Using the two-step learner, we group them into clusters of profiles and their interrelated topics. The clusters are subsequently fed into the classifier. The classifying learner assesses and evaluates the correct assignment of profiles to their corresponding clusters for further precision. After the clusters resulting from the previous step are classified, a predictive model of clusters along with their probabilistic weights is generated as the output of the classifier. We refer to the output of this step as initial user profiles. Using the classified clusters of profiles along with the weights generated by the classifier, the semantic recommender can create a predictive recommendation set. As designing the mining process has been the main focus of this work, we leave recommendation generation for future work. In the following section we focus on the structure of the taxonomy tree, followed by descriptions of our schemes and how they affect the overall knowledge flow.

Utilizing Concept Taxonomies for Automating Profile Discovery Process

In order to specify the topics used for discovering and acquiring relevant data for the mining segment to work on, we should have partial to rather complete knowledge of which topics to query for. This knowledge could be described semantically using a taxonomical hierarchy comprising topics surrounding the target domain for which we are gathering and processing data [114, 265]. We have adopted this idea as the solution to the problem at hand and designed a basic taxonomy tree composed of an is-a relationship in addition to a hierarchy, where the deeper the topics lie, the more specific they become, while the root of the tree is the central topic of the domain at hand. The taxonomy is formulated as a tree consisting of nodes representing distinct topics. In the general case, a taxonomy could be derived or constructed from an ontological representation of the domain. In such a taxonomy, edges represent the subtopic relationship [74].
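As a minimal sketch of such a taxonomy, the tree can be held as nested nodes whose depth grows with specificity. The topic names below are illustrative placeholders loosely based on the cultural heritage example, and `TopicNode`, `topics_at_depth` and `tree_depth` are hypothetical helper names introduced here, not part of Quest.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class TopicNode:
    """A topic in the taxonomy; each child is-a more specific subtopic."""
    name: str
    children: List["TopicNode"] = field(default_factory=list)


def topics_at_depth(root: TopicNode, depth: int) -> List[str]:
    """Return the topic names found exactly `depth` edges below the root."""
    level = [root]
    for _ in range(depth):
        level = [child for node in level for child in node.children]
    return [node.name for node in level]


def tree_depth(root: TopicNode) -> int:
    """Number of edges on the longest root-to-leaf path."""
    if not root.children:
        return 0
    return 1 + max(tree_depth(child) for child in root.children)


# Illustrative tree: root topic -> museums -> exhibit topics (names assumed).
taxonomy = TopicNode("cultural heritage", [
    TopicNode("art museum", [TopicNode("sculpture"), TopicNode("painting")]),
    TopicNode("science museum", [TopicNode("telescope"), TopicNode("physics")]),
])

print(topics_at_depth(taxonomy, 0))  # the root topic
print(topics_at_depth(taxonomy, 2))  # the leaf (exhibit) topics
```

Keeping the hierarchy explicit makes it straightforward to read off the topics of one level or of one subtree when formulating queries.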


Figure 9.2: Primary taxonomy tree constructed from cultural heritage domain under study

In our experiment, we focused mainly on the context of cultural heritage, with further focus on two specific museums: one maintaining artifacts of art (historical paintings) and the other maintaining artifacts of science (historical scientific instruments, e.g. for physics). Further detail regarding the taxonomy and the concepts incorporated in it is presented in the experiment part. Topics are chosen from general topics corresponding to the cultural heritage domain (root topic), and as we move to lower levels we see the branches towards each museum. Each museum itself forms a concept, and the leaves present the exhibits corresponding to the museums; for the art museum we can observe topics related to art such as sculpture, while for the scientific exhibits we observe topics such as telescope. It is worth mentioning that this taxonomical classification could be more complete and, at the same time, more complex. But this poses a tradeoff to search: the more detailed the tree becomes, the less probable it becomes to actually find interested people, as the terms they use to express their interests are very broad. As mentioned earlier, it is arguable that if the granularity of the domain knowledge exceeds a partial set of topics, as taken into account in our case, then a full ontology can be considered instead [74]. The same tradeoff applies there: the more detailed the topics become, the less probable it is to actually find any individual or community interested in a given topic. Experiments presented later justify this matter to some extent. At the same time, if partial knowledge of the domain needs to be extended, one can consider mining semantically relevant topics surrounding the existing topics and expanding the tree along the depth and width with interrelated topics. This has been taken into account for extension of the experiments in future work.

Figure 9.3: Visualized schematics for Depth-based (top), N-Split (middle) and Greedy Quest (bottom); each boundary visualizes the concepts that formulate the queries

Intelligent Query Formulation using Quest Schematics

To study the effectiveness of the approaches proposed for discovering and retrieving possible matches across the network, we have proposed a set of schemes among which the scheme designer or mining supervisor can choose, depending on the need for effectiveness or efficiency.

To do so, we define a query set $\Phi = \{T_1, T_2, T_3, \ldots, T_n\}$, in which $\Phi$ is a query consisting of $n$ topics $T_i$. Taking this assumption into account, we define the quest schematics in the following sections.

Depth-based Quest

Depth-based quest is an iterative decremental scheme. In each iteration of this scheme, starting from the root, topics are taken from a certain depth of the taxonomy tree (depth $d$) and are used to fill the query $\Phi$ to harvest the network for matching interest results. On the next iteration, the next depth $d+1$ is taken into account and the same step is repeated; this mechanism iterates until the leaves of the taxonomy tree are reached. For example, in fig. 9.3 the taxonomy tree has depth $d=4$, and the scheme iterates 4 times accordingly. As a result:

$$\Phi_{depth} = \{\lceil T_1 \rceil_{d=0}, \lceil T_1, T_2 \rceil_{d=1}, \ldots, \lceil T_n, T_{n-1} \rceil_{d}\}$$

Where $\Phi_{depth}$ is the query formulated according to the depth-based scheme and $\lceil T_n, T_{n-1} \rceil_{d}$ is a set of $n$ topics taken from depth $d$ of the taxonomy tree.
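The depth-based scheme can be sketched as one query per tree level, built from the root downwards. The nested-dictionary representation, the topic names and the function name `depth_based_queries` are assumptions made for illustration; the sketch also assumes topic names are unique across the tree.

```python
from typing import Dict, List

# Hypothetical taxonomy as nested dicts: topic -> its subtopics (names assumed).
TAXONOMY: Dict[str, dict] = {
    "cultural heritage": {
        "art museum": {"sculpture": {}, "painting": {}},
        "science museum": {"telescope": {}},
    }
}


def depth_based_queries(taxonomy: Dict[str, dict]) -> List[List[str]]:
    """Build one query (a list of topics) per depth level, root level first."""
    queries = []
    level = taxonomy
    while level:
        queries.append(list(level.keys()))
        # Merge the children of every topic on this level into the next level.
        next_level: Dict[str, dict] = {}
        for children in level.values():
            next_level.update(children)
        level = next_level
    return queries


for d, query in enumerate(depth_based_queries(TAXONOMY)):
    print(f"depth {d}: {query}")
```

Each returned list would fill the query once, so the number of iterations equals the number of levels in the tree.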


Split-based Quest

N-Split (Split) is an iterative scheme through which a subset (a split) of the taxonomy tree is digested into the query formula to be executed. We iterate over the query topics until the size of the split becomes lower than or equal to a single topic. Returned results are incrementally cached. In this scheme, the number of splits chosen is important. Fig. 9.3 depicts the geometry of splits; for instance, 3 splits are displayed there as triangles of different size. Even splits. As a result:

$$\Phi_{split} = \{\lceil T_1 \ldots T_3 \rceil, \lceil T_4 \ldots T_8 \rceil, \ldots, \lceil T_{n-4}, T_n \rceil_{d}\}$$

Where $\Phi_{split}$ is the query formulated according to the split-based scheme and $\lceil T_{n-4}, T_n \rceil_{d}$ is a split topic set, divided by the size of $n$, e.g. $n=4$.
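One possible reading of the split-based scheme is sketched below: the taxonomy topics are flattened into an ordered list and cut into consecutive splits of at most $n$ topics, each split forming one query. This fixed-size chunking, the function name `split_queries` and the topic names are assumptions; the splits in fig. 9.3 are of different sizes, so this is only an approximation of the scheme.

```python
from typing import List


def split_queries(topics: List[str], n: int) -> List[List[str]]:
    """Cut an ordered topic list into consecutive splits of at most n topics."""
    if n < 1:
        raise ValueError("split size n must be at least 1")
    return [topics[i:i + n] for i in range(0, len(topics), n)]


# Hypothetical flattened taxonomy, e.g. in breadth-first order.
topics = ["cultural heritage", "art museum", "science museum",
          "sculpture", "telescope"]

for query in split_queries(topics, n=2):
    print(query)
```

Results returned for each split would be cached incrementally, as described above.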

Greedy-based Quest

Greedy or all-at-once strategy is an incremental strategy, in which all topics within the tree are taken incrementally and are utilized to fill the query formula. As a result:

$$\Phi_{greedy} = \{\lceil T_1 \ldots T_n \rceil\}$$

In which $\Phi_{greedy}$ is the query formulated according to the Greedy strategy, and $\lceil T_1 \ldots T_n \rceil$ is a set of $n$ topics that exist in the taxonomy tree.

Mining for Quest

While quest strategies allow for intelligent query formulation against the interest topics available in the profiles, retrieved matches need to be processed in the next step to generate profiles consumable by a recommender system. To deal with interest topics gathered from the social network, we have proposed a two-step mining process. In this process we reduce the dimensionality of topic attributes through a clustering methodology, and we form centroids around the topics originally taken from the taxonomy tree. The clusters are then fed into a function, typically a classifier, which eventually generates our initial profiles.

Clustering of Profiles

For a query $\Phi$ containing $t_i$ topics, we retrieve a set of profiles. We define $p_l$ as the set of $l$ profiles that have been retrieved with respect to our query as follows:

$$p_l = \{p_1, p_2, \ldots, p_l\}$$

Let $t'_i$ be the retrieved topic set corresponding to the latter query. We can take $t'_i$ as an observation set, where each observation can be seen as an n-dimensional vector in a vector space as follows:

$$t'_i = \left\{ t'_1, t'_2, \ldots, t'_n \;\middle|\; \forall p \in p_l,\; p = \bigcup_{n=1}^{k} t'_n,\; |p| > k \right\}$$

In which $t'_i$ is the set of all $n$ topics retrieved from a profile $p$, where profile $p$ contains at least $k$ topics of all combinations of topics that we have queried for. Taking this set as our observation, we can use a generic k-means clustering algorithm [156] to partition these $n$ observations into $k$ non-empty, non-overlapping sets referred to as clusters ($k < n$), as $t'_{clustered}$:

$$t'_{clustered} = \{t'_1, t'_2, \ldots, t'_n\}$$

In which $t'_{clustered}$ represents the partitioned or clustered set of profiled topics. The clustering algorithm takes the set of $n$ topics and tries to divide them into the $k$ sets of $t'_{clustered}$ topics. If each profile contains $m$ attributes, or features in the feature space (here $k = 1, 2, 3, \ldots, k$), then each cluster is spread around centroids $c_{t'}$:

$$c_{t'} = \{c^{1}_{t'}, c^{2}_{t'}, c^{3}_{t'}, \ldots, c^{n}_{t'}\}$$

An example of centroid formation is presented in fig. 9.4, where green topics are relevant matches, while red ones are either less relevant or irrelevant. We can observe that the centroids of the clustering learner are formed around topics which either exactly match our original query topics or are semantically close to them. We take advantage of this effect to eliminate the centroids which are either irrelevant or less relevant to the task at hand. To this end, the centroid formations are observed and supervised to make sure that the correct centroids are chosen for the clusters being formed. At the same time, the number of clusters affects centroid formation. By taking cluster size and distance metrics as control factors, we can measure the performance of the clustering step in the experiments later on.
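The clustering step can be illustrated with a small, self-contained k-means (Lloyd's algorithm) over binary topic vectors. The vocabulary, the four toy profiles and the deterministic seeding with the first k points are assumptions chosen to keep the sketch reproducible; they are not the configuration used in the experiments.

```python
from typing import List, Sequence, Tuple

Vector = Tuple[float, ...]

# Hypothetical vocabulary and profiles; 1 marks a topic present in a profile.
VOCABULARY = ["art", "sculpture", "physics", "telescope"]
PROFILE_VECTORS: List[Vector] = [
    (1, 1, 0, 0),  # art, sculpture
    (0, 0, 1, 1),  # physics, telescope
    (1, 0, 0, 0),  # art
    (0, 0, 0, 1),  # telescope
]


def squared_euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean_vector(points: List[Vector]) -> Vector:
    return tuple(sum(column) / len(points) for column in zip(*points))


def k_means(points: List[Vector], k: int, max_iter: int = 100):
    """Lloyd's algorithm; seeding with the first k points is an assumption
    made for determinism (real runs usually seed randomly)."""
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    centroids = list(points[:k])
    assignment = [-1] * len(points)
    for _ in range(max_iter):
        # Assignment step: attach each point to its nearest centroid.
        new_assignment = [
            min(range(k), key=lambda c: squared_euclidean(p, centroids[c]))
            for p in points
        ]
        if new_assignment == assignment:
            break  # converged: no point changed cluster
        assignment = new_assignment
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # keep the old centroid if a cluster went empty
                centroids[c] = mean_vector(members)
    return centroids, assignment


centroids, assignment = k_means(PROFILE_VECTORS, k=2)
print(assignment)
print(centroids)
```

In this toy run the art-related profiles and the science-related profiles end up in separate clusters, and each centroid carries the highest weight on the topic shared by all members of its cluster.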

Classifying Clustered Interest Topics

In general, clustering is utilized to reduce the dimensionality of input data, especially if the number of input attributes is high. While clustering in a single step can be configured to group a stack of profiles, the assignments still need to be verified in a further step for improved accuracy. This is essential if the framework is to be automated [103].

Figure 9.4: Sample centroid formation for interest topics with respect to schematics used.

At the same time, clustering allows a set of instances to be treated as a single (clustered) instance during each step of the algorithm. Taking these two facts into account, we can increase the effectiveness of the miner by feeding existing groupings of clusters of profiles and their topics to a classifier. Classification techniques [19] are very appealing where a predictive outcome based on a set of input observational data is needed. Against this background, we take a classifier as a function which takes the set of our clustered observations as an argument and outputs a probabilistic model created from the clustered interest topics of the respective profiles. This predictive output becomes our generated profiles, suitable for the recommendation step. Our function should assign higher predictive probabilities to more appealing topics, which are the topics we have queried for and which now form the centroids of the clusters. In our experiments we have used three classifiers: a Bayesian classifier, a kNN classifier, and a pruned decision tree. At this point the supervisor (see fig. 9.1) observes the predicted model and, if needed, refines the prediction output. The supervisor observes the accuracy and precision of the experiments to make sure the classification step was successful.
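To make the idea of a classifier as a function from clustered observations to a probabilistic model concrete, the following sketch estimates class probabilities by k-nearest-neighbour voting. The labelled vectors, the cluster labels and the function name `knn_class_probabilities` are hypothetical; this is a simplified stand-in and not the configuration of the kNN classifier used in the experiments.

```python
from collections import Counter
from typing import Dict, List, Sequence, Tuple

LabelledVector = Tuple[Sequence[float], str]

# Hypothetical clustered observations: topic vector -> cluster label.
TRAINING: List[LabelledVector] = [
    ((1, 1, 0, 0), "art"),
    ((1, 0, 0, 0), "art"),
    ((0, 0, 1, 1), "science"),
    ((0, 0, 0, 1), "science"),
]


def knn_class_probabilities(train: List[LabelledVector],
                            query: Sequence[float],
                            k: int = 3) -> Dict[str, float]:
    """Share of each label among the k training vectors nearest to `query`."""
    if not 1 <= k <= len(train):
        raise ValueError("k must be between 1 and the training set size")
    nearest = sorted(
        train,
        key=lambda item: sum((a - b) ** 2 for a, b in zip(item[0], query)),
    )[:k]
    votes = Counter(label for _, label in nearest)
    return {label: count / k for label, count in votes.items()}


probabilities = knn_class_probabilities(TRAINING, query=(1, 1, 0, 1), k=3)
print(probabilities)
```

The resulting label weights play the role of the probabilistic weights attached to clusters in the generated profiles.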

The predicted model is stored for the recommendation service to take as input and generate recommendations accordingly. Utilizing existing data for extended experiments, we can use either a full ontological recommender system [309] or a topic-based recommender engine [205]. In the former case, recommendations can be created based upon the lexico-semantic distance from user interests to predefined item annotations [310], while in the latter case we can use the existing set of topics along with their weight distributions for inferring new topic sets [205]. A visualization of the boundary of output profiles from our experiment later on can be seen in fig. 9.7.

Measuring Adaptivity of Quest Schemes

As presented earlier, a generic k-means clusterer spreads centroids around the highly weighted topics within the profile documents. Exact matches naturally rank highest since they are the most frequent among all profiles. One of the most effective approaches for increasing the chance of getting the most relevant topics is probabilistic topic models [205] that use Latent Dirichlet Allocation (LDA) [36]. An LDA model aims at finding a combination of topics for each profile, such as $z \mid p$, with each topic described by terms using another probability distribution, such as $P(t' \mid z)$, described as follows:

$$P(t' \mid p) = \sum_{i=1}^{n} P(t'_i \mid z_i = 1)\, P(z_i = j \mid p)$$

In which $P(t'_i)$ is the probability of topic $i$ for a given profile and $z_i$ is the latent topic. $P(t'_i \mid z_i = 1)$ presents the probability of $t'_i$ within topic $j$, while $P(z_i = j)$ is the probability of picking a term from topic $j$ in the document. To do so, we can first derive a probability distribution $P_1$ where the frequency of a topic per matching profile is high. As a result:

$$P_1(t' \mid p) = \frac{\mathrm{freq}(t', p)}{\sum_{t_i}^{n} \mathrm{freq}(t'_i, p)}$$

In which $P_1(t' \mid p)$ is the measure of the probability of topic $t$ given a profile $p$. This gives us the ability to put a threshold on the quality of profiles that we retrieve, to avoid gathering empty or very sparse profiles. Now we can combine this measure with $P(t' \mid z)$ and $P(t'_i)$ to derive $P_2(t' \mid p)$ as follows:

$$P_2(t' \mid p) = \rho P_1(t' \mid p) + (1 - \rho) P_1(t' \mid p)$$
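The frequency-based estimate $P_1$ amounts to normalizing topic counts within a single profile, which can be sketched as follows. The profile terms and the function name `topic_frequency_distribution` are made up for illustration.

```python
from collections import Counter
from typing import Dict, List


def topic_frequency_distribution(profile_topics: List[str]) -> Dict[str, float]:
    """P1(t | p): count of topic t in profile p over the total topic count."""
    counts = Counter(profile_topics)
    total = sum(counts.values())
    if total == 0:
        return {}  # an empty profile carries no distribution
    return {topic: count / total for topic, count in counts.items()}


profile = ["art", "painting", "art", "sculpture"]  # made-up profile terms
p1 = topic_frequency_distribution(profile)
print(p1)
```

An empty result signals an empty profile, which is the kind of case a quality threshold on retrieved profiles would filter out.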

Taking into account the discussion so far, we formalize adaptivity in the context of our experiments as follows:

Adaptive Quest. A Quest scheme is referred to as adaptive if we can derive a probabilistic distribution where the frequency and relevance of centroid topics are comparatively high. The method is said to converge if increasing the relevance and frequency of profile topics is feasible.

We use $P_1$ to measure the frequency of topics among the resulting centroids, while we utilize $P_2$ to derive the relevance of topics with respect to the clustered set of profiles. We will demonstrate how this concept holds during the experiment using both the machine learning process and the probabilistic method proposed.

9.4 Evaluating Quest Schematics: A LiveJournal Experiment

In this section we lay out the details of our experiment, followed by an analysis of results.


Crawling LiveJournal Community Profiles

LiveJournal [226] is a social networking website that lets users create, share and maintain journals. Users of this website can keep a journal, a diary or simply a blog [233]. LiveJournal has about 2 million active users per month. When users create profiles on this website, they express their interests¹ using a set of topics. There are in general two types of accounts on this website: users and communities.

A LiveJournal community is a journal where users can share items, information and posts about a similar subject. Since communities are focused on a subject of interest, the interest topics expressed for these communities create an opportunity to study and analyze them in order to discover interested customers, who turn out to be the members of the respective communities. We managed to discover, crawl and cache about 1000 community profiles with more than 11000 topics in total, where each profile contains at least 8 topics corresponding to existing topics in our domain taxonomy tree. First, the raw gathered topics are normalized using an unsupervised filter. We used Porter’s word tokenizer and stemming algorithm [291] beforehand to respectively tokenize the topics into words and map the terms within profile documents to their base linguistic forms. Subsequently, we fed the data into the miner.

At each pace the clusterer slices the data while the supervisor observes the clustering output and accuracy; the clustering results are then saved and loaded into the classifier to be processed for predicting the initial user profiles. At the end of this step the supervisor observes the accuracy and resulting predictions before submitting them to the recommender engine. In the experiments at hand, we have focused on the cultural heritage and museum domains. In this case we aim at discovering on-line or on-site visitors of the websites or physical exhibitions of museums. To increase the heterogeneity of results, we took into account two museums of different nature: one a collection of scientific instruments², focusing on the science and physics sub-domain, and the other an art collection, focusing on the paintings and statues of religion sub-domain³.

Evaluation and Comparison of Quest Schematics

In order to study the effectiveness and accuracy of the process, we study two main aspects of the proposed framework: mining accuracy and adaptivity.

¹ LiveJournal interests. http://www.livejournal.com/interests.bml
² Museum of History of Science at Florence, Italy. http://www.museogalileo.it/en/index.html
³ Museum of Fine Arts, Malta. http://www.heritagemalta.org/museums/finearts/fineartsinfo.html


Since our focus is more on evaluating the quest schematics, which enable adaptive discovery of communities as well as processing and transforming them into tangible recommendation input, the first part of the evaluation addresses the effectiveness of the miners, and the second part measures adaptivity by looking at empirical results comparing the frequency of relevant topics within the clusters formed.

Clusterer Evaluation

For each strategy we ran the miner with the following configurations: the depth-based approach with depth configuration d=0, 1, 2, the n-split approach with split size n=2, 4, 6, and finally the greedy approach with the complete dataset. To cluster the gathered interest topics we have utilized Lloyd’s algorithm [185] for simple k-means clustering. We have used two distance measures as our control factors: Euclidean and Manhattan distances. Results are compared with respect to increasing cluster size and variation of the schematics used. We compare these results with respect to their wcss (within cluster sum of squares):

$$wcss = \arg\min_{t'} \sum_{i=1}^{k} \sum_{t_i}^{n} \lVert t_i - \vartheta_i \rVert$$

In which $t'$ is the set of partitioned topics, $t_i$ is the set of input topics, and $\vartheta_i$ is the mean of points in the partition [185]. Given a set of observations, the algorithm tries to choose the means and partition the input set in such a way that the wcss error becomes minimal. We therefore utilize this metric to compare and show the effectiveness of the clusterer. To study the effectiveness and accuracy of the clusterer we have gathered and plotted the wcss error for each scheme. To demonstrate the effectiveness of each strategy, we have tested different cluster sizes, e.g. 2, 4, 6, 8, 16 and 32. Results are depicted in fig. 9.5. The plot is stacked to depict deviations more clearly. We perform the tests using a training set.
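As a rough illustration of how the within-cluster error depends on the distance measure, the sketch below sums the distance of every point to the mean of its cluster under Euclidean and Manhattan distance. It follows the formula as printed above, with an unsquared norm, whereas the conventional within-cluster sum of squares squares the Euclidean distance; the points and function names are made up.

```python
from typing import Callable, List, Sequence

Point = Sequence[float]


def euclidean(a: Point, b: Point) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def manhattan(a: Point, b: Point) -> float:
    return sum(abs(x - y) for x, y in zip(a, b))


def cluster_mean(points: List[Point]) -> List[float]:
    return [sum(column) / len(points) for column in zip(*points)]


def within_cluster_error(clusters: List[List[Point]],
                         distance: Callable[[Point, Point], float]) -> float:
    """Sum over clusters of the distances from each member to the cluster mean."""
    total = 0.0
    for members in clusters:
        if not members:
            continue  # an empty cluster contributes no error
        center = cluster_mean(members)
        total += sum(distance(point, center) for point in members)
    return total


# Two made-up clusters of 2-dimensional points.
clusters = [[(0, 0), (2, 2)], [(5, 5)]]
print(within_cluster_error(clusters, euclidean))
print(within_cluster_error(clusters, manhattan))
```

Passing the distance as a parameter mirrors its role as a control factor in the experiments.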

In these results, missing values are replaced with the mean/mode in the case of Euclidean distance, while in the case of Manhattan distance the component-wise median is used. At first glance, it is clearly visible that the clustering algorithm performs more accurately with Euclidean distance than with Manhattan distance. In the case of the greedy result set, the larger the cluster size the higher the error, and there is a considerable difference between the errors under Manhattan and Euclidean distances. One of the reasons greedy errors are quite high is the size of the final dataset: the larger the data, the more iterations and splitting are required by the algorithm, which in turn increases the error. For the depth-driven schemes, the error accumulated during the clustering experiment is lower than for greedy and, opposite to greedy, increasing the size of clusters decreases the error. What is interesting is that the lower we move on the tree, the more considerable the resulting error becomes.

This gradual increase in error is visible starting at the root of the tree, e.g. depth 0, and moving towards the lower depths of the tree, e.g. depth 2. Finally, moving to the n-split experiment, we observe that the split-based approach yields the fewest errors among all strategies, and at the same time it degrades more gracefully with increasing n (the size of splits at each iteration), as well as with increasing size of clusters. Changing the distance measure barely changes the error, as the difference is insignificant. So far we observe that split-driven schemes yield the fewest errors among the schemes proposed, and that clustering with Euclidean distance is more accurate for the dataset we have used.

Classifier Evaluation

We have experimented with three classifiers: a Bayesian classifier, a lazy classifier, and a tree classifier. For the Bayesian classifier we used a Naïve Bayesian classifier [177], for the lazy classifier we used a k-Nearest Neighbor (kNN) classifier [6], and for the tree classifier we used Ross Quinlan’s pruned decision tree [296].

To evaluate the classifier, the accuracy and precision of the generated profiles are analyzed. For evaluating the classification step, we have used the f-score (f-measure) with respect to the three classifiers being tested. The f-measure considers both the precision and the recall of the test to compute the score. In general, the f-measure is calculated as the weighted harmonic mean of precision and recall, where precision is the count of correct values divided by the count of all returned values and recall is the number of correct values divided by the number of values that should have been returned. We have used the f1 measure, which is calculated as follows:

$$f_1 = \frac{2 \cdot precision \cdot recall}{precision + recall}$$
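The computation can be illustrated on a small made-up example, where the predicted and expected topic sets and the function name `precision_recall_f1` are hypothetical and only serve to show the arithmetic.

```python
from typing import Iterable, Tuple


def precision_recall_f1(predicted: Iterable[str],
                        expected: Iterable[str]) -> Tuple[float, float, float]:
    """Precision, recall and f1 of a predicted set against the expected set."""
    predicted_set, expected_set = set(predicted), set(expected)
    correct = len(predicted_set & expected_set)
    precision = correct / len(predicted_set) if predicted_set else 0.0
    recall = correct / len(expected_set) if expected_set else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0  # avoid dividing by zero
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Made-up example: 2 of 4 returned topics are correct, 3 were expected.
precision, recall, f1 = precision_recall_f1(
    predicted=["art", "painting", "telescope", "music"],
    expected=["art", "painting", "sculpture"],
)
print(precision, recall, f1)
```

Here precision is 2/4 and recall is 2/3, so the harmonic mean gives an f1 of 4/7.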

Results are visualized in fig. 9.6. As before, results are stacked to distinguish deviations more clearly. The top plot presents the average f-measure for each classifier with respect to increasing input cluster size. The bottom plot, on the other hand, visualizes the f-measure with respect to each scheme utilized. Plots are presented separately to ease readability of results. As expected, we can observe that the kNN classifier performs more accurately than the two other classifiers, as it achieves the highest f-scores among all three. The tree classifier, in turn, exhibits comparatively less error than the Bayesian classifier.

We can easily see a gradual decrease of scores in both resulting plots. In the first plot, results show deviations at large cluster sizes, which is expected due to increasing calculations at each point, as they tend to converge the larger the cluster size becomes. With respect to the second plot, starting with the greedy experiment set, the f-score is the lowest among all schemes. Observing the depth-driven schemes, we can see a slow increase in accuracy and recall when moving down from the root of the taxonomy tree. This is natural, since the lower depths of the taxonomy tree contain more focused topics, as well as a larger number of leaves to enrich a query. Moving to the n-split approach, it is notable that the f-measures for n=2, 4, 6 are almost similar and the increase in scores with increasing n is almost linear. As a matter of fact, we can see that the lazy classifier and the split schemes yield the most suitable results in the context of our work.

Figure 9.5: Stacked plot of average clustering accuracies with Manhattan and Euclidean distances, visualizing errors per cluster size (top) and per schematics (bottom)

Figure 9.6: Stacked plot of average classification accuracies, visualizing f-score per size of clusters (top) and per schematics used (bottom)

Figure 9.7: Boundary Visualization for Generated Profiles: (a,b) Class Probability Estimators for N-split n=2 and n=6, (c,d) Class Probability Estimators for Depth d=1 and Greedy

Adaptivity Evaluation

In addition to mining the stream of profiles and grouping them into topics tangible to a recommender, it is important for us to see how relevant these topics are and whether the choice of scheme can make a difference in the adaptivity of our framework. We evaluate this by using a lexical parser to process the topics associated with centroids, which in turn gives us the frequency and occurrences of these topics. In addition, we used the toolkit to train a Gibbs LDA (Latent Dirichlet Allocation) [372] model to help us infer relevant topics with respect to the current collection of profiles. The new set of inferred topics was used to filter out less relevant or irrelevant centroid topics. The sum of occurrences of terms together with the sum of relevance weights are plotted in fig. 9.8.

Figure 9.8: Empirical comparison of the frequency and relevancy of centroids spread of topics

To generate these results we have fixed the size of clusters. Observing the results presented in the plot, we can see that the most relevant and most frequently occurring topics are associated with centroids found for the n-split schemes. With respect to occurrence counts, depth schemes show increasing frequency of topics as the depth increases, so depth and frequency of topics are directly associated. The greedy scheme, in contrast, has the lowest frequency among all. Increasing frequency is clearly visible in the split schemes: as n becomes larger, the frequency of topics increases sharply. At the same time, combined with the frequency, we can compare the relevance of centroid topics.

We can observe that the most relevant topics are associated with the split schemes as well, and this relevance increases with an increasing number of splits. For instance, in 2-split we can see the centroid topic animalcrossing with a relatively high frequency but low relevance, and the same holds for 4-split with the centroid topic beauty at the same frequency, whereas the 6-split centroids are highly relevant and at the same time highly frequent. For the depth schemes, we can observe that relevance increases but is not associated with frequency: in the case of depth 0, we can see centroids formed around fieldwork and anti; in the case of depth 1, eccoquelfieroistante or bella; and in the case of depth 2, epistemology and learning. The greedy scheme, in turn, shows better relevance than the depth scheme. As a matter of fact, among all the schemes proposed, n-split seems to be the most adaptive one, and it is evident that the higher the value of n, the faster the results converge.

Discussion on Experiments Results

Taking into consideration the results presented in the previous section, we discuss below some of the qualitative and quantitative issues that can affect our results, as well as those of similar experiments:

1. Size and semantics of the taxonomy tree:
We have experimented with a rather small to average taxonomy tree in this work. The main reason for this choice is that, across the social network subject to this experiment, the more the topics focus on low-level concepts, such as actual scientists like Galileo or instruments like Telescope, the less probable it is that we actually find any community or individual directly interested in the matter. One might therefore suggest that a full, rich taxonomy be learned from the target domain and utilized to formulate the queries. But if, instead of increasing the depth of the tree, the width of the tree (the number of concepts at each level) increases, the hit ratio for profiles discovered increases drastically. We have shown this using a query expansion technique, which has led to discovering more than 17000 community profiles, as compared to the 1000 profiles subjected to experiment in this work.

2. Availability of interested people within the social network:
One of the significant experiences with gathering profiles for the domain at hand, e.g. museums of science or visual arts, is that finding people on the social web who have explicitly expressed their interest in certain topics and concepts surrounding this domain is rather hard. People tend to express very generic topics, and most of their attention is on media, music, books or movies [21] rather than on specific interests in limited areas like visual arts or physics. At the same time, if we analyze their individual profiles, we see that the keywords in the profiles are scattered across different matters and subjects. This led us to focus on community profiles in the first place, since communities are the actual places where people share their ideas about a certain subject or topics relevant to it. As a matter of fact, it is important to know first whether the target social domain contains people who are actually interested in the target domain at hand. Perhaps this points to an extension of this work on how to discover multiple sites and domains where context-focused interested individuals gather and socialize.

3. Quality of the keywords documenting and presenting people’s interests: One of the problems with systems that utilize user-asserted keywords (such as LiveJournal) is the quality of these keywords. LiveJournal utilizes user-created or user-asserted keywords for describing users’ interests. This becomes important when a machine learning algorithm processes the data cached from these keywords. If not supervised, normalized and filtered correctly during creation, these keywords can become problematic and their quality can affect the result of the learning process. To deal with this problem, as stated previously, we use the adaptive tree to cluster the keywords that are important, and less importance is given to keywords that are less relevant or irrelevant to the process. Relevance can also be measured as the distance to cluster centroids. Another approach is to treat keywords as tags, in which case their quality can be measured and possibly filtered [203].

9.5 Conclusions and Future Work

In this work we introduced a machine-learning equipped architecture which utilizes a set of semi- to fully-automated schemes and strategies for adaptive discovery and mining of topic-based user profiles, in support of personalization and of later recommendation generation. We have experimented with interest profiles gathered from a popular social network to evaluate the accuracy and adaptivity of the framework. The results present a trade-off between strategic approaches which could guide effective query formulation, expansion and analysis of profile data from the social web. According to the current state of the results, the split-driven technique seems to be the most accurate and adaptive among the three proposed. The depth-based technique shows average performance, while it takes some time and data to converge when it comes to adaptivity. The greedy technique shows the poorest performance, although it can present more adaptivity, as it shows average relativity of profile selection and categorization. To deal with larger proportions of data, a combination of the split and depth techniques can provide reasonable results for an automated framework. As future work, the framework should be used with further social data, possibly gathered from multiple heterogeneous domains, and the resulting processed profiles should be used for actual recommendation generation to see whether the accuracy of recommendation can be drastically improved.

Chapter 10

Ontologies in Trust Recommender Systems

S. Fazeli, A. Zarghami, N. Dokoohaki, and M. Matskin, Mechanizing Social Trust-Aware Recommenders with T-Index Augmented Trustworthiness, the 7th international conference on Trust, privacy and security in digital business (TrustBus ’10), vol. 6264, M. S. Sokratis Katsikas, Javier López, Ed. Springer Berlin / Heidelberg, 2010, pp. 202-213.


Mechanizing Social Trust-Aware Recommenders with T-index Augmented Trustworthiness

Soude Fazeli1, Alireza Zarghami2, Nima Dokoohaki3, Mihhail Matskin4

1 Open Universiteit in Netherlands, Centre for Learning Sciences and Technologies, [email protected]
2 Faculty of Electrical Engineering, Mathematics and Computer Science, University of Twente - The Netherlands, a.zarghami@utwente.nl
3 Software and Computer Systems (SCS), Information and Communications Technology (ICT), KTH - Royal Institute of Technology, Forum 120, 16440 - Kista, [email protected]
4 Software and Computer Systems (SCS), Information and Communications Technology (ICT), KTH - Royal Institute of Technology, Forum 120, 16440 - Kista, [email protected]

Abstract

Social Networks have dominated the growth and popularity of the Web to an extent which has never been witnessed before. Such popularity puts the issue of trust before the participants of Social Networks. Collaborative Filtering Recommenders have been among the many systems which have begun taking full advantage of Social Trust phenomena for generating more accurate predictions. For analyzing the evolution of constructed networks of trust, we utilize Collaborative Filtering enhanced with T-index as an estimate of a user’s trustworthiness to identify and select neighbors in an effective manner. Our empirical evaluation demonstrates how T-index improves the Trust Network structure by generating connections to more trustworthy users. We also show that exploiting T-index results in better prediction accuracy and coverage of recommendations collected along the few edges that connect users on a network.


154 PAPERS (D1)

10.1 Introduction

The Semantic Web vision noted trust as one of the most crucial technologies enabling a future Web of openness and collaboration, collectively referred to as the "Web of Trust" [27]. The emergence of Social Networks, and most importantly Web-Based Social Networks (WBSN) [139], on one side and research into trust on the other side, combined with Semantic Web technologies, created an exclusive opportunity to merge existing efforts and create means for Social Network Analysis (SNA) on top of the Semantic Web [146]. Among the many systems which have realized the impact of so-called "Social Trust", Recommender Systems have been the most influential ones. Recommenders are software systems which retrieve data items on users’ behalf by taking into account similarity between users’ interests (social or collaborative), similarity between items (content-based), or both item and user similarity (hybrid). Social Recommender Systems which are extended with trust phenomena have proven to provide users with more reliable recommendations.

In this paper, we propose a measure called T-index, inspired by the H-index [165], for enhancing a Social Recommender System. We employ T-index to keep a list of the most trustworthy users who have already rated an item. We refer to this list, which is attached to each item, as the TopTrustee list. As a result, when a user rates an item, she/he is able to find users who might not be accessible within an upper bound of traversal path length, although they can be trustworthy users who share similar interests in the respective item. We demonstrate how utilizing T-index improves the structure of generated trust networks in the context of movie recommendations.

The rest of the work is documented as follows: Section 10.2 provides the background and related works. Section 10.3 describes our approach, and Section 10.4 shows our experimental results and discussions. Finally, we conclude and present an overview of the future work in Section 10.5.

10.2 Background

As Social Networks have become increasingly popular, there is a growing need to model their structure on the Semantic Web. The FOAF (Friend-of-a-Friend) vocabulary [44] describes users’ information and their social connections through concepts and properties in the form of an ontology using Semantic Web technologies [100], [139]. Golbeck [139] proposes an ontology for extending the FOAF vocabulary to model trust relationships between users. Although Golbeck’s ontology provides an efficient structure, every relationship describes only one subject. Dokoohaki et al. [101] introduce an ontology for modeling the structure of trust relations between users that is more efficient in terms of the size of the networks generated using the ontology. We extend this ontology to model trust between users with an extra element for measuring the T-index-based trustworthiness of a user.

Massa and Avesani present an architecture for a trust-aware recommender in which the "web of trust" is explicitly expressed by users [234]. There exist some efforts to formalize trust where it cannot be explicitly expressed by users. Two computational models of trust, profile-level and profile-item-level, are proposed by O’Donovan and Smyth [270] based on the past behavior of user profiles. Lathia et al. [214] introduce a "value" which is based on the difference between a user’s ratings and its recommenders’ ratings. This value is used to update the trust between the user and its recommenders. Their method is similar to the models presented by O’Donovan and Smyth in [270]. The trust-based collaborative filtering algorithm used in their method requires a centralized user-item matrix, which might lead to scalability problems as the number of users increases. Weng et al. [376] treat each user as a peer connected to other users in a decentralized trust network of users. In this paper, we adapt the formalization presented by Lathia et al. [214] to derive the trust value between users. We propose an agent setting in which every user is considered to be an agent connected to other users to form a trust network. Such a setting should provide better scalability, since the distributed allocation of trust-related data is supported.

10.3 A Semantic Trust-Aware Recommendation Framework

Our goal is to create trust relationships among all types of users with respect to different types of items, accessible through unique URIs across heterogeneous networks and environments. To achieve this, we have developed an ontological framework, shown in Fig. 10.1, composed of three main modules: Semantic Profile Manager, Trust Engine and Recommendation System.

When a user rates an item, the Semantic Profile Manager module either creates or updates an ontology-based profile for both the user and the item.

The Trust Engine module generates a so-called trust network of users, based on the profile information of users and items, in a distributed manner. To do so, a user profile extends the trust ontology to keep the user’s top-n neighbors and its mutual trust values with them. Note that there is no global view of the trust network for users; they are only provided with information regarding their neighbors and rating history. Therefore, it is possible to maintain users in different groups on several servers to achieve better scalability. To cope with privacy requirements, these servers can be located in different organizations while profiles of users and items are accessible only through their URIs.

The Recommendation System module enables traversals through the trust network to collect recommendations for a target user and finally makes a predicted rating for the user.

Figure 10.1: Ontological Framework

The whole model is built on top of a knowledge acquisition system to improve manipulation of ontological data. The presented ontological framework provides us with high interoperability and openness to deal with heterogeneous networks.

TopTrustee and T-index

In order to build trust relationships among users, we enhance Collaborative Filtering with two novel concepts: T-index and TopTrustee.

T-index

The H-index [165] was defined by Jorge E. Hirsch, a physicist, "as the number of papers with a citation number higher or equal to H, as a useful index to characterize the scientific output of a researcher". Extending this idea, we propose an estimate of a user’s trustworthiness called T-index. Similar to the H-index, it shows the number of trust relationships between a user and its trusters with a trust value higher than or equal to T. T-index can be seen as a weighted Indegree of nodes in a trust network: it takes into account not only the number of incoming edges, as the regular Indegree does, but also the weight of the incoming trust relationships. For a node on a network, Indegree represents the number of head endpoints adjacent to the node, while Outdegree is the number of tail endpoints.

Algorithm 1 Computing T-index
procedure ComputeT-index(user, TrusterList)
    TrusterValueList ← TrusterList.sort(trustValue, descending)
    for all trustValue in TrusterValueList do
        trustValue ← multiply(trustValue, MaxT-index).rounded
    end for
    Counter ← 1
    for all trustValue in TrusterValueList do
        if Counter < trustValue then
            Counter ← Counter + 1
        else
            break
        end if
    end for
    T-index ← Counter - 1
    return T-index
end procedure

Algorithm 1 describes how T-index is computed for a user. First, we introduce the maximum value of T-index as a global variable which defines the precision of the T-index computation. We multiply all trust values (shown as labels of the arrows in Fig. 10.2), which are in the range of 0 to 1, by this maximum value. In the example presented in Fig. 10.2, we assume the maximum value of T-index to be 10, for the sake of simplicity. Then, we count the number of trusters until the counter is no longer smaller than the scaled trust value of the current truster.
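The computation in Algorithm 1 can be sketched in Python as follows. This is a minimal illustration rather than the implementation used in the experiments: the function name `compute_t_index` and the parameter `max_t_index` are hypothetical, and the sample trust values are made up. The strict comparison is kept exactly as written in Algorithm 1, so a truster whose scaled trust value equals the current counter is not counted.

```python
def compute_t_index(truster_values, max_t_index=10):
    """Follow Algorithm 1: scale the trust values, then count H-index style."""
    # Scale trust values from [0, 1] to [0, max_t_index] and sort descending.
    # Note: Python's round() resolves exact .5 ties to the nearest even
    # integer, which may differ from the rounding used by the authors.
    scaled = sorted((round(v * max_t_index) for v in truster_values),
                    reverse=True)
    counter = 1
    for value in scaled:
        if counter < value:      # strict comparison, as in Algorithm 1
            counter += 1
        else:
            break
    return counter - 1


# Hypothetical trusters of one user, with trust values in [0, 1].
print(compute_t_index([0.9, 0.8, 0.5, 0.4, 0.3]))  # prints 3
```

With a maximum value of 10 the scaled values are 9, 8, 5, 4 and 3; the loop stops at the fourth truster because the counter (4) is not smaller than its scaled value (4).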

In this work, we define a cluster as a group of users who all trust a common user, called the Centric User, who is the most trustworthy one within the cluster. Fig. 10.2 shows ua and uf as centric users of two clusters.

Item’s TopTrustee

An item’s TopTrustee is a user who has already rated the item; a rater can join the item’s TopTrustee list if its T-index value is higher than a certain threshold. In effect, the TopTrustee list introduces trustworthy users to the user who has just rated the item. The users in the TopTrustee list may have no trust relationship with that user yet, as they cannot be reached within the maximum path length L. However, they might be a source of useful information for the item’s rater. We form TopTrustee lists by exploiting T-index.

Figure 10.2: A scenario of utilizing TopTrustee List

As shown in Fig. 10.2, when ub rates item ia, its mutual trust values with all users in two sets are computed and updated. The first set is its top-n neighbors: the n users who are directly connected to ub and provide the highest mutual trust values with it. The other set is the item’s TopTrustee list. The arrows between the users and the TopTrustee list show that the users have rated ia. uf has rated ia and is already located in ia’s TopTrustee list. After computing the trust value between ub and uf based on the trust formula presented in [214], ub finds uf more trustworthy than ua, one of its current top-n neighbors, even though uf is not accessible to ub within path length L. Eventually, ub adds uf to its top-n neighbors. As a result, uf can provide ub with more reliable recommendations than ua.

Semantic Profiling Manager

The Semantic Profile Manager module is responsible for creating and updating ontology-based profiles for both user and item.

Ontological User Profile

We take advantage of the trust model presented by Dokoohaki et al. [101] to define the trust between users, who are expressed using the FOAF Agent concept. Dokoohaki’s trust ontology has three concepts. Relationship is the main element, which expresses the trust relations on top of the Social Network of FOAF user profiles. MainProperties and AuxiliaryProperties are the other main components of the ontology, which respectively define essential and optional attributes for relations between users on the network. Two associations connect both MainProperties and AuxiliaryProperties to the Relationship concept. A Relationship always has a sink and a source, which are described by a Truster and a Trustee. The reader is referred to [101] for more information about the complete structure of the trust ontology. In our model, a trust value is computed based on users’ ratings of different items, possibly in different contexts. To compute the trust value between users, we follow the approach proposed in [214], based on the difference between a user’s rating and its recommender’s rating of their common item(s). As a result, as the distance between their rating values increases, trust decreases linearly.

Figure 10.3: User Ontology Model

As shown in Fig. 10.3, we create an instance of the Relationship concept between two users for whom a trust value is computed. The users are specified as Truster and Trustee, and their trust value and subject are assigned as MainProperties [101] to this instance. In addition, we assign T-index as a MainProperty of the Relationship instance. We also define the RankRelation concept for associating a user with an item through a rank value. This concept is used to keep track of the items rated by a user, which we refer to as the user profile.

Ontological Item Profile

We have developed an ontology for the item’s knowledge domain which can be extended by all other ontologies in the same domain. We introduce a new concept called TopTrustee, which is derived from the notion of an item’s TopTrustee described in Section 10.3, and we assign it to an individual item to create a list of users who have rated the item. The list of raters is ordered by their T-index. In a real-world scenario, these TopTrustee lists can be implemented with Distributed Hash Tables (DHT) [95], with unique URIs as their keys.
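As an illustration of how such a list could be maintained, the following Python sketch keeps an item’s TopTrustee list ordered by T-index and bounded to m entries. It is an assumption-laden sketch, not the ontology-based implementation: the function name `update_top_trustees`, the representation of the list as (user, T-index) pairs, and the way a stale entry is replaced are all hypothetical choices, and the sample values are made up.

```python
def update_top_trustees(top_trustees, user, t_index, threshold, m=5):
    """Return the item's TopTrustee list after `user` has rated the item.

    top_trustees: list of (user, t_index) pairs (hypothetical representation).
    A rater joins only if its T-index is higher than `threshold`; the list is
    ordered by T-index and truncated to the m best entries.
    """
    # Drop a stale entry for the same user before re-inserting (assumption).
    entries = {u: t for u, t in top_trustees if u != user}
    if t_index > threshold:
        entries[user] = t_index
    # Sort by T-index (descending); break ties by user id for determinism.
    ranked = sorted(entries.items(), key=lambda e: (-e[1], e[0]))
    return ranked[:m]


current = [("uf", 7), ("ua", 5)]
print(update_top_trustees(current, "ub", 6, threshold=4, m=2))
# prints [('uf', 7), ('ub', 6)]
```

In a DHT-based deployment, the item’s URI would serve as the key under which this list is stored; that part is not modeled here.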


Trust Engine

Suppose we have two users ua and ub. Trust between them is formalized as follows [214]:

\[ T(u_a, u_b) = 1 - \frac{\sum_{i=1}^{n} (r_{u_a,i_i} - r_{u_b,i_i})}{r_{max} \cdot n} \tag{10.1} \]

This formula sums the differences between a user’s rating values and its recommender’s rating values over the n historical ratings of ua and divides the total by n multiplied by the maximum value of the rating scale, rmax (i.e., 5). This trust value is used to update the trust between the user and its respective recommenders.
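A small numerical sketch of this trust computation is given below. It rests on two assumptions that are not stated in the formula as printed: the per-item difference is taken as an absolute value, in line with the "distance" between ratings mentioned earlier, and n is taken as the number of items both users have rated. The function name `trust` and the sample ratings are hypothetical.

```python
def trust(ratings_a, ratings_b, r_max=5):
    """Trust of u_a in u_b over their commonly rated items (Eq. 10.1 style).

    ratings_a, ratings_b: dicts mapping item id -> rating.
    Assumption: absolute differences are used so the result stays in [0, 1].
    """
    common = set(ratings_a) & set(ratings_b)
    if not common:
        return None  # assumption: trust is undefined without common items
    total = sum(abs(ratings_a[i] - ratings_b[i]) for i in common)
    return 1 - total / (r_max * len(common))


u_a = {"m1": 5, "m2": 4}
u_b = {"m1": 2, "m2": 2, "m3": 1}
print(trust(u_a, u_b))  # prints 0.5
```

Here the two common items differ by 3 and 2 rating points, so the trust is 1 - 5 / (5 * 2) = 0.5; identical ratings would yield a trust of 1.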

Trust Network

We gradually build up the trust relationships between users, based on the rating information of the user profile and the item profile, to generate a so-called trust network of users.

As mentioned, we keep the top-n neighbors of a user in an ontological structure based on their mutual trust values. The list is updated on the "rating a new item" event. If the event leads to modifications in the top-n neighbors of a user, then the T-index value is recalculated and updated in all TopTrustee lists which contain the user. The scenario is as follows: when a user rates a new item, we compute its trust with all of the item’s TopTrustees who are not among its current top-n neighbors but might be potentially trustworthy users. We also update the trust values between the user and its top-n neighbors. Eventually, we form a new top-n neighbors list by selecting the most trustworthy users from the union of its preceding neighbors and the potential trustees.
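The neighbor selection step of this scenario can be sketched as follows. The function name `update_neighbors` and the dictionary `trust_to` (freshly computed trust values of the rating user towards each candidate) are hypothetical, and the recalculation of T-index in the affected TopTrustee lists is deliberately left out of the sketch.

```python
def update_neighbors(trust_to, current_neighbors, top_trustees, n=5):
    """Select the n most trusted users among old neighbors and TopTrustees.

    trust_to: dict mapping candidate user -> trust value of the rating user
    towards that candidate (assumed to be recomputed before this call).
    """
    candidates = set(current_neighbors) | set(top_trustees)
    scored = [(u, trust_to[u]) for u in candidates if u in trust_to]
    # Highest trust first; ties broken by user id to keep the result stable.
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    return [u for u, _ in scored[:n]]


# Mirrors the scenario of Fig. 10.2 with made-up trust values:
# ub rates item ia, whose TopTrustee list contains uf.
trust_of_ub = {"ua": 0.6, "uc": 0.7, "uf": 0.9}
print(update_neighbors(trust_of_ub, ["ua", "uc"], ["uf"], n=2))
# prints ['uf', 'uc']
```

With these values uf replaces ua among the top-2 neighbors of ub, even though uf was not previously connected to ub.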

Recommendation System

There is no central view of similar users’ ratings in distributed recommender systems. Thus, in order to generate a recommendation, we need a way of gathering neighbors’ opinions. Traversing the neighbors is an appropriate solution for collecting an item’s ratings. The length of the paths of connected edges between users through the trust network should be limited to an upper bound (L). However, defining a suitable value for L is challenging, as it leads to a trade-off between accuracy and performance: as the number of parallel traversals and L increase, we can achieve better prediction accuracy and coverage for recommendations, but we require more bandwidth and computation resources. In addition, a user is allowed to traverse through its direct or indirect neighbors only as long as the mutual trust value does not fall below a predefined minimum threshold (v).
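A bounded traversal of this kind can be sketched as a breadth-first walk. The sketch below is only one possible reading: it assumes that the threshold v is checked on each edge separately, it uses an in-memory dictionary in place of the distributed profiles, and the names `collect_ratings`, `max_len` and `min_trust` are hypothetical stand-ins for the traversal, L and v.

```python
from collections import deque


def collect_ratings(network, ratings, start, item, max_len=3, min_trust=0.1):
    """Gather ratings of `item` from users reachable within max_len edges.

    network: dict user -> {neighbor: trust value} (top-n neighbor lists).
    ratings: dict user -> {item: rating}.
    Assumption: an edge is followed only if its trust value is >= min_trust.
    """
    collected = {}
    visited = {start}
    queue = deque([(start, 0)])
    while queue:
        user, depth = queue.popleft()
        if depth == max_len:
            continue  # upper bound L on the path length reached
        for neighbor, t in network.get(user, {}).items():
            if t < min_trust or neighbor in visited:
                continue
            visited.add(neighbor)
            if item in ratings.get(neighbor, {}):
                collected[neighbor] = ratings[neighbor][item]
            queue.append((neighbor, depth + 1))
    return collected


network = {"ub": {"ua": 0.6, "uc": 0.05}, "ua": {"ud": 0.8},
           "ud": {"ue": 0.9}, "ue": {"ug": 0.9}}
ratings = {"ua": {"ia": 4}, "uc": {"ia": 1}, "ue": {"ia": 5}, "ug": {"ia": 2}}
print(collect_ratings(network, ratings, "ub", "ia"))
# prints {'ua': 4, 'ue': 5}
```

In this made-up network uc is skipped because its trust value is below v = 0.1, and ug is not reached because it lies four edges away from ub while L = 3.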


After collecting all the information from a user’s neighborhood by traversals, we aim to minimize the risk of recommending irrelevant items to a user [214]. Therefore, the predicted rating value tells us whether the user is interested in an item or not. The prediction value is taken as a weighted average of the ratings of user a’s neighbors [24]. The reader is referred to Zarghami et al. [388] for more information regarding collecting the recommendations and making predictions.
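For illustration, a weighted average of neighbors’ ratings can be computed as in the sketch below. The choice of the trust value as the weight is an assumption made here for simplicity; the exact weighting scheme is the one described in [24] and [388], which may differ (for example, by working with deviations from each user’s mean rating). The function name `predict_rating` and the sample values are hypothetical.

```python
def predict_rating(opinions):
    """Trust-weighted average of (trust, rating) pairs.

    Returns None when no usable opinion was collected, in which case the
    item is not covered for this user.
    """
    total_weight = sum(t for t, _ in opinions)
    if total_weight == 0:
        return None
    return sum(t * r for t, r in opinions) / total_weight


# Two neighbors with equal trust who rated the item 4 and 2.
print(predict_rating([(0.5, 4), (0.5, 2)]))  # prints 3.0
```

A neighbor with a higher trust value pulls the prediction towards its own rating, and a user for whom no opinion could be collected receives no prediction.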

10.4 Evaluation

Setup

We evaluate the method presented above on the MovieLens¹ dataset, which consists of 943 user profiles. Ratings are on a five-point scale. The profiles are divided into training and test sets containing 80% and 20% of the ratings, respectively. To design ontological profiles for users and items, we use Protégé [294], and we take advantage of the Protégé API in Java for implementing the recommendation system. First, we build up trust-aware social networks as described before, based on the training data, and we visualize the constructed networks with Welkin [374] to study the effect of T-index on the structure of the networks. Then, we use a traversal mechanism for collecting recommendations through the trust networks. Evaluating the trust computation itself is not our concern. As explained in Section 10.3, we have adapted a light-weight trust formalization to conduct our experiments for investigating the impact of T-index on the performance of our recommendation system.

In this work, we aim to show how the network structure based on trust relationships in a social setting can be affected by T-index. To do so, we first compare the Indegree distribution of the top-10 trustworthy users for different values of T-index. Then, we build trust networks with and without T-index to observe the difference. The differences include both the inferred and the trimmed edges that result when T-index is employed. We also study the effect of varying T-index on the prediction coverage and accuracy of recommendations, which are collected from the rating values of neighbors who provide a mutual trust value higher than the minimum threshold (v = 0.1) and can be reached within the upper bound for the path length of traversals (L = 3).

We run our experiment in different settings for various sizes of the top-n neighbors list of each user (n) and of the TopTrustee list of each item (m). Although utilizing T-index gave improved results, we gained the most significant improvement in previous work when experimenting with m = 5 and n = 5. Therefore, we set both n and m to 5 for studying the Indegree distribution and the structure of the trust networks. We also consider different values for T-index, ranging from 0, meaning that no T-index is used, to the other values 25, 50, 100, 200, 500, 1000. To study coverage and accuracy, n takes the values {2, 3, 5, 10, 20, 50} while m stays at 5.

¹ http://www.cs.umn.edu/research/GroupLens/data/

Results and Discussions

In the first step, we study the Indegree distribution of the top-10 trustworthy users for various values of T-index while n and m are both equal to 5. As mentioned earlier, Indegree represents the incoming edges to a node, i.e. a user who is trusted by others. As shown in Fig. 10.4, when T-index is employed (T-index ≠ 0), the top-10 trustworthy users’ weights in terms of incoming trust relationships are more balanced. This means that users have on average more opportunities to find the most similar centric nodes as their main clusters. As a result, the load of incoming trust relationships imposed on the most trustworthy user is distributed among other trustworthy users, which makes our recommendation system more resistant against node failures or bottlenecks on the trust networks. Thus, the results change significantly when T-index is used, regardless of its non-zero value (25, 50, 100, 200, 500, 1000).

Figure 10.4: The Top-10 trustworthy users Indegree

To study the effect of T-index on the structure of trust networks, we generate two trust networks, with and without T-index, while n and m are both equal to 5. Fig. 10.4 shows that the Indegree distribution declines dramatically over the first top-5 trustworthy users when T-index is not used and over the first top-10 trustworthy users when T-index is applied. For the trustworthy users placed after the first ten, however, the Indegree distribution decreases steadily. For the sake of simplicity, we only study the trust network structure of the users who are directly connected to at least one of the top-10 trustworthy users.

Figs. 10.5 and 10.6 show the structure of the trust networks without T-index (T-index = 0) and with T-index (T-index = 100), respectively. For the sake of simplicity, we display only users (displayed as nodes) and their connections (trust relationships) to the top-10 trustworthy users. As mentioned, each cluster is described as a group of like-minded users in terms of trust. The figures show that the number of common users between clusters increases, which enables users of different clusters to find each other more easily. In our case, more users from divergent areas of users’ interests, presented as clusters, become accessible.

Figure 10.5: Generated Trust Networks for Top-10 Trustworthy Users (n = 5, m = 5): Without T-index

Figure 10.6: Generated Trust Networks for Top-10 Trustworthy Users (n = 5, m = 5): With T-index = 100

To justify the results, we compare the trust networks formed with and without T-index to show the inferred and trimmed edges individually. Fig. 10.7 indicates that inferred edges are mostly located between centric nodes. Therefore, the number of users which belong to different clusters grows in the centric area of the figure. In contrast, Fig. 10.8 reveals that most of the trimmed edges are located in just one cluster.

Figure 10.7: Alignment of Trust Networks for Top-10 Trustworthy Users (n = 5, m = 5): Inferred Edges

Figure 10.8: Generated Trust Networks for Top-10 Trustworthy Users (n = 5, m = 5): With T-index = 100

Finally, we study the coverage and MAE of the generated recommendations for several values of n and different T-index values, while m is kept equal to 5. As shown in Fig. 10.9, the minimum coverage for n = 2 without T-index is more than 85%, which is an improvement in comparison with the result of similar work [376], where coverage is below 60% at the same path length (L = 3) and even for larger values of n. Fig. 10.9 shows that coverage improves at all values of n when T-index is employed. We also demonstrate that the coverage improvement is almost the same for all non-zero values of T-index. Nevertheless, we achieve better results for coverage as the size of the neighbors list (n) decreases. As shown in Fig. 10.10, the maximum MAE value for n = 2 without T-index is less than 0.91, which outperforms a similar work [376] with MAE > 0.96 at the same threshold for the path length of traversals (L = 3). This shows that including items’ TopTrustee lists in the "top-n neighbors" can improve the results. In addition, utilizing T-index achieves better results still. As with coverage, we observe in Fig. 10.10 that T-index improves MAE for all values of n. However, for a constant value of T-index, the extent of the MAE improvement changes with n. For instance, although MAE has its best result with T-index = 100 and n = 5, it has its worst value with the same T-index when n = 10. Unlike coverage, MAE does not always improve with T-index as the size of the neighborhood list decreases. Fig. 10.10 shows that MAE is improved significantly with T-index when n = 5 and 10, whereas the change in MAE is trivial when n = 3 and 50. In conclusion, while using T-index results in better prediction accuracy and coverage of recommendations, accuracy is more affected by the value of T-index and the size of the neighborhood list (n).

Figure 10.9: Comparing the results based on different T-index values: Coverage
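The two measures discussed above can be computed as in the following sketch. It assumes the common definitions, which are not spelled out in the text: MAE is the mean absolute difference between predicted and actual ratings over the test ratings that received a prediction, and coverage is the fraction of test ratings for which a prediction could be generated. The function name `mae_and_coverage` and the sample data are hypothetical.

```python
def mae_and_coverage(test_ratings, predictions):
    """Compute (MAE, coverage) over a test set.

    test_ratings: dict (user, item) -> actual rating.
    predictions:  dict (user, item) -> predicted rating, or None / missing
                  when no recommendation could be collected.
    """
    errors = [abs(predictions[key] - actual)
              for key, actual in test_ratings.items()
              if predictions.get(key) is not None]
    coverage = len(errors) / len(test_ratings) if test_ratings else 0.0
    mae = sum(errors) / len(errors) if errors else None
    return mae, coverage


test = {("u1", "i1"): 4, ("u1", "i2"): 2, ("u2", "i1"): 5, ("u2", "i3"): 3}
preds = {("u1", "i1"): 3.5, ("u1", "i2"): 3.0, ("u2", "i1"): None}
print(mae_and_coverage(test, preds))  # prints (0.75, 0.5)
```

In this made-up example two of the four test ratings receive a prediction (coverage 0.5), with absolute errors of 0.5 and 1.0 (MAE 0.75).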

Figure 10.10: Comparing the results based on different T-index values: MAE

10.5 Conclusion and Further Work

In this work, we have formed trust networks of users on which recommendations are collected from neighbors that are either directly or indirectly connected. The indirect relationships between users are established through a trust propagation mechanism. We have proposed an estimate of a user’s trustworthiness called T-index, similar to the H-index [165], which shows the number of trust relationships between a user and its trusters with a trust value higher than or equal to T. We employ T-index to form an item’s TopTrustee list, which includes users who might not be reachable within a predefined maximum path length of traversals. We have shown that by utilizing items’ TopTrustee lists, the length of the traversals needed to find users who have rated a desired item decreases, which results in high performance. To justify the results, we have analyzed and visualized the effect of T-index on the structure of the generated trust networks based on the experimental data. We have demonstrated that T-index boosts the number of common users between different clusters. This results in better prediction coverage and accuracy of recommendations collected within the few edges that connect users on trust networks.

We plan to assess the T-index value of each user in a distributed manner, for example with gossip-based aggregation [175], to alleviate the problem of malicious nodes on trust networks.

Acknowledgment

This work has been done within the FP7-216923 EU IST funded SMARTMUSEUM project and is part of the IOP GenCom U-Care project, which is sponsored by the Dutch government under contract IGC0816.

Chapter 11

Management of Profiles in Trust Recommender Systems

S. Magureanu, N. Dokoohaki, S. Mokarizadeh, and M. Matskin, Epidemic Trust-based Recommender Systems, IEEE international conference on Social Computing 2012 (SocialCom ’12), 2012.


Design and Analysis of A Gossip-based Decentralized Trust Recommender System

Stefan Magureanu1, Nima Dokoohaki2, Shahab Mokarizadeh3, Mihhail Matskin4

1234 Software and Computer Systems (SCS), Information and Communications Technology (ICT), KTH - Royal Institute of Technology, Forum 120, 16440 - Kista, Sweden
{magur, nimad, shahabm, misha}@kth.se

Abstract

Information overload has become an increasingly common problem in today’s large-scale internet applications. Collaborative filtering (CF) recommendation systems have emerged as a popular solution to this problem by taking advantage of underlying social networks. Traditional CF recommenders suffer from a lack of scalability [314], while decentralized recommendation systems (DHT-based, Gossip-based etc.) have promised to alleviate this problem. Thus, in this paper we propose a decentralized approach to CF recommender systems that takes advantage of the popular P2P T-Man algorithm to create and maintain an overlay network capable of generating predictions based on only local information. We analyze our approach’s performance in terms of prediction accuracy and item coverage as a function of neighborhood size as well as of the number of T-Man rounds. We show that our system achieves better accuracy than previous approaches while implementing a highly scalable, decentralized paradigm. We also show that our system is able to generate predictions for a large fraction of users, which is comparable with the centralized approaches.

11.1 Introduction

In today’s large-scale internet applications, users deal with very large amounts of data that can be time-consuming to analyze. This is known as the information overload problem. A popular way to address this matter is to use recommendation systems. The most common use of such systems is in e-commerce applications


170 PAPERS (D2)

where a user needs to browse very large databases of items. A CF recommender system can alleviate this problem by offering the user a shortened list of items which other clients with similar taste have found interesting. Recommender systems are also being used in social networks as a way of helping users discover new links. CF recommenders rely on the presumption that people tend to assign more weight to suggestions coming from friends or people with similar interests. CF strategies similarly put more weight on suggestions received from more similar users. This means that this class of approaches relies on having enough information on users to determine the relationship between them with a high degree of accuracy. Recommendation accuracy and coverage therefore depend on whether the system can accurately determine the type of relationship between users. An alternative approach uses matrix factorization (MF) to generate predictions even for very sparse datasets. At the cost of accuracy, MF-based approaches manage to produce more predictions than CF recommenders, which is why MF-based recommender systems are growing in popularity, as real-life databases are often very sparse.

Traditional recommender systems (both CF and MF based) are implemented in a centralized fashion in order to increase item coverage. This paradigm, however, results in high computational costs and renders these systems impractical and costly to run. Thus, decentralized approaches are needed in practice. Distributed approaches rely on techniques borrowed from P2P (peer-to-peer) and Grid systems, such as DHTs or gossip-based algorithms. Since traditional CF recommenders tend to group similar users together, which is exactly what the network overlay does, there is no significant loss in prediction accuracy. To further improve performance, it is desirable to implement trust-awareness within neighborhoods, as a complement to this method.

In this paper we address the scalability problem of centralized trust-based recommenders by using popular techniques from P2P systems. We propose the use of T-Man [174] to cluster similar users together and use a novel trust inference model to improve prediction accuracy over previous trust metrics. We showcase the improvements our approach achieves and analyze its performance over two important datasets, namely Epinions and Yahoo! Webscope [87]. We analyze the influence of neighborhood size and the number of T-Man rounds on prediction accuracy and item coverage. We also propose methods for increasing item coverage by varying the distance metric used by T-Man and by introducing recursive predictions.

In the following section we describe previous work in the field of trust-aware recommender systems with a focus on decentralized CF. In section 11.3 we present our approach to creating the network overlay and to the computation of trusts between neighboring nodes. Section 11.4 describes our experimental setup, while section 11.5 contains the results of our experiments and an evaluation of the performance of the system. The final section is dedicated to the conclusions and future work.

11.2 Background

In this section we briefly describe previous work in the domain of recommender systems with a focus on decentralized approaches and trust inference techniques. Gossip-based recommenders rely on epidemic network overlay algorithms to allow nodes to generate recommendations using only the limited amount of information available to them in the overlay network. The main advantages of these algorithms are their intrinsic scalability and their ability to generate predictions very fast, since only local information is used.

Decentralized CF

Using peer-to-peer techniques in the context of distributed recommender systems has been considered in other works. This paradigm shift is common when dealing with very large databases, as in the case of social networks, due to its intrinsic scalability. Research into the field of recommenders has also shown interest in decentralized approaches as a replacement for the more traditional centralized techniques. In [400], Ziegler presents a detailed analysis of the challenges related to decentralized recommenders and proposes a framework for implementing such systems.

The two main types of CF recommenders are memory-based and model-based. Memory-based CF recommenders use the ratings of a subset of users to generate recommendations, usually the users most similar to a given user or its neighbors in the network. This method is usually referred to as user-based CF. A variation of this method uses the same principle but predicts the rating of an item based on similar items rather than users. This method is referred to as item-based CF (or content-based CF). Model-based CF recommenders rely on creating a model from a collection of users and items. The resulting model can then be used to make recommendations without the need to store the collection from which it was inferred. The drawback of this approach is that creating a model is usually more complicated and harder to implement than simply using ratings expressed by other users directly, as in the case of memory-based recommenders.

Model-based Approaches

Random walk models, such as those presented by M. Gori and M. Jamali in [150] and [173], have been suggested as a solution to sparsity in CF recommender systems. A random walker explores the social network, starting from a user and taking a probabilistic path outwards into the network until it reaches a certain depth. The probability of a path being chosen is usually dependent on trust values between nodes. During this exploration, the walker uses the ratings of encountered users to create a model that can later be used for recommendations. In [251], B. N. Miller proposes a peer-to-peer recommender system in which nodes exchange ratings with a neighbor at each step in order to construct an item-to-item similarity matrix, which can then be used to make offline predictions. Both the choice of neighbor and the way a user’s neighbors are determined are implementation dependent in this approach. Unlike our approach, the system in [251] does not maintain an overlay network. This is understandable, since it does not need to keep similar profiles easily accessible and only needs a profile for a one-time computation, after which the profile can be discarded. In our approach, the most similar profiles must be consulted for every recommendation, meaning an overlay network is needed in order to keep those profiles easily accessible.

Memory-based Approaches

In [154], a DHT-based (Distributed Hash Table) approach is suggested, where the central dataset is organized into "buckets" of users which can be saved on individual nodes, each user using his most suitable "bucket" to choose neighbors with which to generate predictions. In [314], user clustering is suggested as a solution to scalability problems as well as a means of improving accuracy. Unlike our approach, Sarwar et al. [314] present clusters as groups of users where all the users in a cluster are each other’s neighbors, whereas in our case the "neighbor" relation is directional. A directional "neighbor" relation is desirable since, while a user’s neighbors will be the users most similar to it, there might be others that are more similar to a neighbor than said user. Ormandi et al. [277] determine that using gossip-based algorithms to cluster a network in the context of recommender systems offers potential for increasing accuracy of prediction. This is particularly interesting for our work since in [277] the main algorithms being tested are variations of the T-Man algorithm. However, the aforementioned work does not analyze item coverage and does not cover trust-awareness in recommender systems, instead focusing on load-balancing.

Trust Inference

Trust has been the focus of much research since it emerged as a reliable means of improving recommendation accuracy. Trust is presented by Mui et al. in [256] as "a subjective expectation an agent has about another’s future behavior based on the history of their encounters". Several other definitions are presented by Zhou et al. in [397], who also give a more thorough overview of a wide range of approaches to trust-aware recommender systems.

Throughout this paper we use the term trust to denote the confidence a user has in the recommendations of another. As discussed in [235], trust complements CF recommenders by addressing such problems as the reduced computability of similarity between users and by improving accuracy of prediction. In [385], Yuan et al. describe trust networks as social networks with user-defined trust relations. The authors determine that this type of network holds the property of small-worldness, which involves having closely clustered users and small average path lengths between any two users. They then use this finding to define a model for recommender systems that takes advantage of the small-worldness of social networks in order to increase both accuracy and item coverage. Several approaches, such as Golbeck [141], Kuter et al. [209], Avesani et al. [16], DuBois et al. [109] and Zarghami et al. [388], also exploit an underlying mechanism in the network that allows for explicitly stated trust statements between users. However, not all systems support such features, and the ability of users to express confidence in others is limited by the time and effort required to evaluate other members of the network in order to form an opinion. Therefore, the ability of recommender systems to infer trust from limited knowledge is still a desired feature.

P. Victor et al. [355] propose a model that uses distrust to complement trust. This approach helps deal more effectively with users that exhibit undesired behavior. The concept of distrust is also used in [353] by N. Verbiest et al., who analyze the effect of path length on trust and accuracy. This is particularly interesting to our work since we also observe the effects of using more distant neighbors on the accuracy and item coverage of our recommender system.

The technique used to infer trust between users is critical to the accuracy of a trust-based CF recommender. Pearson similarity is a popular weight metric; however, as shown in [236], using a more complex weighting measure than just similarity has the potential to offer more accurate results, especially in sparse datasets. Approaches such as those proposed by J. Golbeck et al. [141], [209] take advantage of trust ratings explicitly stated by the users themselves to infer trusts between nearby members of the network through trust propagation. In [388], Fazeli et al. propose the use of a local trust metric together with a global trust metric computed using the in-degree of a node in relation to the trust on each incoming edge. The global trusts are then used to create a global repository of top trusted users for each item, which can then be referenced by other nodes in order to find new neighbors in the network. It is important to note that our trust inference technique does not require any user-defined trust between nodes and computes trust knowing only user ratings.

The metric proposed by O’Donovan and Smyth in [270] is similar to ours in this respect. O’Donovan and Smyth use the known ratings to create an artificial history of predictions for each user. By predicting the known ratings of users using all the other users and counting the number of correct predictions that each user makes, they establish a global trust for each user as the ratio of correct predictions to total predictions of that user (in [270], they also propose item-level trust, which is similar to user-level trust but applied to items; both trust models can be used concurrently to offer better results). Our approach follows the same path of using artificial predictions of known ratings to adjust trust values for users. However, we compute the trusts differently, solely within a neighborhood, resulting in local trusts rather than global trusts. O’Donovan’s method requires the analysis of the whole database when computing a user’s trust rating, which greatly impedes scalability and becomes more computationally demanding as the number of users grows. Our technique can calculate a user’s trust towards any subset of users in the network, making it easily implementable in a decentralized paradigm.
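The global, ratio-based trust described above can be illustrated with a minimal sketch. It assumes that each user acts as the sole recommender for every other user, that a prediction is "correct" when its absolute error is at most a threshold `eps`, and that ratings are stored as nested dictionaries; the function name, the threshold value and the data layout are illustrative assumptions and are not taken from [270].

```python
from statistics import mean


def profile_level_trust(ratings, eps=1.0):
    """Global trust per user: correct predictions / total predictions.

    ratings: {user: {item: rating}}. Each user (the producer) predicts the
    known ratings of every other user (the consumer) on co-rated items.
    """
    means = {u: mean(r.values()) for u, r in ratings.items() if r}
    correct = {u: 0 for u in means}
    total = {u: 0 for u in means}
    for producer in means:
        for consumer in means:
            if producer == consumer:
                continue
            for item, actual in ratings[consumer].items():
                if item not in ratings[producer]:
                    continue
                # With a single recommender, the weight cancels out of the
                # mean-centered prediction formula.
                predicted = means[consumer] + ratings[producer][item] - means[producer]
                total[producer] += 1
                if abs(predicted - actual) <= eps:
                    correct[producer] += 1
    return {u: correct[u] / total[u] for u in means if total[u] > 0}
```

Note that the two nested loops over all users are what makes this kind of trust expensive to compute as the network grows, which is the scalability concern raised above.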

Unlike us, O’Donovan computes trust as an absolute feature of a user: all users in the network have the same trust in a given user. We refer to our trusts as local because, as we will show in section 11.3, the values computed are relevant only in the context of a neighborhood. Thus, the trust between two users depends on the profiles of the other neighbors as well as on the profiles of the two users.

11.3 Approach

In this section we present our proposed approach. First, we describe how we use the T-Man algorithm in the context of a recommender system for social networks, and then we describe the trust inference technique we use to compute the trust between two users based on their known ratings.

Network Overlay

A network overlay is a network constructed on top of another network by reorganizing the logical links between nodes in order to make it more suitable for the application logic. In our case, the overlay network is built on top of the social network, and a user’s resulting neighbors will not necessarily be his friends in the social network.

Distance Metric

The T-Man algorithm [174] is widely used in P2P systems for obtaining overlay networks for a very diverse range of purposes. T-Man is a gossip algorithm in which each node maintains a set of neighbors and, on each iteration, exchanges neighborhood entries with the node in that set deemed most suitable. After the exchange takes place, both nodes replace entries in their neighborhood with more suitable nodes from the received set of entries. Suitability is usually represented by a distance function that is implementation dependent. To increase convergence speed, T-Man is usually used in conjunction with Cyclon [357], a gossip-based random peer sampling algorithm. Thus, each node has a random view beside its neighborhood. After each exchange of entries with a peer, along with the step described earlier, each node also keeps the most suitable entries from its random view in its neighborhood.
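The exchange step described above can be sketched as a small single-process simulation. This is a simplified illustration, not the T-Man protocol as specified in [174]: nodes are processed one after another, whole views are exchanged, and the random view that Cyclon would provide is replaced by a uniform sample over all nodes. The function name and all parameters are assumptions made for the example.

```python
import random


def tman_round(views, distance, view_size, rng):
    """Run one simplified gossip round over all nodes, updating views in place.

    views: {node: set of neighbor nodes}; distance(a, b) -> float, lower is
    more suitable; rng: a random.Random instance.
    """
    nodes = sorted(views)
    for node in nodes:
        # Gossip with the most suitable node currently in the neighborhood.
        peer = min(views[node], key=lambda other: distance(node, other))
        # Stand-in for the peer sampling service (Cyclon in the text).
        random_view = set(rng.sample(nodes, k=min(view_size, len(nodes))))
        candidates_node = (views[node] | views[peer] | {peer} | random_view) - {node}
        candidates_peer = (views[peer] | views[node] | {node}) - {peer}
        # Both sides keep only the most suitable entries they have seen.
        views[node] = set(sorted(candidates_node, key=lambda o: distance(node, o))[:view_size])
        views[peer] = set(sorted(candidates_peer, key=lambda o: distance(peer, o))[:view_size])
```

Because each node only ever keeps the best entries of a superset of its previous view, the summed distance between nodes and their neighbors cannot increase from one round to the next in this sketch.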

Our goal is to fill each user’s neighborhood with its most similar peers. An intuitive distance metric for our case would be the Pearson similarity coefficient. However, Pearson similarity has a negative impact on the number of relevant predictions the system is able to make, since it disregards the number of items in common between two users. Thus, we use a slightly different version:

$$\mathrm{Similarity}(u_1, u_2) = \frac{\sum_i r_{u_1,i} \times r_{u_2,i}}{\sqrt{\sum_i C \times r_{u_1,i}^2} \times \sqrt{\sum_i r_{u_2,i}^2}} \tag{11.1}$$

where $r_{u,i}$ is the rating user $u$ assigned to item $i$ and $C$ is a value in the interval [0,1] if $u_2$ has not rated item $i$ and 1 if $u_2$ has rated item $i$. The higher $C$ is, the more weight we put on the two users having common rated items, at the expense of putting less weight on how similar their ratings are. This new metric offers a balance between how similar two users are in terms of ratings for common items and in terms of the number of items in common. In our experiments we use $C = 0.5$. We can now form neighborhoods based on the similarity of ratings in users’ profiles as well as the number of items they have in common, meaning we are more likely to be able to make predictions for items more relevant to users. It is important to note that in the trust inference step and the recommendation step, only the nodes in the neighborhood are used.
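A minimal sketch of how (11.1) could be evaluated on sparse profiles is given below. It assumes that an unrated item contributes a rating of 0, so that the numerator runs over co-rated items and each norm runs over the items the respective user has rated; this reading of the summation ranges, the function name and the dictionary layout are assumptions rather than details stated in the text.

```python
from math import sqrt


def similarity(r1, r2, c=0.5):
    """Similarity of u1 to u2 following (11.1); r1, r2 map item -> rating."""
    common = r1.keys() & r2.keys()
    numerator = sum(r1[i] * r2[i] for i in common)
    # Items rated only by u1 are weighted by c; co-rated items count fully.
    norm1 = sum((1.0 if i in r2 else c) * r1[i] ** 2 for i in r1)
    norm2 = sum(r ** 2 for r in r2.values())
    if norm1 == 0 or norm2 == 0:
        return 0.0
    return numerator / (sqrt(norm1) * sqrt(norm2))
```

Under these assumptions, `c = 0` ignores the items that only `u1` has rated, while `c = 1` penalizes them fully, so raising `c` lowers the similarity of pairs with few items in common.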

Dealing With Sparsity

In order to deal with potential coverage issues, we propose using recursive rating prediction requests. This way, if a neighbor does not have the desired item rated, it can ask its own neighbors for a prediction for the item in question and pass it on, as its own, to the user asking for a prediction. This greatly reduces the number of situations in which none of the neighbors of a user have rated the item the user is interested in. However, this behavior does not scale very well, since the number of involved neighbors potentially increases exponentially. Fortunately, as we will show in section 11.5, recursive calls with a maximum depth of 2 are sufficient even for a very sparse database. To reduce complexity for higher range values, recursive calls can be made only when none of the neighbors have the desired item rated. Furthermore, the overhead can be reduced by allowing nodes to store the ratings of their neighbors locally, thus significantly reducing the number of nodes involved in recursive predictions at the expense of memory.
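The recursive request can be sketched as follows. The sketch assumes that a node without its own rating answers with the unweighted mean of whatever its neighbors return, whereas the approach described here would weight those answers by trust; the function name, the dictionary layout and the aggregation rule are illustrative assumptions.

```python
def request_rating(node, item, ratings, neighbors, search_range):
    """Return a rating for `item` on behalf of `node`, or None if unavailable.

    ratings: {node: {item: rating}}; neighbors: {node: list of nodes};
    search_range: how many further hops this node may still delegate.
    """
    own = ratings.get(node, {}).get(item)
    if own is not None:
        return own
    if search_range == 0:
        return None
    # Ask this node's own neighbors and pass the result on as its own rating.
    answers = [
        request_rating(n, item, ratings, neighbors, search_range - 1)
        for n in neighbors.get(node, ())
    ]
    answers = [a for a in answers if a is not None]
    return sum(answers) / len(answers) if answers else None
```

Because `search_range` decreases on every hop, the recursion terminates even when the neighbor relation contains cycles.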

Trust Inference

Our approach for computing trust is inspired by machine learning techniques [231], in the sense that we use a user’s known ratings as a training set, based on which we tune the trusts so that we obtain sufficiently accurate predictions for the known ratings. In our experiment, trust is computed after a certain number of T-Man rounds has determined the neighborhood of each user. In real scenarios, the T-Man algorithm can be run continuously and the trusts can be computed on stable neighborhoods, where a stable neighborhood is one that has not changed in a set number of rounds.

Modeling the system

We chose to use the formula described by (11.2) to calculate predictions of an item’s rating for a user. The formula is one of the most popular ones for predicting ratings; it is the basis of Resnick prediction and is also used by O’Donovan et al. in [270].

$$\frac{\sum_n (r_{n,i} - \bar{r}_n) \times w_n}{\sum_n w_n} = r_i - \bar{r} \tag{11.2}$$

where $r_{n,i}$ is the rating of neighbor $n$ for item $i$, $\bar{r}_n$ is the average of user $n$’s ratings, $w_n$ is the weight the user assigns to neighbor $n$ (or the trust of the user for $n$; we will be using the terms "trust" and "weight" interchangeably), $r_i$ is the user’s rating for item $i$ and $\bar{r}$ is the user’s average rating. It should be noted that the sum in the denominator is not taken over absolute values of the weights, meaning we only consider weights to be positive. This makes sense since we are interested in the proportions between them and not their actual values.
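Solved for the user’s rating, (11.2) gives the usual mean-centered prediction. A minimal sketch follows, assuming that neighbors who have not rated the item are skipped and that all weights are positive; the function and parameter names are illustrative.

```python
def resnick_predict(user_mean, neighbor_ratings, neighbor_means, weights, item):
    """Predict the user's rating for `item` from weighted neighbor deviations.

    neighbor_ratings: {neighbor: {item: rating}}; neighbor_means: {neighbor: mean};
    weights: {neighbor: positive trust or weight}. Returns None without coverage.
    """
    numerator = 0.0
    denominator = 0.0
    for neighbor, weight in weights.items():
        rating = neighbor_ratings[neighbor].get(item)
        if rating is None:
            continue  # this neighbor cannot contribute to the prediction
        numerator += (rating - neighbor_means[neighbor]) * weight
        denominator += weight
    if denominator == 0:
        return None
    return user_mean + numerator / denominator
```

Only the proportions between the weights matter here: scaling every weight by the same positive factor leaves the prediction unchanged.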

Given that we know all the ratings involved, since we apply it to items a user has already rated, we can interpret (11.2) as an equation with the trusts as the variables. If we use the formula for every item a user has rated, we can form a linear system of $I$ equations and $N$ variables, where $I$ is the number of rated items the user currently has and $N$ is the number of neighbors, of the form:

$$\begin{cases} \sum_n (r_{n,0} - \bar{r}_n - r_0 + \bar{r}) \times w_n = 0 \\ \cdots \\ \sum_n (r_{n,i} - \bar{r}_n - r_i + \bar{r}) \times w_n = 0 \end{cases} \tag{11.3}$$

This system most likely will not have positive solutions; however, we can try to approximate them. For this step, we have chosen to use the algorithm proposed by D. Cartwright in [64], which uses expectation maximization [231] and iterative proportional fitting to gradually converge to an approximation of the solution of systems similar to the one above. However, in order to be able to use this method, we must first overcome the constraint that the system can only have positive coefficients. To ensure that the coefficients remain positive no matter what ratings are involved in the calculation, we modify the system as follows:

$$\begin{cases} \sum_n (2 \times R_{max} + r_{n,0} - \bar{r}_n - r_0 + \bar{r}) \times w_n = C \\ \cdots \\ \sum_n (2 \times R_{max} + r_{n,i} - \bar{r}_n - r_i + \bar{r}) \times w_n = C \\ \sum_n w_n = N \times Trust_{mean} \end{cases} \tag{11.4}$$

where $C = 2 \times R_{max} \times \sum_n w_n$ which, considering the last equation in the system, is actually a constant, $C = 2 \times R_{max} \times N \times Trust_{mean}$. In the above formulae, $R_{max}$ is the maximum rating available in the profile database. It is now obvious that all the coefficients become positive.

Figure 11.1: Trust distribution obtained for Yahoo! Webscope dataset. Each bracket represents an interval of size 0.1 in the allowed trust interval of [0,1].
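The step from (11.3) to (11.4) can be spelled out as a short worked derivation. It assumes that all ratings lie in an interval $[R_{min}, R_{max}]$ with $R_{min} > 0$ (for example the scale 1 to 5); $R_{min}$ is a symbol introduced here for illustration only. Splitting off the added term in equation $i$ gives

$$\sum_n (2 R_{max} + r_{n,i} - \bar{r}_n - r_i + \bar{r}) \times w_n = 2 R_{max} \sum_n w_n + \sum_n (r_{n,i} - \bar{r}_n - r_i + \bar{r}) \times w_n = C + 0,$$

so every solution of (11.3) that also satisfies $\sum_n w_n = N \times Trust_{mean}$ solves (11.4). For the sign of the coefficients, both differences are bounded from below under the stated assumption:

$$r_{n,i} - \bar{r}_n \geq R_{min} - R_{max} > -R_{max}, \qquad \bar{r} - r_i \geq R_{min} - R_{max} > -R_{max},$$

and therefore

$$2 R_{max} + r_{n,i} - \bar{r}_n - r_i + \bar{r} > 2 R_{max} - 2 R_{max} = 0.$$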

Approximating the Trusts

As we stated earlier, we use D. Cartwright’s approach to approximate solutions for equation (11.4). Our case is a particularly simple example of the types of systems the algorithm can solve, meaning the algorithm can be reduced to a much simpler form. The reduced form is given in Algorithm 2, where $C_i$ is the value of the sum on the left-hand side of equation $i$ of the system under the current values of the weights, $coef_{i,n}$ is the coefficient of $w_n$ in equation $i$, i.e. $2 \times R_{max} + r_{n,i} - \bar{r}_n - r_i + \bar{r}$, and $C$ is the constant mentioned above. We initialize the weights with random values in the trust interval we chose to use. It is very important to note that the trust values are updated simultaneously. The terminating condition is described in more detail in the following section.

Algorithm 2 Approximating the weights

procedure ApproximateWeights(neighbors, Items)
    while not converged do
        for all n in neighbors do
            $w_n = w_n \times \dfrac{\sum_{i \in Items} \frac{C}{C_i} \times coef_{i,n}}{\sum_{i \in Items} coef_{i,n}}$
        end for
    end while
end procedure

Convergence

In Cartwright’s algorithm [64], convergence is reached when the Kullback-Leibler divergence between the solution vectors before and after a step falls below a certain threshold. However, we want to achieve a balanced trust distribution and prefer to prevent some users from receiving a disproportionately high trust value. For example, if one of the neighbors had exactly the same ratings as the user, an obvious solution of the system would be to give all the other neighbors 0 trust and give that neighbor the value required to satisfy the last equation of (11.4), i.e. $N \times Trust_{mean}$. Such situations are often encountered in sparse datasets, when the user has a small number of rated items but a significantly higher number of neighbors, resulting in far more variables than equations.

To deal with this setback, we limit the values for trust to a certain interval. We used a minimum value of 0.1 and a maximum value of 1. If, after an iteration, a trust value falls outside this interval, it is set to the nearest end of the interval. Even so, the trust distribution tends to cluster around the edges of the interval, so we interrupt the iterations when the number of values that have reached either end of the interval exceeds 10% of the size of the neighborhood. This value can be changed if needed. However, we observed that it gave a good trust distribution across the allowed interval, as shown in figure 11.1.
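The multiplicative update of Algorithm 2 together with the clamping and stopping rule described above can be sketched as follows. The sketch takes the coefficient matrix of (11.4) as given and adds two safeguards that are assumptions of this example rather than part of the described method: a tolerance `tol` on the largest weight change and an iteration cap `max_iters`. The function and parameter names are likewise illustrative.

```python
def approximate_trusts(coefs, c, w_init, lo=0.1, hi=1.0,
                       edge_fraction=0.1, tol=1e-9, max_iters=10_000):
    """Approximate positive weights w with sum_n coefs[i][n] * w[n] ~= c for all i.

    coefs[i][n]: positive coefficient of w_n in item equation i; c: the constant
    right-hand side; w_init: initial weights inside [lo, hi].
    """
    weights = list(w_init)
    n_eq, n_var = len(coefs), len(weights)
    col_sums = [sum(coefs[i][n] for i in range(n_eq)) for n in range(n_var)]
    for _ in range(max_iters):
        # C_i: left-hand side of equation i under the current weights.
        lhs = [sum(coefs[i][n] * weights[n] for n in range(n_var)) for i in range(n_eq)]
        # Simultaneous update: every new weight is computed from the old ones.
        updated = [
            weights[n] * sum(c / lhs[i] * coefs[i][n] for i in range(n_eq)) / col_sums[n]
            for n in range(n_var)
        ]
        # Clamp values that left the allowed trust interval.
        updated = [min(hi, max(lo, w)) for w in updated]
        change = max(abs(u - w) for u, w in zip(updated, weights))
        weights = updated
        at_edge = sum(1 for w in weights if w <= lo or w >= hi)
        if at_edge > edge_fraction * n_var or change < tol:
            break
    return weights
```

When the current weights already satisfy every equation, each ratio `c / lhs[i]` equals 1 and the update leaves the weights unchanged, so exact solutions are fixed points of the iteration.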

11.4 Experimental Setup

To test the performance of our system and trust inference method, we used two datasets. The first dataset is obtained from Yahoo! Webscope [87] and represents relatively less sparse databases. The second is from Epinions.com and reflects the sparse databases one would expect to encounter in a real scenario. The Yahoo dataset consists of 15,400 users and 300,000 ratings for 1,000 items, with a minimum of 10 ratings per user. The Epinions dataset¹ consists of 49,290 users, 139,738 items and 664,824 ratings, with some of the users (9,127 of them) having empty profiles. This sample has a more disproportionate distribution of ratings across the scale 1 to 5, a large majority of the ratings being either 4 or 5. Also, in the case of the Epinions dataset, the average number of ratings per user is only 13.

¹ Epinions dataset available at http://www.trustlet.org/wiki/Epinions_datasets

To evaluate our methods, we use the leave-one-out methodology, in which we hide one rating for every user and then run the algorithm and try to predict the hidden ratings. We initialize the neighbors with a set of randomly picked nodes. In a real scenario, this initialization could be replaced by, for example, using the friends of a user in a social network. As a measure of item coverage, we evaluate the number of users for which the system can generate a prediction for the hidden rating, i.e. at least one of the neighbors can provide a rating for the item in question. We evaluate the performance on the above mentioned datasets using 3 trust metrics: Pearson correlation, the metric proposed by O’Donovan in [270] and our approach, presented earlier.

11.5 Experiment Results

In this section we discuss the performance of our system in terms of accuracy of prediction and coverage. We measure accuracy in terms of MAE (Mean Absolute Error) and coverage as the number of users for which the system can generate a prediction for the requested item, i.e. the item hidden from the profile of the user.
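The two measures can be computed together in a leave-one-out loop, sketched below. The sketch reports coverage as a fraction of users rather than as a count, and it treats the predictor as an opaque callable that returns `None` when no prediction can be made; the function name and data layout are assumptions made for the example.

```python
def evaluate(hidden, predict):
    """Leave-one-out evaluation.

    hidden: {user: (item, true_rating)}, one hidden rating per user;
    predict(user, item) -> predicted rating, or None if no prediction is possible.
    Returns (mae, coverage); mae is None when nothing could be predicted.
    """
    errors = []
    for user, (item, actual) in hidden.items():
        predicted = predict(user, item)
        if predicted is not None:
            errors.append(abs(predicted - actual))
    coverage = len(errors) / len(hidden) if hidden else 0.0
    mae = sum(errors) / len(errors) if errors else None
    return mae, coverage
```

MAE is averaged only over the users for which a prediction exists, which is why accuracy and coverage have to be read together when comparing metrics.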

We have structured our findings in two sections. First, we present the performance of our trust metric and compare the MAE obtained by different approaches against the chosen baselines. Second, we present the influence of the network overlay algorithm and the results of our attempts to maximize coverage in our decentralized system.

Trust Metric and MAE

In this section we discuss the results of our trust metric compared to previous approaches and evaluate the evolution of accuracy and coverage as a function of the neighborhood size. The chosen baselines for our system are regular prediction taking all the users into account, with trust represented by the Pearson similarity and by the trust metric proposed by O’Donovan in [270]. These baselines are referred to as Pearson control and O’Donovan control in the figures in this section. We compare the performance of these two methods against our system using the overlay algorithm in combination with the trust metrics used in the baselines as well as with our own metric, described in section 11.3.

Figure 11.2 shows the performance of the different trust metrics over the size of the neighborhood in the case of the Yahoo! Webscope dataset. We can observe that all the metrics have increased accuracy with the size of the neighborhood and all show very similar performance. The chosen baselines are the variations of the Resnick prediction technique presented in [270] (using Pearson correlation as similarity and O’Donovan trusts) over the whole set of users.

Figure 11.2: The evolution of MAE over neighborhood size for Yahoo! Webscope (left) and Epinions (right) datasets after 150 T-Man rounds. We use our decentralized approach in combination with Pearson similarity, O’Donovan’s trust metric and our trust metric and compare with the baselines set by O’Donovan’s trust metric and Pearson similarity over the whole dataset.

We notice that our decentralized approach yields better results than the best performing baseline (0.95 compared to the baseline of 1.01), while presenting the added advantage of scalability. The trust metric used in neighborhoods seems not to influence performance, with Pearson, O’Donovan trust and our trust showing very close performance. This is because the Yahoo dataset only contains ratings for 1,000 items (compared to 139,738 in the case of Epinions). This allows for more relevant similarity computations when forming neighborhoods, since nodes are likely to encounter peers with more items in common than in Epinions. Since the resulting neighbors will have a higher number of items rated similarly to the active user, chances are they will also have rated other items of interest to the active user similarly among themselves. As a result, when making predictions, since all the suggestions will be similar, the weights of each user will be less relevant than in the case of Epinions.

Figure 11.3: Evolution of MAE over T-Man rounds for different trust metrics on the Epinions dataset.

On the Epinions dataset, however, our trust metric outperforms Pearson similarity implemented both in a distributed and in a centralized fashion. O’Donovan trust performs better, especially for smaller neighborhood sizes; however, we must note that these trusts were computed using the whole database as a training set (best performance scenario). Even so, for large neighborhood sizes the results are similar to those of our trust metric. O’Donovan’s metric achieves better accuracy partly because it often fails to make predictions for items that have been rated by only a few users and for which other approaches produce predictions of lower quality. This failure is due to not finding two users that have rated an item within the designated error threshold required for computing O’Donovan trust. Also, implementing this metric in a purely decentralized fashion would be significantly more difficult, as it requires a large training set. In these experiments, the distance metric chosen for forming the overlay is the variation of Pearson similarity presented in formula (11.1).

In figure 11.3 we can clearly see how accuracy increases as the network overlay converges. In this experiment we follow the evolution of accuracy with the number of T-Man rounds and also compare the two earlier mentioned distance metrics used in the T-Man implementation: regular Pearson similarity and its variation presented in formula (11.1). From the point of view of prediction accuracy, both distance metrics give similar results, with the proposed variation yielding slightly better results when combined with our proposed trust. In the case of using Pearson similarity as trust, the variation only gives lower errors in the earlier rounds, while using the default Pearson similarity as a distance metric offers slightly better accuracy near the convergence of the network.

Figure 11.4: Influence of search range on item coverage and prediction accuracy for the Epinions dataset.

Network Overlay and Coverage

We will refer to a request from a node to a neighbor for the rating of an item as a prediction request. A prediction request is successful if the neighbor can provide a rating for the desired item (either its own rating or a prediction that it obtains from its own neighbors). In section 11.3 we discussed ways in which we can overcome sparsity and proposed two methods which can be used concurrently. The first method is using a slight variation of the Pearson similarity as the distance metric in the implementation of the T-Man overlay management algorithm. This function is presented in equation (11.1). The second is using recursive prediction requests, which allow a neighbor that does not have a requested item rated to ask its own neighbors for a prediction of the rating for the item in question and pass the prediction on as its own rating to the initiating node. To avoid such requests going on forever, we limit such calls by a time-to-live, which we refer to as search range.

Figure 11.4 presents the results of varying the search range for the Epinions dataset. It is worth noting that in terms of coverage, our proposed decentralized approach yields very similar results for our trust metric and for Pearson similarity, analyzed in the previous subsection. In the left image of figure 11.4, we can clearly observe that, using regular prediction techniques, the coverage is very poor for sparse datasets like the one analyzed in this experiment. For 120 neighbors, fewer than 25% of the nodes manage to obtain a prediction from their direct neighbors, meaning that in more than 75% of the cases, none of the 120 neighbors had the item in question rated. Since the Epinions dataset is a good representation of a real-life scenario, such low coverage is unacceptable.

We notice that increasing the range to 1 (i.e. allowing immediate neighbors to ask their neighbors for a prediction for that item) greatly increases coverage. In this case, coverage for 120 neighbors approaches the baseline set by using Resnick prediction with Pearson similarity over the whole dataset (73% coverage). The baseline is only 73% because 9,127 of the 49,290 users in the network have no items rated, meaning we cannot verify prediction results against their real rating, and we consider these requests unsuccessful. On top of that, we have hidden 1 item from each user, meaning that sparsity is even worse than in the initial dataset. Despite being computed in a centralized fashion, O’Donovan trust incurs a significant loss of over 10% in total item coverage across the whole range of neighborhood sizes (15% for range 2), as we stated in the subsection Trust Metric and MAE. If we increased its coverage to match that of our metric, the MAE in the case of Epinions would increase to levels comparable to that of Pearson similarity.

Figure 11.5: Influence of distance metric on coverage as network converges in the case of Epinions.

Once we increase the search range to 2, we notice coverage is very close to the baseline starting from a neighborhood size of only 35. Even so, using a range of 2 could potentially involve 35^3 nodes in a single request in the worst-case scenario, which might produce significant overhead. However, it is still less demanding and more scalable than having to consult the ratings of every user in the network, since recursive queries can be done asynchronously between nodes. We also proposed a few ways of reducing the overhead of recursive hops in section 3.1.2. Given the sparsity of the dataset used for this experiment, we believe that using a search range higher than this is unnecessary for most real-life cases.
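The effect of the search range on coverage can be illustrated with a small sketch. It is not the implementation used in the experiments: the function name `collect_ratings`, the dictionary-based graph and the rule that only neighbors lacking a rating forward the request are all assumptions made for illustration.

```python
def collect_ratings(graph, ratings, user, item, search_range):
    """Collect ratings for `item` from neighbors, allowing `search_range` extra hops.

    graph:   dict user -> list of neighbor users (hypothetical overlay)
    ratings: dict user -> dict item -> rating
    A range of 0 consults direct neighbors only; each extra unit of range lets
    neighbors that have not rated the item forward the request one hop further.
    """
    found = {}
    visited = {user}
    frontier = [user]
    for _ in range(search_range + 1):
        next_frontier = []
        for node in frontier:
            for neighbor in graph.get(node, []):
                if neighbor in visited:
                    continue
                visited.add(neighbor)
                if item in ratings.get(neighbor, {}):
                    found[neighbor] = ratings[neighbor][item]
                else:
                    next_frontier.append(neighbor)  # will ask its own neighbors
        frontier = next_frontier
    return found


# Toy overlay: a -> b, c; b -> d. Only d has rated item "x".
graph = {"a": ["b", "c"], "b": ["d"], "c": [], "d": []}
ratings = {"d": {"x": 4}}
direct_only = collect_ratings(graph, ratings, "a", "x", 0)
one_extra_hop = collect_ratings(graph, ratings, "a", "x", 1)
```

In this toy overlay the request fails with range 0 and succeeds with range 1, which is the kind of coverage gain described above; the cost is that the number of contacted nodes can grow with every extra hop.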

In the plot on the right of figure 11.4, we notice the effect of increasing the search range on prediction accuracy. For both our trust metric and O’Donovan trust, using a range of 2 yields better results than using a range of 1. This is to be expected, since significantly more ratings are available for the prediction phase. The advantages of using a search range of 2 are particularly obvious in the case of the O’Donovan metric, where the accuracy increases noticeably and significantly surpasses the centralized approach using the same trust metric (an MAE of 1.02 compared to 1.17 for the centralized approach). From this experiment we can infer that choosing a range for requests represents a compromise between accuracy and coverage on one side, and overhead on the other.

Figure 11.5 depicts how coverage is affected by the convergence of the network and by the distance metric used by the T-Man algorithm, discussed at the beginning of section 3. We can observe that as neighbors become more similar to a user, coverage increases significantly. The experiment was run on the Epinions dataset, for a neighborhood size of 60 and using a search range of 1. We notice how coverage is increased when using the variation of the Pearson similarity mentioned in equation (11.1). This makes sense, since taking the number of items in common between two users into account when calculating similarity increases the probability that highly similar users will also be interested in similar items. Taking into account the results presented in figure 11.3, we notice that our proposed variation offers better coverage and better accuracy, as in the case of using our proposed trust metric for rating prediction. Thus, we can assume that our metric is overall a more desirable distance metric to be used in the implementation of the T-Man algorithm.

11.6 Conclusion and Future Work

In this work, we have used techniques inspired by P2P applications to create a scalable recommender system model. We have used the T-Man algorithm to cluster similar users together, using a variation of the Pearson similarity as a distance metric. We have shown the improvements in item coverage and accuracy that the proposed variation achieves, and implemented a new trust inference method to obtain directional trust values between a user and its neighbors while only knowing their ratings. Our inference method can compute trust between a user and any given subset of neighbors by having them predict the user’s known ratings and modifying trust values until predictions are close to the known ratings. We showed that our decentralized approach achieves better accuracy than two popular centralized models while maintaining comparable item coverage. Also, our trust inference method in the context of our decentralized approach performs better than Pearson similarity and is comparable to the popular trust metric proposed by O’Donovan and Smyth in [270], while being easily applicable in a decentralized model, in contrast to the latter metric. We also showed that our trust metric allows for higher item coverage than O’Donovan’s, even though the latter metric is computed centrally and ours only requires a limited neighborhood.

Future work will include observing the influence of different privacy settings of users on trust values and implementing a policy for punishing users that share too few items through lowering their trust. Also, it is worth exploring the potential of other distance metrics for T-Man and/or prediction formulae and the effectiveness of implementing our trust metric in a centralized fashion. We are also interested in pursuing gradient-based boosting algorithms as a possible replacement for our trust inference method.

Chapter 12

Modeling and Evaluating Privacy in Trust-Based Recommendation Systems

N. Dokoohaki, C. Kaleli, H. Polat, and M. Matskin, Achieving Optimal Privacy in Trust-Aware Collaborative Filtering Recommender Systems, 2nd International Conference on Social Informatics (SocInfo ’10), LNCS 6430, pp. 62-79, Springer, Heidelberg, 2010.


Achieving Optimal Privacy in Trust-Aware Social Recommender Systems

Nima Dokoohaki (1), Cihan Kaleli (2), Huseyin Polat (2), Mihhail Matskin (1)

(1) Software and Computer Systems (SCS), Information and Communications Technology (ICT), KTH - Royal Institute of Technology, Forum 120, 16440 Kista, Sweden, {nimad,misha}@kth.se
(2) Department of Computer Engineering, Anadolu University, Eskisehir, 26470, Turkey, {ckaleli,polath}@anadolu.edu.tr

Abstract

Collaborative filtering (CF) recommenders are subject to numerous shortcomings such as centralized processing, vulnerability to shilling attacks, and, most important of all, privacy. To overcome these obstacles, researchers have proposed utilizing interpersonal trust between users to alleviate many of these crucial shortcomings. Until now, attention has mainly been paid to the strong points of trust-aware recommenders, such as alleviating profile sparsity or calculation cost efficiency, while the least attention has been paid to investigating the notion of privacy surrounding the disclosure of individual ratings and, most importantly, the protection of trust computation across the social networks forming the backbone of these systems. To contribute to addressing the problem of privacy in trust-aware recommenders, in this paper we first introduce a framework for enabling privacy-preserving trust-aware recommendation generation. While the trust mechanism aims at elevating the recommender's accuracy, preserving privacy requires the accuracy of the system to be decreased. Since privacy and accuracy are conflicting goals in this context, we show that a Pareto set can be found as an optimal setting for both the privacy-preserving and the trust-enabling mechanisms. We show that this Pareto set, when used as the configuration for measuring the accuracy of the base collaborative filtering engine, yields an optimized trade-off between the conflicting goals of privacy and accuracy. We prove this concept, along with the applicability of our framework, by experimenting with accuracy and privacy factors, and we show through experiment how such an optimal set can be inferred.

12.1 Introduction

The Adaptive Web and its myriad techniques are paving the path towards fulfilling the promise of alleviating the classic problem of information overload. Recommenders, one of the most widely adopted and well-anticipated of this stack of technologies, remain the sole leader of this essential advancement. Recommenders intend to provide people with suggestions of products they will appreciate, based upon their past preferences, history of purchase, or demographic information [306]. Most successful recommenders employ well-known collaborative filtering (CF) techniques [148]. CF automates the word-of-mouth process of asking like-minded friends or family members for their individual opinions on different matters, such as new movie releases. This process involves finding users similar to the user receiving the recommendation and suggesting to her items rated highly in the past by users with similar taste. Since there are always numerous items and the ratings scored by users are sparse, the step of finding similar users often fails. To alleviate this shortcoming, this step was replaced by a trust metric, which enables a trust-based heuristic to propagate and spot users who are trustworthy with respect to the active user (a) for whom we are gathering recommendations [144]. Recommenders that take advantage of the fusion of interpersonal trust with CF heuristics within and across their architectures are collectively referred to as Trust-Aware Recommender Systems [237]. Privacy remains a foundational problem in personalization research. In general, CF systems usually fail to protect users' privacy. Users who are concerned about their privacy might supply false data, because data collected for CF can be used for unsolicited marketing, government surveillance, or user profiling, and it can be misused or transferred [287]. Using false data decreases the accuracy of CF systems [182]. As a matter of fact, it is more likely that users will give more truthful data if privacy measures are provided. Massa and Avesani [237] study the architecture and design of trust-aware recommender systems and describe how trust-aware recommenders alleviate shortcomings of traditional systems.

Trust-aware recommenders are modeled and designed in a decentralized fashion. However, current implementations are either centralized or are not tested in a decentralized fashion [237]. As a result, there is a growing concern about the vulnerability of these systems to shilling attacks [41]. At the same time, as most research on trust-aware recommenders focuses on improving the recommendations, it fails to clearly address the privacy issues surrounding the architecture and components of these systems. This research work therefore deals with the privacy issues surrounding the architecture and components of trust-aware recommender systems.

To address these shortcomings, in this manuscript we extend the architectural landscape of traditional CF techniques and trust-aware recommenders to include the building blocks required for realizing a privacy-preserving trust-aware recommender system. As an example of such an architecture, we implement a framework for applying data perturbation techniques to user rating profiles. To do this, we introduce a private trust computation process. Then, accordingly, we propose methods for producing private recommendations based on trust-based CF recommender systems. We ground this framework on top of a social trust recommender system [388], which utilizes T-index [118, 119] as its trust metric. We will show how the overall trust estimation can be augmented to accommodate private trust estimation and prediction generation. We design this framework with the protection and preservation of users' privacy in mind, while still providing accurate recommendations on masked data using trust-enabled CF schemes. We conceptualize this trade-off between accuracy and privacy as a Pareto frontier. We will show that the privacy and trust mechanisms, each with their respective configurations, jointly form the configurations of the overall framework. From the perspective of Pareto optimality, at least one joint setting of both configurations exists which, when utilized, maintains the privacy of user data while keeping accuracy decent at the same time. To evaluate this framework, we study the accuracy of the recommendations under different masked distributions and compare the results of the computations with those on the original data.

Our experimental results clearly show that the proposed scheme provides recommendations with decent accuracy while preserving users' privacy. The rest of this manuscript is organized as follows. First, a background on the main concepts shaping the foundation of this work is presented. The architecture of the system is presented in the third section, followed by a detailed description of the approach. Experimental evaluation is presented in the fourth section, followed by a discussion of results. Finally, a conclusion and future work bring this work to its end.

12.2 Background

Trust-Aware Collaborative Filtering

CF algorithms generally make recommendations based on the similarity between users' tastes. A similarity measure is not sufficient when user rating scores are sparse and insufficient. In the face of these shortcomings, traditional user similarities become useless and recommenders need new ways to calculate user similarity. As a response to this problem, interpersonal trustworthiness was proposed to replace old similarity measures. Ziegler et al. [403] describe how the similarity between two users can be interpreted as how much they might trust each other. Golbeck [140, 144] shows the correlation between similarity and trust and how it can elevate movie recommendation accuracy. Taking this fact into account, trust can be considered a measure for expressing the relationship between two users in recommendation systems. O'Donovan and Smyth [270] approach trust-aware recommenders by utilizing a two-mode profiling model that documents the past behavior of users. Massa and Avesani [234, 237] present an architecture for a trust-aware recommender system in which trust can be propagated and aggregated for all of the users in a social network setting. Lathia et al. [214] model a variation of the kNN (K-Nearest Neighbor) CF recommender, which allows users to learn whom to trust and how much by evaluating the utility of the rating information they receive. One of the problems with the frameworks presented above is that their functionality depends on the availability of explicit trust ratings between users in order to infer other trust relations.

Zarghami et al. [388] introduce a decentralized trust-aware recommender system, which utilizes T-index [119] as a trust metric for filtering trust between users. Unlike previous approaches, a trust network between users can automatically be built from the existing ratings of users. The framework increases the probability of finding trustworthy users across the network by creating a Distributed Hash Table (DHT)-like list of trustees, the TopTrusteeList (TTL) [118], that wraps around the items whose ratings are similar to those of the current user. Our work utilizes this recommender as the foundation of our framework.

Privacy-Preserving Collaborative Filtering

Privacy remains the most significant problem in the context of CF recommendation systems. Canny [56, 57] proposes privacy-preserving schemes for CF. In his schemes, users control their private data and are capable of getting personalized referrals without disclosing their data. Canny proposes to use homomorphic encryption in his schemes to protect individuals' privacy. Polat and Du [287] employ perturbation techniques to offer predictions. In their scheme, users disguise their private data before sending it to a central server, which collects masked data instead of actual data. Kaleli and Polat [182] study how to produce naive Bayesian classifier (NBC)-based private recommendations while preserving individuals' privacy. They employ randomized response techniques (RRT) to protect users' privacy. Parameswaran [280] presents a data obfuscation technique in which she designs and implements a privacy-preserving shared collaborative filtering framework using a data obfuscation algorithm. Berkovsky et al. [26] investigate a decentralized approach, which does not require sending data to a centralized server. Collaborative filtering techniques can be employed in the context of peer-to-peer (P2P) and social networks. Kaleli and Polat [183] propose a solution to produce NBC-based private recommendations in a P2P network. A solution to produce private referrals in a social network context is presented in [184]; it requires using data disguising techniques. Within the context of our framework, we have adopted this approach to provide private recommendations in the context of a trust network of users, where actual user profiles are masked and the trust computation process and recommendation procedure are changed accordingly to produce private recommendations.

Preserving Privacy in Trust-Aware Recommender Systems

Taking measures for preserving privacy during trust calculation and computation has been of great importance. A lack of privacy protection in systems dealing with trust and reputation can ease attacks by malicious insiders, as they might corrupt the existing trust establishments or alter the trust computation results. As a result, a great deal of research has been invested in analyzing schemes for combining privacy with trust establishments in different fields. In Multi-Agent Systems, preserving privacy during trust negotiations between software agents in any open system is a crucial task, because sensitive data is exchanged between strangers without any prior knowledge of each other [324, 328]. In P2P systems, a similar privacy concern is raised by the possibility that malicious users can exploit the peers' trust network to spread tampered-with resources [211]. In the context of recommender systems, Lam et al. [211] give an overview of privacy and security problems with recommenders. These problems are twofold: the personal information collected by recommenders raises the risk of unwanted exposure, and malicious users can bias or sabotage the recommendations that are provided to other users [254]. While the former concerns the privacy of recommenders, the latter is collectively referred to as shilling attacks [212]. Attacks on recommenders remain a significant security hole in these systems [211, 237]. As the popularity of trust-aware recommenders in the academic and industrial communities increases, the problem of attacks on trust-enabled recommenders remains open. Zhang [392] executes an average shilling attack on a trust-aware recommender system and demonstrates that the trust recommender exhibits more stability than a traditional kNN-based recommender.

Our framework is designed with the idea of withstanding shilling attacks in mind. As the main focus of this work is on the design and implementation of a privacy-preserving trust recommender, we leave the analysis of framework stability under different attacks for future work.

12.3 Recommendation Framework

In this section, we present the framework that we have composed for building a private trust-aware recommender system. First, we give a brief introduction to the architecture and design of our trust-aware recommender; we then describe how this architecture can be extended with the components needed to build a private trust recommender. This is followed by a description of the process of trust estimation and prediction generation of our resulting system. Finally, we present the definition of the optimal privacy set and the process of inferring this set in the context of this work.

Architecture of a Private Trust-Aware Recommender System

Massa and Avesani [237] present a generic architecture for a trust-aware recommender system. This architecture is presented in Fig. 12.1. In this architecture, gray boxes represent modules, while white boxes represent the matrices used as inputs and outputs of algorithms. Typical inputs of the architecture are the rating matrix (rating scores assigned to items by users) and the trust network [101] (trust statements of users with respect to each other). While the rating matrix is the main input of traditional CF recommenders, trust can be inferred and, in our case, automatically generated from the rating matrix. In this architecture, they depict the anatomy of a traditional CF recommender as being composed of two main building blocks: a similarity metric and a rating predictor.

Figure 12.1: Architecture of a Private Trust-Aware Recommender System. Trust-aware recommenders (top box) can be extended with privacy (bottom box) to enhance traditional similarity-driven collaborative filterers (middle box). Computational modules are depicted in gray boxes, while inputs and outputs are depicted in white. Architecture is adapted from Massa and Avesani [237].


The similarity metric, which is typically the Pearson Correlation Coefficient (PCC) [41], helps find similar users (or neighbors). The rating predictor module predicts ratings based on a weighted sum of the ratings given by similar users to the items [237]. The architecture of a trust-aware recommender additionally includes a trust metric. The difference between the two architectures lies in how neighbors are discovered and how their weights are identified: this can be done through the similarity module or through the trust metric module. The combined output, the estimated trust network along with user similarity, can be used to generate rating predictions.

To introduce the notion of privacy within this architecture, we first need to clarify what we mean by privacy. As we construct a trust network of users for the propagation and aggregation of trust values within our framework, we propose adopting the notion of social privacy across the network of users. We define privacy in the following terms:

Any user a who wants a prediction [in a Trust Network] does not have to reveal her rating vector during the recommendation process, and other users in the recommendation process cannot learn any rating value of a or the rated and/or unrated items of user a's rating vector.

Taking this adaptation of privacy into account, we have extended the existing architecture in Fig. 12.1 to include the architecture of a private trust-aware recommendation system. As depicted, a privacy-preserving trust-aware CF recommender can be composed of two main modules: a private trust metric and a private rating predictor. This architecture takes a masked rating set as input and generates a private trust estimation, which is used by the private rating predictor; this in turn, combined with the rating prediction module from the pure CF step, can generate the predicted rating matrix.

A private trust recommender is thus composed of both pure and trust-enhanced recommendation modules and inputs. To understand how this architecture can be realized, we adapt it to our recommendation framework, presented in the previous section. In our framework, the private trust metric is realized through data disguising and private trust calculation steps. To achieve our privacy aim, we propose to use the z-scores of user ratings instead of their actual preferences. The z-score [287] of an item indicates how far, and in what direction, that item deviates from its distribution’s mean, expressed in units of its distribution’s standard deviation.

In this work we utilize the z-score transformation for normalizing data. Since z-score values have zero mean, we can hide their value by adding random numbers from a distribution with zero mean and a predefined standard deviation. As a result, all users perform computations with their z-scores instead of their actual ratings. To improve the privacy level, we propose to hide the unrated items of users, too. Since having a rating for an item shows that the user has purchased this item, users hide which items they have really purchased by filling f percent (f being related to the user's density) of their unrated items with random numbers having the same perturbation properties as employed for z-score disguising.

The flow of data through the architecture is as follows:

At first, original rating profiles are masked. The decentralized protocol for data disguising is presented in the next section. The matrix generated by the data disguising step is then fed into the private trust module. To realize the private trust module, trust is first formalized so that it can be calculated with respect to z-scores. This is followed by disguising z-scores with random values. In our approach, users apply the protocol described in the following section to disguise their private vector during the process of trust estimation and computation. When users finish the calculation of z-scores and data disguising, we can compute the trust between them. Within this framework, we have adopted the trust formalization of Lathia et al. [214] for calculating interpersonal trust. The calculation is done in a decentralized fashion on each user's side. At the end of this step, trust values are returned and stored in the trust network. After this step, we use the private rating predictor module to produce final predictions, which generates the user-item ratings matrix as the output of the recommender. To do so, we have adopted the PCC [41] along with the interpersonal trust values between users from the previous step to generate referrals. Since users get normalized values from this output, the results are de-normalized. These steps are explained in more detail in the following section.

Data Disguising and Private Trust Computation

Once any user a requests a recommendation from the social recommender system, a needs to disguise his/her rating vector to protect his/her own privacy. Therefore a first normalizes his/her rating vector and adds randomness to the normalized data. In our scheme, users follow the protocols and processes below for perturbing their data, estimating private trust, and producing private predictions.

Data Disguising Protocol

The procedure for data disguising is as follows:

1. Each user computes the z-score values of their ratings.

2. Each user u selects a β value and then uniformly randomly selects the standard deviation of the random numbers (σu) over the range [0, β]. They also compute the number of rated items (numrat) and the number of unrated items (numunrat) in their rating vectors.

3. Each user computes her density value d and selects a random integer f, the percentage of unrated items to be filled, between 0 and a number associated with d such as d/2, d, or 2d. Each user u then randomly selects f percent of their unrated items.

4. Users can utilize a uniform distribution or a Gaussian distribution to generate random numbers. To select the distribution, users decide on a θ value over the range [0, 1] and uniformly randomly select a random number ru over the same range.

If ru ≤ θ, the users generate random numbers from a uniform distribution over the interval [−σ, σ], where σ can be 1, 2, 3 or 4.

Otherwise, they use a Gaussian distribution with zero mean and standard deviation σu.

5. After selecting the distribution, each user generates (numrat + numunrat ∗ f/100) random numbers having zero mean and standard deviation σu. To disguise rated items, each user adds numrat of the random numbers to the z-score values of the rated items and fills the numunrat ∗ f/100 randomly selected unrated items with the remaining random numbers.

6. Each user saves their masked z-score vectors.
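The protocol above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the code used in the experiments: the function names, the use of the population standard deviation, the interpretation of the density d as the percentage of rated items, and the use of σu as the half-width of the uniform interval are all assumptions.

```python
import random
import statistics


def z_scores(ratings):
    """Map item -> z-score of its rating, using the population std (assumption)."""
    values = list(ratings.values())
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    if std == 0:  # constant profile: treat every z-score as 0 in this sketch
        return {item: 0.0 for item in ratings}
    return {item: (r - mean) / std for item, r in ratings.items()}


def disguise(ratings, all_items, beta=1.0, theta=0.5, d_factor=1.0, rng=None):
    """Return a masked z-score vector for one user (steps 1 to 6 of the protocol)."""
    rng = rng or random.Random()
    z = z_scores(ratings)                                  # step 1
    sigma_u = rng.uniform(0.0, beta)                       # step 2
    unrated = [i for i in all_items if i not in ratings]
    density = 100.0 * len(ratings) / len(all_items)        # step 3 (d as a percentage)
    f = min(100, rng.randint(0, int(d_factor * density)))  # d_factor: 0.5, 1 or 2
    to_fill = rng.sample(unrated, len(unrated) * f // 100)
    use_uniform = rng.random() <= theta                    # step 4

    def noise():
        if use_uniform:
            return rng.uniform(-sigma_u, sigma_u)
        return rng.gauss(0.0, sigma_u)

    masked = {item: value + noise() for item, value in z.items()}  # step 5
    for item in to_fill:
        masked[item] = noise()          # fake entries hide which items are rated
    return masked                                          # step 6


ratings = {"a": 5, "b": 3, "c": 1}
items = ["a", "b", "c", "d", "e", "f", "g", "h"]
masked_profile = disguise(ratings, items, beta=2.0, rng=random.Random(42))
```

Every truly rated item stays in the masked vector, while some unrated items receive pure noise, so an observer cannot tell rated from unrated entries by presence alone.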

Private Trust Estimation

As mentioned, the private trust module allows us to generate private trust values. Assume there are two users, ua and ub. We formalize the trust between them as follows. First, each rating is converted to a z-score:

$$z(u, i) = \frac{R_{u,i} - \bar{R}_u}{\sigma_u} \tag{12.1}$$

where R_{u,i} is the true rating of user u on item i, \bar{R}_u is the mean rating of user u, σu is the standard deviation of user u’s ratings and z(u, i) is the z-score value of user u on item i.

$$z'(u, i) = z(u, i) + r_{u,i} \tag{12.2}$$

where r_{u,i} is the random number generated by u to disguise the z-score of item i and z′(u, i) is the masked value of z(u, i). When users finish calculating z-scores and data disguising, they compute trust among other users using Eq. 12.3:

$$T(u_a, u_b) = 1 - \frac{\sum_{i=1}^{n} \left| z'_{u_a, i_i} - z'_{u_b, i_i} \right|}{z'_{\max} \cdot n} \tag{12.3}$$

This equation is an adapted formalization of the trust proposed by Lathia et al. [214], which is based upon the difference between a user's rating and its recommender's rating on their common item(s). Here T(ua, ub) is the estimated (private) trust between the respective users ua and ub, z′_{ua,ii} is the masked z-score of user ua for item ii and z′max is the maximum masked z-score.
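As a minimal sketch of Eq. 12.3, the function below assumes that the sum runs over the absolute differences of the masked z-scores on the n items both users have in their masked vectors, following the description of the Lathia et al. formalization. The name `private_trust` and the choice to pass the maximum masked z-score as a parameter are illustrative assumptions.

```python
def private_trust(masked_a, masked_b, z_max):
    """Estimate trust from two masked z-score vectors (dict item -> value).

    Assumes Eq. 12.3 sums absolute differences over the common items and
    normalizes by z_max times the number of common items.
    """
    common = [item for item in masked_a if item in masked_b]
    if not common or z_max == 0:
        return 0.0  # no evidence to base trust on (a choice made for this sketch)
    total = sum(abs(masked_a[item] - masked_b[item]) for item in common)
    return 1.0 - total / (z_max * len(common))


masked_a = {"i1": 1.0, "i2": -1.0}
masked_b = {"i1": 0.0, "i2": -1.0, "i3": 2.0}
trust_ab = private_trust(masked_a, masked_b, z_max=2.0)  # 1 - 1.0 / (2.0 * 2)
```

Identical masked vectors give a trust of 1, and the value decreases as the masked z-scores on common items drift apart.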

198 PAPERS (D1)

When ua and ub compute trust, they follow the steps below:

1. ua and ub decide which half they will operate on.

2. ua and ub send the parts that they will not operate on to each other.

3. When users receive the related part of the other user's vector, they compute a sub-result of trust using Eq. 12.3.

4. Each user sends her sub-result to other user.

5. They compute trust value between each other by summing up sub-results.
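The five steps can be illustrated with the following sketch. It assumes that a "sub-result" is the partial sum of absolute differences over one half of the common items, that the common item set is already known to both users, and that the final trust value combines both partial sums as in Eq. 12.3; none of these details is fixed by the text above, and the function names are hypothetical.

```python
def partial_sum(own_part, received_part):
    """Sub-result computed by one user on the half she operates on."""
    return sum(abs(own_part[i] - received_part[i]) for i in own_part)


def split_trust(masked_a, masked_b, z_max):
    """Simulate both users of the two-party trust computation in one process."""
    common = sorted(set(masked_a) & set(masked_b))
    if not common or z_max == 0:
        return 0.0
    half = len(common) // 2
    first, second = common[:half], common[half:]   # step 1: agree on halves
    # Step 2: ub sends its values for `first`, ua sends its values for `second`.
    b_to_a = {i: masked_b[i] for i in first}
    a_to_b = {i: masked_a[i] for i in second}
    # Step 3: each user computes a sub-result on her own half.
    sub_a = partial_sum({i: masked_a[i] for i in first}, b_to_a)
    sub_b = partial_sum({i: masked_b[i] for i in second}, a_to_b)
    # Steps 4 and 5: sub-results are exchanged and summed.
    return 1.0 - (sub_a + sub_b) / (z_max * len(common))


user_a = {"i1": 1.0, "i2": -1.0}
user_b = {"i1": 0.0, "i2": -1.0, "i3": 2.0}
shared_trust = split_trust(user_a, user_b, z_max=2.0)
```

In this arrangement each user only ever sees half of the other's masked vector, which is the point of splitting the computation.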

Private Recommendation Prediction Process

To produce recommendations, Eq. 12.4 can be used. Since z-scores are used instead of actual ratings in our scheme, once users have finished computing trust they use Eq. 12.4 to produce referrals, as follows:

$$p(a, i) = \frac{\sum_{b \in N(a,i)} z'_{b,i} \cdot T'(a,b)}{\sum_{b \in N(a,i)} T'(a,b)} \tag{12.4}$$

Users get a normalized rating value when they use Eq. 12.4. To obtain the actual rating value, users need to de-normalize the result of Eq. 12.4 by using Eq. 12.5:

$$p(a, i) = \bar{R}_u + \sigma_u \cdot p \tag{12.5}$$

where p(a, i) is the de-normalized prediction for user a and item i, \bar{R}_u is the mean rating of user u, σu is the standard deviation of user u’s ratings and p is the referral value from the previous step.
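A minimal sketch of the two prediction steps follows, assuming the normalized prediction is the trust-weighted average of the neighbors' masked z-scores and that de-normalization uses the requesting user's own mean and standard deviation. The function names and the toy numbers are illustrative only.

```python
def predict_normalized(neighbors):
    """Trust-weighted average of masked z-scores (Eq. 12.4).

    neighbors: list of (masked_z_score_for_item, trust_towards_neighbor) pairs.
    Returns None when no neighbor carries any trust weight.
    """
    weight = sum(trust for _, trust in neighbors)
    if weight == 0:
        return None
    return sum(z * trust for z, trust in neighbors) / weight


def denormalize(p, mean_rating, std_rating):
    """Map a normalized prediction back to the rating scale (Eq. 12.5)."""
    return mean_rating + std_rating * p


# Two equally trusted neighbors with masked z-scores 1.0 and 0.0 for the item.
p_norm = predict_normalized([(1.0, 1.0), (0.0, 1.0)])
p_rating = denormalize(p_norm, mean_rating=3.0, std_rating=1.0)
```

Because only masked z-scores and trust values enter the weighted average, the neighbors' actual ratings never need to be disclosed to the requesting user.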

Defining and Inferring Optimal Privacy Set

It is accepted that privacy and accuracy are conflicting goals in the context of personalization and collaborative filtering recommenders [403]. This conflict becomes more pronounced in the presence of trust. Utilization of interpersonal trust aims at increasing [392] or maintaining [211] the overall accuracy.

Trust metrics, along with other factors such as the size of the neighbor list at each step of trust estimation, increase or maintain the accuracy of predictions. At the same time, increasing the amount of perturbation leads to further information loss. To protect private data, the level of perturbation is vital. If the amount is too low, the masked data still discloses considerable amounts of information; if it is too high, accuracy will be very low [403].

If we take into account the configurations that affect the privacy mechanism on the one hand, and the configurations affecting trust on the other, we can argue that an optimal setting can be defined where privacy and accuracy are both maintained at the same time. From the perspective of achieving an optimal result, the problem space can be seen as an optimization design space. Within this design space we have j real parameters corresponding to trust mechanism configurations, while we have k different criteria corresponding to privacy mechanism configurations. In this space we take the privacy enhancing mechanism as a function p, which generates the privacy configuration set. For example, as we have used perturbation to protect private data, these operators become the distributions we have utilized for adding perturbations to the user rating profiles. We refer to this set as a Privacy Configuration Set (PCS):

$$\prod_{i \in I} \mathrm{pcs}_i = \{\, p : I \to \bigcup_{i \in I} \mathrm{pcs}_i \mid (\forall i)(p(i) \in \mathrm{pcs}_i) \,\} \tag{12.6}$$

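Since in theory we can have an infinite number of parameters, we consider that at each time only j parameters are taken into account:

$$\Phi_j : \prod_{i \in I} \mathrm{pcs}_i \to \mathrm{pcs}_j \tag{12.7}$$

$$\Phi_j(p) = \{\, (\mathrm{pcs}_1, \mathrm{pcs}_2, \ldots, \mathrm{pcs}_j) \mid \mathrm{pcs}_1 \in \Phi \wedge \mathrm{pcs}_2 \in \Phi \wedge \ldots \wedge \mathrm{pcs}_j \in \Phi \,\} \tag{12.8}$$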

In a similar fashion, the trust enhancing mechanism can be taken as a function t, which generates the trust configuration set. For example, the configurations that our trust-aware recommender uses to enable social network mediated trust inference are the trust metric and the size of the trust lists at each step. We refer to this set as a Trust Configuration Set (TCS):

$$\prod_{i \in I} \mathrm{tcs}_i = \{\, t : I \to \bigcup_{i \in I} \mathrm{tcs}_i \mid (\forall i)(t(i) \in \mathrm{tcs}_i) \,\} \tag{12.9}$$

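Since in theory we can have an infinite number of parameters, we consider that at each time only k parameters are taken into account:

$$\Psi_k : \prod_{i \in I} \mathrm{tcs}_i \to \mathrm{tcs}_k \tag{12.10}$$

$$\Psi_k(t) = \{\, (\mathrm{tcs}_1, \mathrm{tcs}_2, \ldots, \mathrm{tcs}_k) \mid \mathrm{tcs}_1 \in \Psi \wedge \mathrm{tcs}_2 \in \Psi \wedge \ldots \wedge \mathrm{tcs}_k \in \Psi \,\} \tag{12.11}$$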

Since we study and analyze configurations from both mechanisms at once, we need to make a joint set containing members from both sets. As a result we define an ordered set composed from the Cartesian product of all privacy configuration sets (PCS) and trust configuration sets (TCS), as follows:

$$\psi = \Phi \times \Psi \tag{12.12}$$

$$\psi = \{\, ((\mathrm{pcs}_1, \mathrm{pcs}_2, \ldots, \mathrm{pcs}_j), (\mathrm{tcs}_1, \mathrm{tcs}_2, \ldots, \mathrm{tcs}_k)) \mid (\mathrm{pcs}_j) \in \Phi \wedge (\mathrm{tcs}_k) \in \Psi \,\} \tag{12.13}$$
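The joint configuration set can be enumerated directly as a Cartesian product. The sketch below uses made-up parameter values purely for illustration: a privacy configuration is taken to be a (distribution, noise bound) pair and a trust configuration a (trust threshold, trust list size) pair, which is an assumption about how the configurations might be encoded.

```python
from itertools import product

# Hypothetical privacy configurations: noise distribution and noise bound.
privacy_configs = list(product(["gaussian", "uniform"], [1, 2, 3, 4]))

# Hypothetical trust configurations: trust threshold t and trust list size n.
trust_configs = list(product([0, 25, 50], [2, 5, 10]))

# Joint set: every privacy configuration paired with every trust configuration.
joint_configs = list(product(privacy_configs, trust_configs))

first_privacy, first_trust = joint_configs[0]
```

Each element of `joint_configs` is one candidate setting of the overall framework, so evaluating the framework over this list covers the whole (finite) design space considered.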


As the goal is to achieve acceptable accuracy and privacy at the same time, the optimization problem becomes multi-objective. As a result, the problem of achieving a trade-off between accuracy and privacy in the current context becomes a Pareto optimization problem. Taking this fact into account, we define an Optimal Privacy Set (OPS) as follows:

Definition 1. Let ψ be the set of all possible joint configurations. There exists a set $\psi_i$ in which all joint privacy and trust configurations achieve decent privacy and accuracy at the same time, in comparison to $\psi_i^*$, the set of the other possible joint configurations. Such a set exhibits Pareto optimality. We refer to this set as an Optimal Privacy Set (OPS):

$\psi_i \succ \psi_i^*$ (12.14)

In other words, among all possible configurations we can always find at least one setting that either maintains or improves privacy in the face of accuracy loss. To find such a set, the following heuristic can be adopted.

Heuristic. To infer the OPS:

1. Perturb the overall user data using different PCS settings;
2. Observe the framework under variations of TCS;
3. Perturb the sparse user data with the PCS inferred from step 2.

Following these steps allows for inferring the OPS and finalizing the Pareto optimal setting.
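To make the notion of Pareto optimality concrete, the sketch below filters a list of candidate settings down to the non-dominated ones. It is only an illustration: the numeric privacy levels and MAE values are invented, and the functions `dominates` and `pareto_front` are hypothetical helpers, not part of the framework described here.

```python
def dominates(a, b):
    """Return True if point a dominates point b.

    Each point is (privacy, mae). Higher privacy is better, lower MAE is better.
    a dominates b if it is no worse in both objectives and strictly better in one.
    """
    no_worse = a[0] >= b[0] and a[1] <= b[1]
    strictly_better = a[0] > b[0] or a[1] < b[1]
    return no_worse and strictly_better


def pareto_front(points):
    """Keep only the points that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Invented (privacy level, MAE) pairs, used only to exercise the functions.
candidates = [(1, 0.80), (2, 0.85), (1, 0.90), (3, 0.95), (2, 0.97)]
front = pareto_front(candidates)
print(front)
```

Every setting on the front represents a different trade-off; choosing one of them still requires a preference between privacy and accuracy, which is the role of the heuristic above.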

In the evaluation section, we show through experiments how such a set is inferred and justified as the optimal result, which respects the trade-off we are trying to achieve.

12.4 Recommendation Framework Evaluation

To evaluate our framework, we conducted two sets of experiments. The first set demonstrates the effect of inserting random data on the accuracy of the predictions generated by the recommendation system. The second set demonstrates how filling unrated items with varying f values affects the overall accuracy of the recommender system. At the end, we define and infer the optimal privacy set with respect to the experiment results. To measure the accuracy of the recommendation system, we used MAE (Mean Absolute Error). MAE measures the average absolute difference between the rating predicted for a specific user and the user's actual rating [164]. For these experiments we used the public MovieLens dataset [308]. This dataset contains 943 user rating profiles, with more than 100000 rating values on a 5-point scale. For the first part of the experiments we divided the profiles into 80% of the data for training and 20% for testing. For the second part we used 60% of the data for training and 40% for testing.
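As a reminder of how the metric is computed, the following is a minimal sketch of MAE over paired predicted and actual ratings. The function name `mean_absolute_error` and the sample ratings are illustrative assumptions, not values from the experiments.

```python
def mean_absolute_error(predicted, actual):
    """Average absolute difference between predicted and actual ratings."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(predicted)


# Illustrative ratings on a 5-point scale.
predicted = [4, 3, 5]
actual = [5, 3, 3]
print(mean_absolute_error(predicted, actual))
```

Lower values indicate more accurate predictions, so an increase in MAE after perturbation corresponds to accuracy loss.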


Accuracy under Overall Masked User Data

We masked user profiles with random numbers drawn from Gaussian and Uniform distributions to show the effect of the distribution on accuracy. To set up the experiment we change two sets of parameters: parameters that affect the data disguising operations, and parameters that affect the overall private trust computations. With respect to the former, we varied the β and δ values, while with respect to the latter we varied t (trust metric value) and n (neighbor list size). To allow comparison between results under masked data and results without masked data, MAE for variations of t and n without masking is presented in Fig. 12.2.

Figure 12.2: MAE of recommendation framework, without adding any perturbations [119].
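The masking step can be sketched as adding a random number to every rating in a profile. This is a minimal illustration under stated assumptions: Gaussian noise is taken to have zero mean and standard deviation β, Uniform noise is assumed to be drawn from [-δ, δ], and the function `mask_profile` and the sample profile are hypothetical.

```python
import random


def mask_profile(ratings, scale, distribution="gaussian", rng=None):
    """Return a copy of a rating profile with random noise added to each rating.

    scale is beta (standard deviation) for Gaussian noise, or delta for
    Uniform noise drawn from [-delta, delta] (an assumption of this sketch).
    """
    rng = rng or random.Random()
    if distribution == "gaussian":
        noise = lambda: rng.gauss(0.0, scale)
    elif distribution == "uniform":
        noise = lambda: rng.uniform(-scale, scale)
    else:
        raise ValueError("distribution must be 'gaussian' or 'uniform'")
    return {item: rating + noise() for item, rating in ratings.items()}


# Illustrative profile: item identifier -> rating on a 5-point scale.
profile = {"item1": 4.0, "item2": 2.0, "item3": 5.0}
masked_gaussian = mask_profile(profile, scale=1.0, distribution="gaussian",
                               rng=random.Random(42))
masked_uniform = mask_profile(profile, scale=1.0, distribution="uniform",
                              rng=random.Random(42))
print(masked_gaussian)
print(masked_uniform)
```

Larger values of `scale` add more randomness, which is the source of the privacy and accuracy trade-off examined in this section.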

Results of the experiments with perturbation on MAE are depicted in Fig. 12.3, which plots the effects of random data on MAE with Gaussian distributions (Fig. 12.3a) and with Uniform distributions (Fig. 12.3b). For the Gaussian distribution we selected 0.5, 1, 2 and 4 as values for β, and for the Uniform distribution we selected 1, 2, 3 and 4 as values for δ. Values of the trust metric t are selected from 0 and 100, and the neighbor lists were tested with sizes of 2, 3 and 5. In both plots, the horizontal axes depict the values of β and δ for the respective distributions. The MAE results show that, with a Gaussian distribution for the random numbers, higher β values achieve better privacy, which is due to increasing randomness. We can witness the trade-off here: the higher the β values, the more accuracy we lose. Results for the Uniform distribution confirm this observation: the higher the δ values, the better the privacy achieved and the more accuracy is lost. With respect to n, we can observe in the MAE results that neighbor list sizes that are neither too high nor too low give decent results. This is also the case for t, where the lowest value (t = 0) does not give uniform and consistent results, while the highest value (t = 100) yields more reasonable MAE results. The overall observation of MAE is that the Gaussian distribution seems to be better than the Uniform distribution for accuracy, but both are useful when appropriate β and δ values are selected.

Figure 12.3: Effects of adding perturbations on MAE, having Gaussian distribution (a), and having Uniform distribution (b), to user data.


Accuracy under Sparse Masked User Data

To show the effects of filling unrated items with random numbers, we performed experiments with varying f values.

Figure 12.4: Filling unrated items with random data having Gaussian distribution with respect to f.

In these experiments, random numbers having a Gaussian distribution with zero mean and standard deviation β = 1 were used. We selected f values from the intervals [0, d/2], [0, d] and [0, 2d]. The resulting MAE with respect to f is depicted in Fig. 12.4. We can observe from the results that filling unrated items with random numbers provides better privacy, but decreases accuracy as expected. Also, when we widen the interval from which f is drawn, we achieve a higher privacy level.

Analyzing Trading-Off between Privacy and Accuracy

Now that the base experiments have been presented, we can take into account the privacy and accuracy metrics of the current system to define and derive an optimal setting where the system exhibits a transparent outcome. Considering the parameters from both the privacy and trust mechanisms, Φ and Ψ are defined as follows:

$\Phi = \{(\beta, \delta, f) \mid \beta \in [1, 5], \delta \in [1, 5], f \in [0, 2d]\}$ (12.15)

$\Psi = \{(n, t) \mid n \in [1, 5], t \in [0, 100]\}$ (12.16)

Adopting the formalization introduced earlier, an optimum configuration is a joint setting ψ = (β, δ, f, n, t) through which we maintain accuracy and privacy at the same time. We follow the heuristic presented earlier:


1. First we perturb the overall user data using Gaussian and Uniform distributions (δ, β). By comparing the MAE results of the framework under masked data (Fig. 12.3), we can observe that the set (δ, β) = (1, 1) yields the best results, as it exhibits the minimal privacy loss. As a result we fix β = 1, δ = 1 for the next step.

2. In this step we observe the framework under variations of (n, t). By comparing the MAE results of the framework under masked data (Fig. 12.3), we observe that the set (n, t) = (3, 100), while being fixed on (δ, β) = (1, 1), yields reasonable accuracy while privacy is maintained. So we fix the current set to (δ, β, n, t) = (1, 1, 3, 100).

3. In the final step we perturb the sparse user data with the (δ, β, n, t) inferred in the previous step to fine-tune the privacy. To do so we use different intervals of f, with the system fixed on the (δ, β, n, t) configuration from the previous step. By observing the consistency of accuracy across the different f intervals, we can fine-tune the configuration from the previous step and infer an optimum privacy configuration. Taking into account the results (Fig. 12.4), we observe a consistent increase across the intervals of f, which finalizes the choice of δ, β, n, t and yields the ordered set n = 3, t = 100, δ = 1, β = 1 and f = [0, d], supporting both accurate and private recommendations:

$\psi(\beta, \delta, n, t, f) = (1, 1, 3, 100, [0, d])$ (12.17)

Considering the existing range of ψ configurations, the experiments showed that Pareto optimality holds. These results were inferred with the framework under masked user data. To make sure that the optimum result maintains Pareto optimality, we compare the MAE results of the non-masked framework (Fig. 12.2) with those of the framework under masked data (Fig. 12.3). In our work we inferred the optimum values β = 1, n = 3 and t = 100, and for these parameters MAE = 0.7994, while for similar parameters without adding perturbations we achieve MAE = 0.881. This shows that Pareto optimality holds, and it also shows that we have increased the privacy of the base framework with our architecture. Further observation shows that our MAE results are still lower than the MAE results without added perturbations. According to Fig. 12.2, the best result without perturbations is MAE = 0.863 for (n, t) = (50, 100), and this value is still greater than our optimum value. This observation also indicates that the proposed framework still shows better accuracy than the base framework.


In this paper we proposed a framework for addressing the problem of privacy in trust recommenders. To overcome this obstacle, we first introduced a framework for enabling privacy-preserving trust-aware recommendation generation. After introducing the architecture, its building blocks and protocols, we pointed out the conflicting goals of privacy and accuracy. Within this context, we showed that a Pareto set can always be found which makes a trade-off between these conflicting aims, and we presented a heuristic that experimentally infers this set. Through experimentation with the predictive accuracy of the private trust recommender system, we showed that we can infer a setting that holds even when the trust recommender is not under privacy measures. We also showed that privacy increases under the proposed framework, while even the optimal privacy setting of our framework performs better than the base framework in its best configuration. As a result, privacy can be introduced in trust recommenders and can be optimized to avoid private data loss and at the same time produce accurate recommendations.

As future work, we plan to strengthen our framework against shilling attacks. We will investigate how to extend our scheme when data is collected by a central server.

12.6 Acknowledgement

This work was partially supported by grant number 621-2007-6565 funded by the Swedish Research Council and Grant 108E221 from TUBITAK.

Chapter 13

Modeling and Measuring Trust in Topical Recommender Systems

(original version) R. Krestel and N. Dokoohaki, Diversifying Product Review Rankings: Getting the Full Picture, 2011 IEEE/WIC/ACM International Conferences on Web Intelligence and Intelligent Agent Technology (WI-IAT '11), IEEE Computer Society, pp. 138-145, Aug. 2011.

(extended version) R. Krestel and N. Dokoohaki, Ranking Product Reviews, Regular Issue of ACM Transactions on Intelligent Systems (TIST), ACM Digital Library, Sep. 2012 (Submitted for Review).


Ranking Product Reviews

Ralf Krestel1, Nima Dokoohaki2

1University of California, Irvine
2Software and Computer Systems (SCS), Information and Communications Technology (ICT), KTH - Royal Institute of Technology, Forum 120, 16440 Kista

Abstract

E-commerce Web sites owe much of their popularity to consumer reviews provided together with product descriptions. On-line customers spend hours and hours going through heaps of textual reviews to build confidence in products they are planning to buy. At the same time, popular products have thousands of user-generated reviews. Current approaches to present them to the user or recommend an individual review for a product are based on the helpfulness or usefulness of each review. In this paper we look at the top-k reviews in a ranking to give a good summary to the user with each review complementing the others. To this end we use Latent Dirichlet Allocation to detect latent topics within reviews and make use of the assigned star rating for the product as an indicator of the polarity expressed towards the product and the latent topics within the review. We present a framework to cover different ranking strategies based on the user's need: summarizing all reviews; focusing on a particular latent topic; or focusing on positive, negative or neutral aspects. We evaluated the system using manually annotated review data from a commercial review Web site.

13.1 Introduction

It has become routine among on-line and off-line consumers to inform themselves on review Web sites before purchasing a certain product. This has given rise to a considerable amount of customer reviews on e-commerce Web sites. To this end potential customers usually browse through a lot of on-line reviews in order to


210 PAPERS (F1)

build confidence in a particular item prior to purchasing it. While reviews have become an important factor in helping Web crowds to further assess the quality of products on-line, the increasing volume of reviews has itself led to an information overload [80]. Popular products have thousands of reviews. While the excess of reviews is a growing problem, recommending unbiased and helpful reviews is a growing research field. The quality of reviews may vary drastically [275] and might mislead potential buyers. Such a huge amount of information not only distracts the confidence seeker, it might also defeat the original goal of users in the first place: they will give up buying a certain product. To deal with these problems, review recommendation techniques have been proposed. Review recommendation involves implementing machine-learning techniques that analyze product reviews based on their lexico-semantic features in order to classify the reviews and recommend balanced and useful reviews to the readers.

While review recommender systems aim at automatically classifying reviews, some commercial Web sites such as Amazon and TripAdvisor1 approach this problem by allowing users to rate the reviews to improve the rankings (e.g. this review was helpful vs. not helpful). There are two inherent problems with these rankings based on user feedback: First, good objective reviews quite likely contain redundant information, and ranking them based on the helpfulness score will not cover all aspects. Second, these Web sites do not take personal bias into account. Not all reviews are helpful to everybody. Because different users put different emphasis on different aspects (e.g. I don't care about battery life, but really need lots of memory), helpfulness can only be used to filter out very bad reviews. Therefore, researchers are increasingly distinguishing between the task of review recommendation [4] and review ranking [135]. To improve existing review recommendation techniques and at the same time improve the ranking used for evaluating the helpfulness of existing reviews, we propose a novel approach to model and rank reviews. The two main components of our system rely on Latent Dirichlet Allocation (LDA) to model the reviews and on Kullback-Leibler divergence to generate an adequate ranking. We make use of the assigned star rating for the product as an indicator of the polarity expressed in the review towards the latent topics. Our framework covers different ranking strategies based on users' needs to adapt to various user scenarios. We currently support three strategies: to summarize all reviews, to focus on a particular latent topic, or to focus on positive, negative or neutral aspects. We evaluated the system using manually annotated review data gathered from a popular review Web site.

The main contributions of this paper are: (1) Introducing an algorithm to model reviews using latent topics and star ratings. (2) Ranking of reviews to summarize all reviews for a product within the top-k results. (3) Diversification of review rankings based on star ratings and/or latent topics. The remainder of the paper is organized as follows: We present related work in Section 13.2; Section 13.3 gives an overview of our framework. Section 13.4 describes the modeling approach, while Section 13.5 describes the ranking approach. We present the evaluation in Section 13.6 and close with conclusions and future work.

1http://www.amazon.com and http://www.tripadvisor.com

13.2 Overview of the Field

With respect to our contribution as well as relevance to our work we divide this overview into two sections: first and foremost, review mining and summarization, followed by ranking and diversification of reviews.

Mining and Summarizing Reviews

Existing literature on review summarization techniques has a large focus on review classification, summarization and recommendation. While reviews in general have been the focus of the majority of work in this field, a new breed of work focuses on opinion mining in general while taking online reviews as case studies.

Generally, review recommendation techniques are seen as an explanation [339, 340] or classification problem [274]. O'Mahony et al. [274] give an overview of existing machine learning methods for review recommendation. Kim et al. [192] use SVM regression on structural, lexical, syntactic, semantic, and sentiment features to classify reviews, and state that helpfulness is very dependent on the length of a review, its unigrams and its score. Liu et al. [225] show that the helpfulness of movie reviews is expertise and time dependent. While the majority of existing work utilizes text categorization techniques for recommending reviews, Harper et al. [155] train their classifier according to features relating to question categories, text categorization and social networking metrics. Credibility assessment was considered by Weekamp et al. [371], taking into account features such as timeliness of posts, post length, and spelling quality in topical reviews. Review summaries include structured summaries of review text that provide an organized breakdown by aspects or topics, and various formats of sentiment and sentence summaries [35]. The various summary formats complement each other by providing different levels of understanding. For instance, sentiment prediction on reviews of a product can provide a very generic picture of what the users feel about the product. If the user requires more specific details, then topic-based summaries or sentence/sentiment summaries could be more useful. Two state-of-the-art studies on opinion summarization by Pang and Lee [279] and Liu [222] give a broad overview of the field. Both surveys cover previous as well as current work but their focuses vary. Following their backgrounds, review summarization falls under subjective classification [379], sentiment analysis [77], or traditional text summarization. While researchers differentiate between review summarization methods and classic text summarization techniques [398], the connection is obvious. Both aim at identifying salient information: terms, sentences, or paragraphs. Sentiment analysis techniques try to produce a summarized sentiment consisting of sentences from a source document, a single paragraph [22], a structured sentence [168], attribute-value pairs, or just a sentiment score. To build summaries of sentence list structures, Hu and Liu [168] introduced a method utilizing word attributes such as frequency of occurrence, part-of-speech tagging and WordNet synsets. Following this approach, features are extracted, combined with their contextually close words, and finally used to generate a summary by selecting and re-structuring the sentences following the extracted features. Another approach called Opine [290] uses relaxation labeling to find the lexico-semantic orientation of words, whereas Pulse [126] uses bootstrapping to train a sentiment classifier using features extracted by labeling sentence clusters with respect to their key terms.

In a more recent study, Kim et al. [35] give a multi-perspective classification of approaches to opinion summarization. They classify existing approaches under two main categories: aspect-oriented summarization and non-aspect-oriented summarization. The most common category of opinion summarization technique is aspect-based opinion summarization [168, 225, 290, 341, 390], which involves generating opinion summaries containing a set of topics (also known as aspects or features). Aspect-based summarization involves three steps: feature identification, sentiment prediction, and summary generation. For instance, a summary of "Starbucks" can help to identify topics such as "coffee taste", "atmosphere", "price", etc. Non-aspect-oriented summarization contains the remaining approaches that could not be categorized, including sentiment summarization [77], basic and advanced text summarization [22, 191] and entity-based summarization [331]. In our work we mine and summarize reviews by choosing complementing reviews and ranking them according to different strategies. The product ratings serve as an indicator for the sentiment, and the extracted latent topics ensure topical coverage of relevant aspects.

Ranking and Diversifying Reviews

The problem of personalized ordering of results has been subject to research in both classic document retrieval and the increasingly popular field of recommender systems. A first approach based on maximum marginal relevance (MMR) [60] was used as a ranking metric which balances relevance, as the similarity between query and search results, with diversity, as the dissimilarity among search results. Ziegler et al. [403] take into account a user's full range of interests by diversifying generated recommendation lists, and by doing so they minimize redundancy among the recommended items. Reranking methods are mainly used for diversifying search results. Radlinski and Dumais [297] use a log-driven query reformulation with a focus on personalized search results. Chen and Karger [78] introduce a Bayesian reranking method to maximize the coverage of various semantics of an issued query among the top 10 results visited before. Zhai and Lafferty [389] introduce subtopic retrieval that considers dependencies between search results. They use statistical models to model user preferences as loss functions and the retrieval process as a risk minimization problem. Sanderson et al. [312] consider diversity in image search results and study the relation between precision and result diversity.

Recent approaches to diversification balance relevance with diversity, though they differ in the estimation of relevance and similarity, and in the choice of diversification objective. Gollapudi and Sharma [149] consider existing approaches to diversification as variants of facility dispersion. They analyze and evaluate various diversification objectives such as MaxSum, MaxMin and MonoObjective. Wang and Zhu [363] introduce an approach for search result diversification adopting the modern portfolio theory of finance. They generalize this well-known principle by maximizing the relevance of the top-k as well as minimizing the (co-)variance of the results. A greedy algorithm is used for ranking search results such that relevance is maximized while variance is minimized. Rafiei et al. [298] introduce a similar framework based on Portfolio Theory for reranking Web search results. The problem of result diversification is also investigated in the area of structured data queries. Agarwal et al. [5] classify queries and results into categories of the ODP taxonomy, and diversify results by maximizing the sum of categories covered by the top-k results, weighted by the probability of the categories given the query. Recommending a set of items to a user, as well as returning query results, has been subject to result diversification as well. Vee et al. [352] propose an algorithm for finding a representative, diverse set of top-k results for a given query. All attributes of an object are ordered according to their priority for diversification by a domain expert. Demidova et al. [94] introduce an approach for diversifying queries against structured databases based on their schema rather than diversifying the results. Tong et al. [345] approach diversification of ranking from an optimization point of view on a graph. First, they propose a goodness measure for a given top-k ranking list. The goodness measure captures the relevance between each individual node in the ranking list and the diversity among different nodes in the ranking list. Raman et al. [301] propose an online learning model and algorithm for learning rankings that balance relevance and diversity. In each step, the algorithm presents a ranking to the user. As feedback, the algorithm observes the set of documents the user reads in the presented ranking. While most existing work focuses on the task of diversification of search results, there is also some recent work on review mining. Yu et al. [383] look at ranking aspects of reviews. The aspect ranking algorithm identifies important aspects by taking into account the aspect frequency and the influence of consumers' opinions given to each aspect. When evaluating sentiment classification and aspect rating, they report better Kullback-Leibler divergence (KLD) compared to Hu [168]. Similar to our work, Xu et al. [381] state that two requirements should be taken into account while generating a good summary: representativeness and diversity, in addition to aspect-relevance and sentiment intensity. They present an aspect-based summarization method for online reviews that incorporates an aspect-sensitive Markov random walk model to satisfy the representativeness requirement, as well as a greedy redundancy to meet the diversity requirement. We propose a greedy algorithm to minimize the Kullback-Leibler divergence (KLD) between the language models of the top-k ranked reviews and all reviews for a product. KLD has, e.g., been used as a similarity measure for audio files [318], while we use it to measure similarity between word distributions. In addition, we diversify review rankings based on latent topics and language models to get an optimal coverage of all topics within the top-k results.

13.3 How to Rank Reviews?

In contrast to Web search results, reviews for a product cannot be ranked based mainly on relevance, since all reviews are supposed to be equally relevant for the product the review is about2. As discussed in Section 13.2, review recommendation or classification is a well-studied problem. However, most approaches don't optimize a ranking of reviews but evaluate the reviews individually. Our goal is not to find the best or most helpful individual reviews for a product but to find the top-k reviews which provide the user with a good summary of the opinions about a product. To this end, we model reviews using latent topics extracted with Latent Dirichlet Allocation (LDA) [36] and language models (LM) [288], a probabilistic bag-of-words representation. We combine these models with the assigned star ratings associated with each review. The ranking of the reviews is based on Kullback-Leibler divergence (KLD) to get an optimal summary of all reviews for a product with the largest possible topical diversity. Our framework also allows setting a different goal when computing the optimal ranking, e.g. cover all positive aspects of a product, or cover all sentiments associated with an aspect/feature of the product. In the following section we describe the conceptual architecture of the framework in more detail.
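The idea of selecting top-k reviews whose aggregated word distribution is close to that of all reviews can be sketched with a simple greedy loop. This is a minimal illustration and not the system's implementation: it uses smoothed unigram distributions only (no topics or star ratings), assumes the divergence is computed from the target distribution to the selection, and the names `distribution`, `kl_divergence` and `greedy_rank` as well as the toy reviews are hypothetical.

```python
import math
from collections import Counter


def distribution(token_lists, vocab, eps=1e-6):
    """Smoothed unigram distribution over vocab for a collection of reviews."""
    counts = Counter(tok for toks in token_lists for tok in toks)
    total = sum(counts.values()) + eps * len(vocab)
    return {w: (counts[w] + eps) / total for w in vocab}


def kl_divergence(p, q):
    """KL(p || q) for two distributions over the same vocabulary."""
    return sum(p[w] * math.log(p[w] / q[w]) for w in p)


def greedy_rank(reviews, k):
    """Greedily pick k reviews whose pooled distribution best matches all reviews."""
    vocab = sorted({tok for review in reviews for tok in review})
    target = distribution(reviews, vocab)
    selected, remaining = [], list(range(len(reviews)))
    while remaining and len(selected) < k:
        # Choose the review that minimizes the divergence after being added.
        best = min(remaining, key=lambda i: kl_divergence(
            target, distribution([reviews[j] for j in selected + [i]], vocab)))
        selected.append(best)
        remaining.remove(best)
    return selected


# Toy tokenized reviews: two say the same thing, one covers another aspect.
reviews = [["a"], ["a"], ["b"]]
print(greedy_rank(reviews, 2))
```

In this toy case the second pick is the review covering the otherwise missing term, which is the redundancy-avoiding behavior the ranking aims for. Swapping `target` for a task-specific distribution would correspond to the alternative goals mentioned above.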

System Overview

Reflecting the two conceptual steps of modeling the reviews and creating the rankings, our framework consists of two main components:

1. The LDA and LM component to model the review data

2. The ranking component to optimize the ranking based on different strategies

A graphical overview of the framework can be seen in Figure 13.1. To model the reviews of a product, the LDA/LM component takes all reviews written for this product together with the associated ratings assigned by the review authors. Our hypothesis is that users who assign five stars (on a five-point Likert scale) mainly talk about positive experiences with the product or its features, whereas a review accompanied by a one-star rating indicates a review with rather negative points3. To not exclude the possibility that a minor negative point is expressed even in a 5-star review, we use a matrix that allows smoothing the assignment of reviews to rating classes. Especially a 3-star rating can contain negative as well as positive aspects, which can be modeled using this matrix. Based on the topic models for each review, we then rank the reviews by minimizing the Kullback-Leibler divergence between the aggregated reviews of the ranking and one of three target distributions, depending on the optimization strategy. After discussing the preprocessing steps in the next section, we describe the modeling approach in Section 13.4 and the ranking in Section 13.5.

2We don't consider obvious spam or fake reviews in our setting, but rely on the review platform to take care of these.

3In Section 13.6 we back up this hypothesis by analyzing user-assigned ratings and review content.

Figure 13.1: Overview of the Review Ranking System: Reviews together with ratings are used to extract topic distributions using LDA. Rankings are computed minimizing KL-Divergence with task-specific target distributions.

Preprocessing

Since reviews are user-generated they may contain grammatical errors, sloppy language, or spelling errors. Therefore, preprocessing the raw data is an important step that we briefly want to elaborate on. We used the Stanford POS Tagger [348] for tokenization and part-of-speech tagging. Then WordNet [121] was used to get the lemmas of the terms, and afterwards all terms that are not verbs, adverbs, nouns, or adjectives are removed, thus getting rid of most stopwords.
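The part-of-speech filtering step can be sketched as follows. This is a simplified illustration that assumes the input is already tokenized and tagged with Penn Treebank tags; it does not call the Stanford POS Tagger or WordNet, performs no lemmatization, and the function `filter_content_words` is a hypothetical helper.

```python
# Map Penn Treebank tag prefixes to single-letter part-of-speech suffixes:
# noun (n), verb (v), adjective (a), adverb (r).
TAG_PREFIX_TO_POS = {"NN": "n", "VB": "v", "JJ": "a", "RB": "r"}


def filter_content_words(tagged_tokens):
    """Keep nouns, verbs, adjectives and adverbs; drop everything else.

    tagged_tokens is a list of (token, penn_tag) pairs. The result uses a
    'token.pos' notation with lower-cased tokens.
    """
    kept = []
    for token, tag in tagged_tokens:
        pos = TAG_PREFIX_TO_POS.get(tag[:2])
        if pos is not None:
            kept.append(f"{token.lower()}.{pos}")
    return kept


# Hand-tagged example sentence (the tags are supplied manually here).
tagged = [("The", "DT"), ("burger", "NN"), ("is", "VBZ"),
          ("really", "RB"), ("easy", "JJ"), ("to", "TO"), ("order", "VB")]
print(filter_content_words(tagged))
```

Determiners, prepositions and similar function words fall outside the four kept classes, which is why this step removes most stopwords without a separate stopword list.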

Since uni-grams might not give an accurate picture of what a review is about, we extract n-grams of variable lengths in the next step. Especially in the context of product reviews, multi-term phrases are important to model the data, e.g. “Microsoft Windows 7 Professional”, “not recommended”, or “graphic card”. Therefore, we partition our data into meaningful n-grams first. Based on the work of Deligne and Bimbot [93], we compute multigram models for the documents in our corpus in the following way: Each sentence is considered as a sequence of n-grams with variable length. The likelihood of a sentence is computed by summing up the individual likelihoods of the n-grams corresponding to each possible segmentation of the sentence. This is done using a Viterbi-like algorithm to find the maximum likelihood segmentation. In an iterative fashion, we re-estimate and update the probabilities until convergence. More details on variable-length n-grams can be found in Bimbot et al. [34].

“. . .Wish burger - also known as a veggie burger (no meat) Ketchup and Mustard are actually available at In and Out.. just ask, its really easy Double Meat - is la double double with no cheese Flying Dutchman. . . ”

wish.n burger.n also.r know.v veggie.n_burger.n meat.n actually.r available.a ketchup.n mustard.n just.r ask.v really.r easy.a double.r_meat.n double.a_double.a cheese.n flying.n_dutchman.n

Figure 13.2: Preprocessed Review Snippet: Original on Top; Segmented and POS-tagged on Bottom
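The maximum likelihood segmentation step can be sketched with dynamic programming. The code below is a minimal illustration that assumes the n-gram probabilities are already given and covers a single pass only; the iterative re-estimation of probabilities is not shown, and the function `best_segmentation` and the probability table are hypothetical.

```python
import math


def best_segmentation(tokens, ngram_prob, max_len=3):
    """Find the segmentation into n-grams with the highest likelihood.

    ngram_prob maps tuples of tokens to probabilities. Returns the list of
    segments and the log-likelihood of the best segmentation.
    """
    n = len(tokens)
    # best[i] holds (best log-likelihood of tokens[:i], start of last segment).
    best = [(-math.inf, None)] * (n + 1)
    best[0] = (0.0, None)
    for end in range(1, n + 1):
        for start in range(max(0, end - max_len), end):
            prob = ngram_prob.get(tuple(tokens[start:end]))
            if prob is None or best[start][0] == -math.inf:
                continue
            score = best[start][0] + math.log(prob)
            if score > best[end][0]:
                best[end] = (score, start)
    if best[n][0] == -math.inf:
        raise ValueError("no segmentation covers the whole sentence")
    # Follow the back-pointers from the end of the sentence.
    segments, end = [], n
    while end > 0:
        start = best[end][1]
        segments.append(tuple(tokens[start:end]))
        end = start
    return segments[::-1], best[n][0]


# Invented probabilities, used only to exercise the function.
probs = {("veggie",): 0.1, ("burger",): 0.2, ("veggie", "burger"): 0.15,
         ("no",): 0.3, ("meat",): 0.2, ("no", "meat"): 0.01}
segments, log_likelihood = best_segmentation(["veggie", "burger", "no", "meat"], probs)
print(segments)
```

With these invented numbers, "veggie burger" is kept as one unit because the bigram is more likely than the product of its parts, while "no meat" is split for the opposite reason.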

As a result, all documents in our corpus of product reviews are segmented into variable-length n-grams, and the latent topics can now be based on n-grams or phrases instead of fixed-size units or single terms. Figure 13.2 shows a snippet of an original product review together with its preprocessed version without stop words but with part-of-speech information and multigrams.

13.4 Modeling Reviews

To model the review data we make use of language models [288] and probabilistic topic models [213] to extract latent topics within the review corpus. We combine this information with the assigned star ratings for the reviews to cover positive and negative statements associated with a particular latent topic. In the following we describe LDA and LM in more detail and in Section 13.4 we explain how we combine the star ratings and the language model and topic model representations.

Finding Latent Topics

A product review usually covers different aspects or features of a product. For example, users have an opinion about the price of a product or the service of a company. Instead of a fine-grained extraction of features and sentiment, as done for instance by Bross and Ehrig [47], we rely on a statistical approach to find features or aspects.

To identify the latent topics we employ Latent Dirichlet Allocation [36], which models each review as a mixture of latent topics⁴. This probabilistic assignment of different topics to a single review later allows us to identify topically similar reviews. Figure 13.3 shows the plate notation for Latent Dirichlet Allocation. LDA identifies a given number of |Z| topics within a corpus of |D| documents. Each term t in a

⁴ We use the LDA implementation in the Mallet library [240], which makes use of Gibbs sampling to compute the latent topics.


[Plate diagram: nodes α, Θd, z, t, Φj, β; plates D, Z, Nd]

Figure 13.3: Plate Notation for Latent Dirichlet Allocation

review with Nd terms is associated with a topic z. Being the most important parameter for LDA, the number of latent topics |Z| determines the granularity of the resulting topics, as we will see later. In order to find the latent topics, LDA relies on stochastic modeling.

The modeling process of LDA can be described as determining a mixture of topics for each document in the corpus, i.e., $P(z \mid d)$, with each topic described by multigrams following another probability distribution, i.e., $P(w \mid z)$. This can be formalized as:

$$P(w_i \mid d) = \sum_{j=1}^{|Z|} P(w_i \mid z_j)\, P(z_j \mid d), \tag{13.1}$$

where $P(w_i \mid d)$ is the probability of the $i$th multigram for a given document $d$ and $z_j$ is the latent topic. $P(w_i \mid z_j)$ is the probability of $w_i$ within topic $z_j$. $P(z_j \mid d)$ is the probability of picking a term from topic $z_j$ in the document.

With LDA at hand, we are able to represent latent topics as a list of multigrams with a probability for each multigram indicating the membership degree within the topic. Furthermore, for each document in our corpus (reviews in our case) we can determine to which topics it belongs, also associated with a degree of membership (topic probability $P(z_j \mid d_i)$).
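Equation 13.1 can be evaluated directly once the two distributions are available. The sketch below assumes they are already estimated (in the paper this is done with Mallet) and held in plain Python structures; the function name `word_given_doc` and the toy numbers are illustrative only.

```python
def word_given_doc(word, doc_topics, topic_words):
    """P(w | d) = sum_j P(w | z_j) * P(z_j | d), as in Equation 13.1.

    doc_topics:  list with P(z_j | d) for each topic j
    topic_words: list of dicts, topic_words[j][w] = P(w | z_j)
    """
    return sum(p_topic * topic_words[j].get(word, 0.0)
               for j, p_topic in enumerate(doc_topics))

# Two toy topics over two multigrams (made-up values).
topic_words = [
    {"ticket.n": 0.6, "delay.n": 0.4},
    {"ticket.n": 0.1, "delay.n": 0.9},
]
doc_topics = [0.25, 0.75]
p_delay = word_given_doc("delay.n", doc_topics, topic_words)
p_ticket = word_given_doc("ticket.n", doc_topics, topic_words)
```

Because each topic is a distribution over multigrams and the document is a distribution over topics, the resulting values again sum to one over the vocabulary.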

An example of two extracted latent topics represented by their top 10 terms is shown in Table 13.1. Besides the terms, the probability of each term belonging to the topic is also shown. For this example we used |Z| = 50 latent topics.

Building language models

In its simplest form, a language model for a document $d$ with words $w_i$ can be formalized using a maximum likelihood estimate:

$$P(w \mid d) = \frac{c(w, d)}{\sum_{w_i \in d} c(w_i, d)} \tag{13.2}$$


Table 13.1: Top terms composing the latent topics “ticket” and “waiting” for America West Airlines

Term                 Prob.    Term                    Prob.
ticket.n             0.038    concourse.n             0.015
voucher.n            0.027    miss.v                  0.015
clerk.n              0.016    take.v_off.r            0.015
care.v               0.011    hour.n_late.r           0.012
availability.n       0.008    change.n                0.009
complain.v           0.008    delay.n                 0.009
look.v               0.008    flight.n_attendant.n    0.009
nightmare.n          0.008    meeting.n               0.009
suggest.v            0.008    not.r                   0.009
america.n_worst.r    0.006    reggie.n                0.009

where $c(w, d)$ is the count of word $w$ in document $d$. To prevent the probability $P_{lm}(w \mid d)$ from being zero in case word $w$ does not appear in document $d$, various smoothing methods have been introduced and compared. The unsmoothed model using a maximum likelihood estimate shown in Equation 13.2 can be complemented by different components, with the Laplace smoothing being the simplest, adding 1 to each count:

$$P(w \mid d) = \frac{c(w, d) + 1}{\sum_{w_i \in d} c(w_i, d) + 1} \tag{13.3}$$

This allows us to model reviews as multinomial distributions either over terms (LM) or over topics (LDA). In the next section we describe how to incorporate the user ratings into these probability distributions.
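The two estimates can be sketched in a few lines. This is an illustration under an explicit assumption: the smoothed variant uses the common add-one form in which the denominator grows by the vocabulary size, so that the result is a proper distribution over a fixed vocabulary. That may differ from Equation 13.3 as printed. The function names are hypothetical.

```python
from collections import Counter

def mle_model(tokens):
    """Unsmoothed maximum likelihood estimate, as in Equation 13.2."""
    counts = Counter(tokens)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {w: c / total for w, c in counts.items()}

def laplace_model(tokens, vocabulary):
    """Add-one smoothing over a fixed vocabulary (assumed variant)."""
    counts = Counter(tokens)
    total = sum(counts.values()) + len(vocabulary)
    return {w: (counts[w] + 1) / total for w in vocabulary}

review = ["ticket.n", "delay.n", "delay.n"]
vocabulary = {"ticket.n", "delay.n", "voucher.n"}
unsmoothed = mle_model(review)
smoothed = laplace_model(review, vocabulary)
```

With smoothing, the unseen term `voucher.n` receives a small non-zero probability instead of zero, which keeps later divergence computations finite.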

Combining Probability Distributions and Star Ratings

Each review $d$ can now be modeled as a mixture of latent topics $P(z_i \mid d)$ or terms $P(w_i \mid d)$. In the following we describe only the topic model case; for language models the computation is analogous.

Together with the rating of each review $r(d)$ we can transform the topic model into a topic-rating model by considering the topics for each rating class $r \in R = \{1, \dots, 5\}$ separately:

$$P(z'_k \mid d) = \sum_{r \in R} m_{r(d)-1,\, r-1} \cdot P(z_{k \bmod |Z|} \mid d), \tag{13.4}$$


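where $k \in \{0, \dots, |R| \cdot |Z|\}$ and $m_{i,j}$ is an entry in the rating smoothing matrix:

$$M = \begin{pmatrix}
0.6 & 0.3 & 0.1 & 0.0 & 0.0 \\
0.4 & 0.5 & 0.1 & 0.0 & 0.0 \\
0.0 & 0.2 & 0.6 & 0.2 & 0.0 \\
0.0 & 0.0 & 0.1 & 0.5 & 0.4 \\
0.0 & 0.0 & 0.1 & 0.3 & 0.6
\end{pmatrix} \tag{13.5}$$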

The matrix defines how likely it is that, e.g., a negative review contains neutral or positive aspects. This is also dependent on the dataset and the typical user rating behavior on a review platform.

All latent topics extracted by LDA are now represented individually for each rating class. Each review is modeled as a topic mixture depending on its rating with some overlap according to the rating smoothing matrix. In the next section we describe how to compute the reference topic models to compute the different rankings corresponding to various strategies.
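One way to read Equation 13.4 is sketched below. It rests on an assumption that is not spelled out in the text: entry k of the expanded model belongs to rating class k div |Z| and topic k mod |Z|, and is weighted by the matrix entry in the row of the review's own rating and the column of that class. The function name `topic_rating_model` is hypothetical; the matrix is the one from Equation 13.5.

```python
# Rating smoothing matrix from Equation 13.5 (row: given rating, column: rating class).
M = [
    [0.6, 0.3, 0.1, 0.0, 0.0],
    [0.4, 0.5, 0.1, 0.0, 0.0],
    [0.0, 0.2, 0.6, 0.2, 0.0],
    [0.0, 0.0, 0.1, 0.5, 0.4],
    [0.0, 0.0, 0.1, 0.3, 0.6],
]

def topic_rating_model(doc_topics, rating, smoothing=M):
    """Expand P(z | d) into a vector of length |R| * |Z|.

    Assumed index layout: k = rating_class * |Z| + topic, rating in 1..|R|.
    """
    row = smoothing[rating - 1]
    n_topics = len(doc_topics)
    return [row[k // n_topics] * doc_topics[k % n_topics]
            for k in range(len(row) * n_topics)]

# A one-star review that is 70% about topic 0 and 30% about topic 1.
model = topic_rating_model([0.7, 0.3], rating=1)
```

Since every row of the matrix sums to one, the expanded vector is again a probability distribution, with most of its mass in the block of the review's own rating.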

13.5 Ranking Reviews

Depending on the user’s information need, we define three ranking strategies:

1. Summary-focused Ranking (Section 13.5)

2. Sentiment-focused Ranking (Section 13.5)

3. Topic-focused Ranking (Section 13.5)

To compute these rankings we take the topic-rating models of the reviews computed in the previous step and try to minimize the distance between the aggregated top-k reviews and a strategy-dependent target distribution. We use a greedy algorithm to find the best review for each position in the ranking.

As a measure for how well the top-k reviews approximate the corresponding target distribution we calculate the Kullback-Leibler divergence between the smoothed topic-rating models for the top-k results and for the target distributions. Kullback-Leibler divergence estimates the number of additional bits needed to encode the distribution $U$, using an optimal code for $Q$, and having a combined vocabulary size of $|Z'|$; in our case the number of latent topics $|Z|$ times the number of rating classes $|R|$.

$$D_{KL}(U \,\|\, Q) = H(U, Q) - H(U) = \sum_{i=1}^{|Z'|} u_i \cdot \log_2\left(\frac{u_i}{q_i}\right) \tag{13.6}$$

In our setting, distribution $Q$ is the combined topic-rating model of the top-k reviews and thus $D_{KL}(U \,\|\, Q)$ can be directly used to measure the similarity with the target distribution ($H(U, Q)$ is the cross entropy of $U$ and $Q$ and $H(U)$ is the entropy of $U$).
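Equation 13.6 translates into a short function. The sketch below adds a small constant to Q before renormalizing so that zero entries do not produce an infinite divergence; this particular smoothing and the name `kl_divergence_bits` are assumptions made for illustration, not details given in the text.

```python
import math

def kl_divergence_bits(u, q, eps=1e-9):
    """D_KL(U || Q) in bits, following Equation 13.6.

    Q is smoothed with `eps` and renormalized (assumed smoothing scheme);
    terms with u_i = 0 contribute nothing to the sum.
    """
    q_smoothed = [x + eps for x in q]
    norm = sum(q_smoothed)
    return sum(ui * math.log2(ui / (qi / norm))
               for ui, qi in zip(u, q_smoothed) if ui > 0)

same = kl_divergence_bits([0.5, 0.5], [0.5, 0.5])
one_bit = kl_divergence_bits([1.0, 0.0], [0.5, 0.5])
```

The divergence is zero when both distributions coincide and grows as the top-k aggregate moves away from the target, which is what the ranking strategies minimize.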


Step 1 (target A + B + C):  DKL(A) = 0.5,  DKL(B) = 0.3,  DKL(C) = 0.7    B is Rank 1
Step 2 (target A + B + C):  DKL(B + A) = 0.2,  DKL(B + C) = 0.3           A is Rank 2
Step 3 (target A + B + C):  DKL(A + B + C) = 0.0                          C is Rank 3

Figure 13.4: Example of the Greedy Algorithm to find a Ranking Summarizing the Three Reviews A, B, and C

Summary-focused Ranking

In most cases, users reading reviews are interested in getting an overview of the experiences of other users with the product. A ranking which gives a good overview summarizes the views expressed in all reviews. The goal for a review ranking system is therefore to approximate all reviews by the top-k in the ranking. Thus, the top-k reviews summarize the opinions about a product present in all reviews.

With the topic-rating models computed for each review we try to find a ranking of reviews that approximates the aggregated topic-rating models of all reviews for a product. This means we try to minimize the Kullback-Leibler divergence between the top-k ranked reviews and the aggregation of all reviews.

To clarify the functioning of the greedy algorithm let’s consider a product with three reviews represented by A, B, and C. We compute the aggregated topic-rating model A+B+C and measure the Kullback-Leibler divergence DKL for each position in the ranking. The example is shown in Figure 13.4.
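The greedy procedure can be sketched as follows. This is an illustration under assumptions, not the original code: reviews are given as topic-rating vectors in a dictionary, the aggregate of several reviews is taken to be their normalized sum, and the names `greedy_ranking`, `kl_bits`, and `normalize` are hypothetical. The vectors for A, B, and C are made up.

```python
import math

def kl_bits(u, q, eps=1e-9):
    """D_KL(U || Q) in bits with a small smoothing constant on Q (assumed)."""
    q_s = [x + eps for x in q]
    norm = sum(q_s)
    return sum(ui * math.log2(ui / (qi / norm)) for ui, qi in zip(u, q_s) if ui > 0)

def normalize(vector):
    total = sum(vector)
    return [x / total for x in vector] if total > 0 else list(vector)

def greedy_ranking(models, target=None):
    """Rank review ids so that each prefix stays close to the target distribution.

    models: dict review_id -> topic-rating vector.
    target: target distribution; defaults to the aggregate of all reviews
            (the summary-focused strategy).
    """
    dims = len(next(iter(models.values())))
    if target is None:
        target = normalize([sum(m[i] for m in models.values()) for i in range(dims)])
    remaining = dict(models)
    aggregate = [0.0] * dims
    ranking = []
    while remaining:
        def divergence_if_added(rid):
            candidate = [a + b for a, b in zip(aggregate, remaining[rid])]
            return kl_bits(target, normalize(candidate))
        best = min(remaining, key=divergence_if_added)
        aggregate = [a + b for a, b in zip(aggregate, remaining.pop(best))]
        ranking.append(best)
    return ranking

reviews = {
    "A": [0.8, 0.1, 0.1],
    "B": [0.3, 0.4, 0.3],
    "C": [0.1, 0.3, 0.6],
}
ranking = greedy_ranking(reviews)
```

Passing a different `target` turns the same loop into the sentiment-focused or topic-focused strategy, since only the distribution being approximated changes.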

Sentiment-focused Ranking

Instead of approximating all reviews, the sentiment-focused ranking tries to summarize only one particular class of ratings, for example negative aspects as represented by the topic-rating model with rating one. It could also be interesting to see which features of a product are discussed mainly in a neutral review or which aspects are only discussed in positive reviews. Depending on the rating smoothing matrix, aspects from reviews having a slightly different rating can influence the ranking.

The target distribution that we try to approximate with the review ranking in this case is a (smoothed) uniform distribution over all topics for one rating. That means we get a diverse ranking covering all latent topics associated with a particular rating.
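Such a target distribution could be constructed as sketched below. The index layout (rating class blocks of |Z| consecutive entries) and the smoothing constant `eps` are assumptions for illustration; the text does not specify how the uniform distribution is smoothed.

```python
def sentiment_target(n_topics, n_ratings, rating, eps=1e-3):
    """Smoothed uniform distribution over all topics of one rating class.

    Assumed layout: entry k belongs to rating class k // n_topics (0-based);
    `rating` is given in 1..n_ratings.
    """
    target = [eps] * (n_topics * n_ratings)
    start = (rating - 1) * n_topics
    for k in range(start, start + n_topics):
        target[k] += 1.0
    total = sum(target)
    return [x / total for x in target]

# Target for summarizing one-star opinions with two latent topics.
negative_target = sentiment_target(n_topics=2, n_ratings=5, rating=1)
```

The small mass outside the selected block keeps the Kullback-Leibler divergence well defined when a review contributes to neighboring rating classes through the smoothing matrix.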


Topic-focused Ranking

Corresponding to the previous sentiment-focused ranking, we can focus the review ranking on a particular latent topic. This makes it possible to get all opinions – positive, neutral, and negative – about a certain aspect. This might be useful for users who are interested in a particular feature of a product and the experience other users report in their reviews.

This type of ranking can be achieved by minimizing the Kullback-Leibler divergence of the reviews in the ranking and a (smoothed) uniform target distribution over all ratings for one topic. We refrained from evaluating this strategy due to the lack of large scale user data to test this type of ranking and the difficult mapping of all latent topics to well-defined product features.

13.6 Experiments

To evaluate our system and the different rankings we adopt a method from information retrieval to judge rankings based on novelty and diversity. The ideal ranking would cover all different aspects and all different opinions about the aspects. The first review in the ranking should cover many aspects of the product to serve as a good overview. This can be compared to sub-topic retrieval where Web search engines try to find an optimal ranking to cover as many sub-topics as possible (see, for example, TREC 2009 Web Track, Diversity Task [84]). This evaluation approach requires annotated results, namely each review needs to be annotated with the sub-topics discussed in it. In the following we describe our dataset and the annotation of the test data.

Dataset

We crawled the Epinions⁵ Web site to get around 30,000 reviews for 300 products. Epinions is a general review Web site which hosts trusted opinions from customers and purchasers of products and services. We directly crawled the dataset from the publicly available reviews on the site. We used Paolo Massa’s [237] crawler written in Perl and we extended it to be able to crawl and store textual reviews in addition to rating profiles. The crawler ran continuously over a three-month period.

For the evaluation we manually annotate the reviews with features of the discussed product together with the polarity. Out of the 300 products we randomly picked four which had not only positive or negative ratings: “America West Airlines”, “Pokemon Snap for Nintendo 64”, “Starbucks”, and “Microsoft Windows ME”. Table 13.2 shows the distribution of ratings for these products in our corpus.

Figure 13.5 shows the distribution of positive and negative mentions of product aspects for the different rating classes. As is to be expected, reviews with only

⁵ www.epinions.com


Table 13.2: Distribution of the Ratings for the Test Products

          Number of Reviews for
Rating    “Pokemon”    “America West”    “Starbucks”    “Windows ME”
1.0       13           43                21             17
2.0       23           29                13             20
3.0       22           23                18             23
4.0       32           17                36             41
5.0       13           5                 41             25
Σ         103          117               129            126

Figure 13.5: Average number of positive (1.0) and negative (−1.0) mentions of aspects grouped by given rating

one star report mainly negative experience whereas five-star ratings correlate with mainly positive mentions of aspects in the review. Some features tend to be more positive across all rating classes, e.g. if “graphics” are mentioned within a Pokemon review it is mostly in a positive context, even if the associated rating was only “1”. “Seating/Space” in America West Airlines reviews on the other hand is mostly mentioned in a negative way across rating classes.

For manually annotating the reviews we first identified different features of the


Table 13.3: Sample Annotation Form for “America West Airlines” Reviews

Feature/Aspect | Positive | Negative
"Timeliness" X
"Customer Service"
"Seating/Space"
"Food" X
"Pricing" X
"Gate Changes"
"Frequent Flyer Program" X
"Plane Quality"
"Luggage Condition" X
"Overall Impression" X

products and then annotated each review with respect to these features. Table 13.3 shows the annotation form to annotate reviews for “America West Airlines”.

Results

We report results comparing the two proposed models for review data representation – topic models and language models – and a combination of both. We also compare these results with a baseline approach ranking the newest review highest. Time-based ranking is a common way for commercial review sites to rank product reviews and therefore a good baseline for us.

We evaluate the summary-focused ranking and the sentiment-focused ranking on the annotated test data. The topic-focused ranking cannot be evaluated automatically using the test data since there is no inherent mapping between latent topics and product aspects.

Summary-focused Ranking

To evaluate the summary-focused ranking we computed α-nDCG [85] for our rankings using the manually annotated reviews to assess novelty and diversity. In contrast to our optimization approach, α-nDCG only accounts for positive and negative features and does not take different degrees of polarity into account.
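For readers unfamiliar with the measure, the sketch below shows one common way to compute α-nDCG: the gain of a review is the number of annotated aspects it mentions, each discounted by (1 − α) for every earlier review that already mentioned it, and the ideal ranking used for normalization is approximated greedily. Treating each polarity-tagged aspect as one subtopic, as well as all function names, are assumptions for illustration rather than details from the paper.

```python
import math

def novelty_gain(rid, aspects, seen, alpha):
    """Gain of a review: each aspect counts (1 - alpha)^(times already seen)."""
    return sum((1 - alpha) ** seen.get(a, 0) for a in aspects[rid])

def alpha_dcg(ranking, aspects, alpha):
    """Cumulative alpha-DCG value at every rank of `ranking`."""
    seen, total, scores = {}, 0.0, []
    for pos, rid in enumerate(ranking, start=1):
        total += novelty_gain(rid, aspects, seen, alpha) / math.log2(1 + pos)
        for a in aspects[rid]:
            seen[a] = seen.get(a, 0) + 1
        scores.append(total)
    return scores

def ideal_order(aspects, alpha):
    """Greedy approximation of the ideal ranking used for normalization."""
    remaining, seen, order = sorted(aspects), {}, []
    while remaining:
        best = max(remaining, key=lambda rid: novelty_gain(rid, aspects, seen, alpha))
        remaining.remove(best)
        order.append(best)
        for a in aspects[best]:
            seen[a] = seen.get(a, 0) + 1
    return order

def alpha_ndcg(ranking, aspects, alpha=0.99):
    """alpha-nDCG at every rank: actual alpha-DCG divided by the ideal one."""
    actual = alpha_dcg(ranking, aspects, alpha)
    ideal = alpha_dcg(ideal_order(aspects, alpha), aspects, alpha)
    return [a / i if i > 0 else 0.0 for a, i in zip(actual, ideal)]

# Hypothetical annotations: review id -> set of polarity-tagged aspects.
annotations = {
    "r1": {"Food+", "Pricing-"},
    "r2": {"Food+"},
    "r3": {"Seating/Space-"},
}
scores = alpha_ndcg(["r2", "r1", "r3"], annotations, alpha=0.5)
```

With α close to 1, as in the experiments (α = 0.99), a review that only repeats already covered aspects contributes almost nothing, so the measure rewards rankings that introduce new aspects early.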

The results for the top-20 reviews ranked based on recency (time), latent topics (LDA), language models (LM), and a combination (LM+LDA) are shown in Figures 13.6, 13.7, 13.8, and 13.9. For all four products, combining LDA and LM outperforms the other approaches, followed by using a language model representation. For a discussion on the optimal number of latent topics for LDA see [204].

The influence of smoothing the rating by assigning a fuzzy membership degree for each review to the review classes is shown in [204]. Using a rating smoothing matrix as depicted in Equation 13.5 yields better results than using no smoothing


[Plot: α-nDCG (y-axis, 0 to 1) over Rank (x-axis, 1 to 20) for the series Lm-lda5, lm, lda5, and time]

Figure 13.6: Summary Strategy: Comparing Recency with LDA and LM (α = 0.99): “America West Airlines”

[Plot: α-nDCG (y-axis, 0 to 1) over Rank (x-axis, 1 to 20) for the series Lm-lda5, lm, lda5, and time]

Figure 13.7: Summary Strategy: Comparing Recency with LDA and LM (α = 0.99): “Pokemon Snap for Nintendo 64”

(diagonal rating smoothing matrix $M = (m_{i,j})$ with $m_{i,j} = 1.0$ if $i = j$ and $m_{i,j} = 0.0$ otherwise). The results for using a rating smoothing matrix $M = (m_{i,j})$ with $m_{i,j} = 0.2, \forall i, j \in \{1, \dots, |R|\}$ are also worse. Different variations of the rating smoothing matrix could be necessary for different datasets depending for example on the skewness of the rating distribution over the classes or on individual user preferences.


[Plot: α-nDCG (y-axis, 0 to 1) over Rank (x-axis, 1 to 20) for the series lm+lda75, lm, lda75, and time]

Figure 13.8: Summary Strategy: Comparing Recency with LDA and LM (α = 0.99): “Microsoft Windows ME”

[Plot: α-nDCG (y-axis, 0 to 1) over Rank (x-axis, 1 to 20) for the series lm+lda50, lm, lda50, and time]

Figure 13.9: Summary Strategy: Comparing Recency with LDA and LM (α = 0.99): “Starbucks”

Sentiment-focused Ranking

The results for sentiment-focused ranking focusing on either positive or negative aspects are shown in Figures 13.10, 13.11, 13.12, and 13.13. To compute the α-nDCG values we only considered the positive, respectively negative, manually annotated aspects to be relevant. As can be seen in Figure 13.10, summarizing the negative opinions with the top-k reviews in the ranking for “America West” is easier than the positive opinions. For “America West” the negative and positive aspects are


[Plot: α-nDCG (y-axis, 0 to 1) over Rank (x-axis, 1 to 20) for the series lm+lda5 neg, time neg, lm+lda5 pos, and time pos]

Figure 13.10: Sentiment Strategy: Comparing Recency with LM+LDA focusing only on positive or negative aspects respectively (α = 0.99): “America West Airlines”

[Plot: α-nDCG (y-axis, 0 to 1) over Rank (x-axis, 1 to 20) for the series lm+lda5 neg, time neg, lm+lda5 pos, and time pos]

Figure 13.11: Sentiment Strategy: Comparing Recency with LM+LDA focusing only on positive or negative aspects respectively (α = 0.99): “Pokemon Snap for Nintendo 64”

quite well covered after the first three reviews using LDA+LM whereas the coverage of negative opinions for TIME reaches a local maximum after four reviews and for positive opinions it takes even more (eight).


[Plot: α-nDCG (y-axis, 0 to 1) over Rank (x-axis, 1 to 20) for the series lm+lda75 neg, time neg, lm+lda75 pos, and time pos]

Figure 13.12: Sentiment Strategy: Comparing Recency with LM+LDA focusing only on positive or negative aspects respectively (α = 0.99): “Microsoft Windows ME”

[Plot: α-nDCG (y-axis, 0 to 1) over Rank (x-axis, 1 to 20) for the series lm+lda50 neg, time neg, lm+lda50 pos, and time pos]

Figure 13.13: Sentiment Strategy: Comparing Recency with LM+LDA focusing only on positive or negative aspects respectively (α = 0.99): “Starbucks”

13.7 Conclusions & Future Work

We presented in this paper an approach to rank reviews for products based on latent topics and language models in combination with user-assigned ratings. The main goal was to summarize the opinions expressed in all reviews for a product in the top-k results of a ranking. In contrast to recommending single reviews we aimed at recommending an optimal diverse set of reviews using methods from information


retrieval. We showed that diversified rankings of reviews allow users to grasp the overall opinions about a product faster and more reliably, thus relieving the user from having to read many reviews to get an overview. We investigated reviews for products from four very different product categories, which exhibit different characteristics in terms of features and user experience. Manual annotation of the reviews allowed an automatic evaluation of the proposed approach and a comparison of different ranking strategies and different algorithms.

For future work we will investigate the possibility of personalizing the review rankings by taking personal preferences of users into account. For example, a user might be more interested in the battery life of a product than the screen size. Another interesting direction is analyzing and categorizing product reviews on a large scale to identify different types of reviews. As trust is a factor that directly affects the user confidence, we will also investigate the possibility to incorporate trust between users and authors of reviews.

Chapter 14

Modeling and Measuring Trust in Topical Recommender Systems

N. Dokoohaki and M. Matskin, Mining Divergent Opinion Trust Networks through Latent Dirichlet Allocation, International Symposium on Foundations of Open Source Intelligence and Security Informatics (FOSINT-SI 2012), 2012 IEEE/ACM International Conference on Social Network Analysis and Mining (ASONAM ’12), IEEE Computer Society. August 2012.


Mining Divergent Opinion Trust Networks through Latent Dirichlet Allocation

Nima Dokoohaki¹, Mihhail Matskin²


Abstract

While the focus of trust research has been mainly on defining and modeling various notions of social trust, less attention has been given to modeling opinion trust. In social trust, homophily (similarity) has been the most successful metric for learning trustworthy links, especially in social web applications such as collaborative filtering recommendation systems. While pure homophily measures such as the Pearson correlation coefficient and its variations have been favorable for finding taste distances between individuals based on their rated items, they are not necessarily useful for finding opinion distances between individuals discussing a trending topic, e.g. the Arab Spring. At the same time, text mining techniques such as vector-based techniques are not capable of capturing important factors such as saliency or polarity, which topical models make available for detecting, analyzing and suggesting aspects of people mentioning those tags or topics. Thus, in this paper we propose to model opinion distances using probabilistic information divergence as a metric for measuring the distances between the opinions of people contributing to a discussion in a social network. To acquire feature sets from the topics raised in a discussion we use a very successful topic modeling technique, namely Latent Dirichlet Allocation (LDA). We use the distributions



resulting from the topic model to generate social networks of groups and individual users. Using a Twitter dataset we show that the learned graphs exhibit properties of real-world networks.

14.1 Introduction

The Social Web is collectively perceived as an aggregate notion of users' identities (profiles) and belongings (contributed or customized content) linked across multiple networks [39]. As user-generated content remains the most significant proportion of users' belongings across social sites on-line today, information overload poses a challenge for service providers trying to leverage this content to users' benefit and value through customization or personalization functions. A significant amount of this contribution is natural language content such as tweets, in the case of Twitter, or status updates, in the case of Facebook.

Leaving the size and format of this content aside, analyzing it is a challenge for both content providers and consumers. Thus an increasing number of works focus on analyzing natural language content from social services for the benefit of users and services [171]. While computational techniques are being proposed for analyzing text on social networks, opinion mining techniques are increasingly attractive for analyzing networks of users informing other like-minded users across the social sphere.

Topic modeling mechanisms [36] are increasingly attractive due to their success in mining diverse opinions from the e-commerce web, especially consumer review sites [204]. Thus an increasing number of researchers are proposing their adoption within the social web domain. Due to their probabilistic nature, it is possible to build social networks out of the resulting mixture of topics and their associated distributions [239]. While these networks have been limited to associating terms and authors [330], or communities and tags [67], modeling trust networks has not received significant attention. Moreover, due to their probabilistic learning approach, proposing divergence metrics as distances between nodes on the network is novel, since divergence can model diverse relations among users, while topic modeling allows aspects like saliency, relevancy and even polarity to be measured amongst networked opinions.

Thus in this work we propose a topic modeling framework within which a trend corpus can be mined and, by using Latent Dirichlet Allocation (LDA), both collective and individual models can be defined. The resulting models are eventually used to generate social networks which reflect divergences of collective and individual opinions. This is followed by an experiment on a Twitter dataset to show that the resulting networks have properties of real-world social networks, both in structure and in content. While these weighted graphs are the focus of social network analytics within this manuscript, we plan to leverage them for filtering and recommendation tasks.

Finally, this paper is organized in the following manner: first a comprehensive


background is presented. This is followed by the framework description, within which each step is outlined and described in detail. Then the experiment is outlined, followed by concluding remarks and future work.

14.2 Related Work

We divide this part into two overlapping sections: one outlining topic mining techniques for social network analysis, followed by one focusing on using topic modeling for trust modeling and mining.

Social Network Analysis and Topic Models

Topic models are of great importance in the opinion mining and summarization literature. So before focusing on topic models and their importance with respect to social network analysis and mining, a brief introduction to opinion mining is necessary.

Opinion summarization encompasses any study that attempts to generate a concise and digestible summary of a large number of opinions. A modern opinion summary boils down to a structured digest that provides a well-organized breakdown by aspects/topics, various formats of textual summaries and temporal visualization. In a recent study, Kim et al. [35] give a multi-perspective classification of approaches to opinion summarization and integration. Chen and Zimbra report on several applications of opinion mining in various contexts [79].

Opinion summarization is more and more attractive to social network analysts. Opinion summarization and mining encompass an increasingly large range of applications on the social web, but a few are most common in recent literature: personalized recommendation systems, i.e. tag-based or collaborative filtering-based [397], and sentiment analysis, for instance in the Twitterverse [31, 266]. Among the techniques under focus, topic models have been very successful. Topic models are generative probabilistic models which utilize vocabulary distillations to spot topics within text corpora. The most widely utilized topic modeling techniques include Probabilistic Latent Semantic Analysis (PLSA) [166] and Latent Dirichlet Allocation (LDA) [36]. There exist applications of LDA to social web mining in the literature [239, 330]. McCallum et al. [239] propose the Author Topic model (AT), as three Bayesian hierarchical models to deal with roles with email datasets. The Author Recipient Topic model (ART) is a directed graphical model which models social role explicitly through a latent random variable. A role is therefore a topic mixture characterizing the relation of two persons (that is, the author and the recipient). This work, which was applied to academic and email networks [239], was later improved by Rosen-Zvi et al. [330]. Daud et al. [88], focusing on the task of conference mining, propose an original model for discovering the latent topics between the authors, venues (conferences or journals) and time simultaneously.


Topical Models of Trust

As trust has been the sole focus of the artificial intelligence domain, multi- or even interdisciplinary models of trust are very recent [67]. Models of opinionated trust have been put forth in two distinct but correlated forms. Closest to this idea is topical trust [146, 195], which uses topic labels as edge labels on a social network, exemplifying the context or nature of a trustworthy relation. More recently, natural language processing techniques have been leveraged to summarize, integrate or recommend opinion summaries in the form of trustworthy topic sets, which is part of this contribution. Golbeck and Hendler first set forth the concept of topical trust on the web [146], for applications in trust network building and inference [138] and social recommender systems [388].

While computer scientists can understand topical models, social scientists need better tools to help make sense of that data. Despite the popularity of topic modeling for similar problems, word counts and tag clouds are still adopted to interpret information from textual data [300]. For social scientists to be able to leverage topical models for network mining, new models and new metrics need to be proposed [224]. Liu and Fang [224] propose a tag recommendation algorithm that takes into account users' social relations. They model user-created annotations and the social relations between them using a topic model. They associate each node in this graph with a tag preference vector. They use cosine similarity to find trust and establish links between users and resources through the tags. While Liu and Fang [224] focus on combining LDA and trust modeling, the distances used are vector-driven, e.g. cosine similarity, which does not capture latent similarity between two mixtures. Addressing the cold-start problem, Wang et al. [366] propose exploiting tag data to deal with the sparsity problem. To improve the recommendation quality they propose to combine a tag-based neighborhood method with a traditional rating-based CF. They use a tag matrix, which is ultimately used as input to LDA to generate a probabilistic estimation, which is in turn given as input to the prediction module. Their results show improved recommendation quality.

Weng et al. [375] propose a heuristic measuring the influence of individual Twitter users, taking both the topical similarity between users and the link structure into account. They utilize an LDA algorithm to distill and acquire topic sets from Twitter users. This is followed by constructing links between Twitter users. They show that through existing homophily in Twitter, a notion of reciprocity can be observed. While we use the same two steps, a problem with this work is that no information on pre- or post-processing has been given. Caverlee et al. [67] have proposed SocialTrust++, within which they develop and analyze algorithms for leveraging a community-based notion of trust. While their modeling relies heavily on a community model of trust, in order to model and mine implicit communities they emphasize the usefulness of probabilistic topic modeling techniques, specifically LDA. In addition they report that by leveraging LDA-based retrieval, a community-oriented ranking model results in a significant improvement over other


alternatives [187]. As we are using LDA for modeling and inferring relations between users, this work is quite close to our idea. However, there the Dirichlet distributions are used to model communities rather than individual opinions, and the relationship weights are taken as a mix of communities, individuals and resources. In contrast, we model individuals and groups from a trending central topic, and use the distance between their respective latent models as the relationship weights.
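The idea of using distances between latent models as relationship weights can be sketched as a pairwise divergence matrix over user topic mixtures. The sketch below is illustrative only: it uses Kullback-Leibler divergence as a stand-in for the divergence metric, the user names and topic mixtures are invented, and `divergence_matrix` and `kl_bits` are hypothetical names.

```python
import math

def kl_bits(u, q, eps=1e-9):
    """D_KL(U || Q) in bits with a small smoothing constant on Q (assumed)."""
    q_s = [x + eps for x in q]
    norm = sum(q_s)
    return sum(ui * math.log2(ui / (qi / norm)) for ui, qi in zip(u, q_s) if ui > 0)

def divergence_matrix(user_topics):
    """Pairwise divergences between the topic mixtures of users.

    user_topics: dict user -> topic distribution. The entry [a][b] is the
    divergence of b's mixture from a's; the measure is not symmetric.
    """
    users = sorted(user_topics)
    return {a: {b: kl_bits(user_topics[a], user_topics[b]) for b in users}
            for a in users}

# Made-up topic mixtures for three users discussing the same trend.
user_topics = {
    "alice": [0.7, 0.2, 0.1],
    "bob":   [0.1, 0.2, 0.7],
    "carol": [0.6, 0.3, 0.1],
}
weights = divergence_matrix(user_topics)
```

A low value, such as between `alice` and `carol` here, indicates users whose opinions on the trend are distributed over topics in a similar way, whereas cosine similarity on raw term vectors would not take the latent mixtures into account.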

14.3 Mining Topic Facts and Opinions from Social Media

The modern social web is the prominent location for individual and group opinions to be documented and shared. This has attracted marketing businesses to use existing social media as their marketing playground, promoting their ideas or products and in return getting feedback from the masses. 70 percent of bloggers are organically talking about brands on their blog, while 38 percent of bloggers post brand or product reviews [337]. Although some social media channels are best for sharing factual messages, others are well utilized for sharing an opinion instead. Since our experiment is focused on data gathered from Twitter¹, from this point forward we focus on content from this social service. While Twitter is an ideal channel for marketing facts, many individuals including bloggers, politicians and celebrities leverage it for sharing their opinions with the public. Regardless of the content being shared on Twitter, timely dissemination of facts and opinions is crucial as it allows building reader confidence through creating a trusted source. Twitter's hash-tag feature, similar to tags in tagging services like Delicious², allows a person to define the audience of a message (e.g. #bigdata will document a fact or opinion for Twitter users whose interests are in large-scale analytics).

Framework for Topic Mining and Analysis

While the focus of this paper is on learning opinion networks from any gathering of users, it is important to realize the obstacles faced when it comes to mining topic sets and their respective distributions from update streams on social services. The main concerns for modeling topics on the social web are twofold. First, the text is often very short, e.g. in the case of Twitter only 140 characters, while in the case of Facebook, access to updates is bound to privacy access rights and often impossible. Second, features are not as focused as in a column written on a review website, so saliency and relevancy are always in question. For instance, a Twitter user tweeting about Economics might be talking about a negative experience in a university course while we might be searching for updates related to the Eurozone crisis instead. It also needs to be stated that opinions on a popular social network are expressed in multiple languages.

¹ Twitter, http://twitter.com
² Delicious, http://delicious.com/


Figure 14.1: Overall Framework for Opinion Trust Modeling and Mining: topic acquisition, involving text preprocessing to remove language faults as well as clustering tagged corpora through LDA, followed by dividing the resulting distributions into user and trend models. Finally, trust estimation allows the resulting topic models to be mapped onto the corresponding cells of the trust matrix through divergence metrics.

To overcome the aforementioned problems, we limit the scope of this manuscript to applying topic models to tweets surrounding a trending event or product, such as the #occupy movement or the #iPhone product. Topic modeling allows correlations between topics to be found, in addition to the word correlations that constitute topics in the corpus of tweets at hand. This allows relevant abstract topics to be extracted and pointed out, which ultimately addresses the saliency and relevancy problem. The generative nature of topic models allows topics to be inferred from the existing corpus of tweets, which is useful for summarizing a large number of tweets. In addition, topic modeling can be completely unsupervised, which makes it easy to define applications that consume, analyze and summarize streams of updates in real time. To limit the focus of topics and to overcome the short nature of updates and posts on Twitter, we have chosen trending issues, so that we not only retrieve posts related to our task but also retrieve enough data that the performance of the algorithm is not undermined.

Following the justifications of our approach, we propose the framework shown in Figure 14.1. First, we model the topics surrounding the discussion by acquiring the corresponding features or topics from the corpus of tweets. This is done through a probabilistic topic model. Since we model a network of users, we need to model users and trending topics separately. While the resulting model helps us identify common features between the topics in the discussion, it also helps us eliminate irrelevant topics and allows us to create a main model against which to compare the contributions of individual users as well as groups of users. To measure such distance, we utilize probabilistic information gains (relative and total) to analyze the divergent opinions of users towards each other. This in turn allows us to build the resulting opinion matrices (the final step of our approach), which can show the divergence of user groups from the trending model, or visualize the distance of user opinions from each other in the case of user-to-user comparison. Later on, in the experiment part, we showcase an evaluation of a subset of tweets from 2011, where we analyze the results of both matrices using different configurations. In the following sections, each part of our framework is described in detail.

“. . . Eurozone PMI Services rises to 55.9 in January from 54.2 in December http://bit.ly/xxxx” . . . “europe IMF Supports Extension of Greek Loan Repayment Period: Eurozone leaders reportedly reach consensus http://bit.ly/xxxx”

Eurozone ^, PMI ^, Services N, rises V to P 55.9 $ in P January ^ from P 54.2 $ in P December ^ http://bit.ly/xxxx U . . . europe ^ IMF ^ Supports ^ Extension N of P Greek A Loan N Repayment N Period N : , Eurozone ^ leaders N reportedly R reach V consensus N http://bit.ly/xxxx U

Figure 14.2: Preprocessed tweet lines using TweetNLP: entry on top; tokenized and POS-tagged output on bottom. Tags are presented in distinct colors. For instance, the $ tag represents a numerical value and the U tag represents a link or URL.

14.4 Modeling Tweets

To model the tweets we use probabilistic topic models [205]. In the following section we describe what LDA is and how we use it in our task. As pointed out earlier, we need to model both the trend (the corpus of all tweets), to get an overview of overall opinions, and each individual user (the users whose tweets are a subset of the collective corpus, along with their relevant tweets).

Preprocessing

Since we are dealing with user-generated content, before taking any further steps we need to make sure that no user-introduced spelling errors or bad language get in our way and affect the performance of our algorithm. Moreover, when dealing with multilingual data it is important to be able to identify tokens of the languages under study and filter out the rest. Since lemmatization, segmentation and part-of-speech tagging have been an important problem when analyzing corpora from Twitter, there are existing efforts on this subject [31,55,137,266]. We have used Carnegie Mellon University's TweetNLP³ tool set [137], in which the authors propose a tag set, annotated data and features. We used TweetNLP for tokenization and part-of-speech tagging. Figure 14.2 shows a sample tweet (upper box) and its resulting processed output (lower box).

³ TweetNLP, http://www.ark.cs.cmu.edu/TweetNLP/


TweetNLP's tagger is a Conditional Random Field (CRF) classifier [333], which incorporates arbitrary local features in a log-linear model. The base features include a feature for each word type, a set of features that check whether the word contains digits or hyphens, suffix features up to length 3, and features looking at capitalization patterns in the word. These are followed by added features that leverage domain-specific properties, unlabeled in-domain data, and external linguistic resources, including Twitter orthography, a traditional tag dictionary and distributional similarity [137]. When we tested our experiment corpus of 1600 tweets with TweetNLP, the base classifier gave a total accuracy of 0.95 (95% confidence interval 0.945 +/- 0.008), which is very decent for such a randomly selected corpus. Since the tool set is based on an English dictionary, we easily filtered out non-English tokens on a second iteration. Although unsupervised POS tagging has been widely suggested in the literature for Twitter, our experiment shows that semi-supervised POS tagging with human annotation could give more focused results.
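As an illustration of how tagged output of the kind shown in Figure 14.2 could be turned into input for topic modeling, the following minimal sketch parses alternating token and tag pairs and drops numerals and links. The whitespace-separated "token TAG" layout is an assumption modeled on the figure, not TweetNLP's actual output format, and the choice of which tags to discard (`DROPPED_TAGS`) is hypothetical.

```python
# Tags follow Figure 14.2: "$" marks a numeral, "U" a link or URL.
# Assumption: which tags to discard is an illustrative choice, not the thesis' setting.
DROPPED_TAGS = {"$", "U"}


def parse_tagged(line):
    """Turn 'token TAG token TAG ...' into a list of (token, tag) pairs."""
    parts = line.split()
    if len(parts) % 2 != 0:
        raise ValueError("expected alternating tokens and tags")
    return list(zip(parts[0::2], parts[1::2]))


def content_tokens(pairs, dropped=DROPPED_TAGS):
    """Keep lower-cased tokens whose tag is not in the dropped set."""
    return [token.lower() for token, tag in pairs if tag not in dropped]


tagged = "Eurozone ^ PMI ^ Services N rises V to P 55.9 $ in P January ^ http://bit.ly/xxxx U"
tokens = content_tokens(parse_tagged(tagged))
```

In this sketch the numeral and the shortened link are removed, so only word tokens reach the topic model.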

Acquiring Topics

The goal of the topic acquisition step is the automatic identification and extraction of topics that social users are interested in, based on the text updates they post. The Latent Dirichlet Allocation (LDA) model [36] is an unsupervised machine learning technique that helps identify latent topic information in large document collections. LDA uses "bag of words" modeling, categorizing each document by a vector of word counts. Based on this assumption, each document is represented as a probability distribution over some topics, while each topic is represented as a probability distribution over a number of words. An atomic tweet might contain only a single aspect or feature of an event or product, as opposed to, for example, a review of a product. That is why it is important to model the collective opinion of a crowd sharing opinions about a product or an event. These features or aspects can also vary, which is why we have adopted a statistical topic modeling approach to find them. For modeling topics from the existing corpus we used Mallet⁴, which makes use of Gibbs sampling for computing the latent topics. Latent Dirichlet Allocation models each group of tweets as a mixture of latent topics. Figure 14.3 shows the graphical notation for LDA. Following the library default, we have used multi-grams [240].

As mentioned earlier, we model two separate groups (resulting in two separate matrices): a generative model for the trend, and a generative model for user profiles. Instead of altering the generative process of LDA, we build both models through the same approach but with separate distributions, as described below.

We take a corpus of D documents (e.g. tweets), such that K_i is the count of words in document i.

⁴ Mallet, http://mallet.cs.umass.edu [240]


Figure 14.3: Graphical Presentation of Latent Dirichlet Allocation (plate diagram with nodes α, θ, z, t, φ, β and plates D, Z, K_i)

1. Select θ_i ∼ Dirichlet(α), where i ∈ {1, . . . , D}

2. Select φ_t ∼ Dirichlet(β), where t ∈ {1, . . . , |Z|}

3. For each word w_ij, where j ∈ {1, . . . , K_i}:

• Choose a topic t_ij ∼ Multinomial(θ_i)
• Choose a word w_ij ∼ Multinomial(φ_{t_ij})
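The generative steps above can be sketched with NumPy as follows. This is a toy illustration of the sampling process only, not the Mallet implementation used in this work; the function name, the hyperparameter defaults and the integer word identifiers are all assumptions made for the example.

```python
import numpy as np


def generate_corpus(num_docs, num_topics, vocab_size, doc_lengths,
                    alpha=0.5, beta=0.1, seed=0):
    """Sample a toy corpus from the LDA generative process (illustrative only)."""
    rng = np.random.default_rng(seed)
    # Step 1: one topic distribution theta_i per document.
    theta = rng.dirichlet(np.full(num_topics, alpha), size=num_docs)
    # Step 2: one word distribution phi_t per topic.
    phi = rng.dirichlet(np.full(vocab_size, beta), size=num_topics)
    docs = []
    for i in range(num_docs):
        # Step 3: for every word position draw a topic, then a word from that topic.
        topics = rng.choice(num_topics, size=doc_lengths[i], p=theta[i])
        words = [int(rng.choice(vocab_size, p=phi[t])) for t in topics]
        docs.append(words)
    return theta, phi, docs


# Three hypothetical tweets of 4, 6 and 2 words over a 5-word vocabulary.
theta, phi, docs = generate_corpus(3, 2, 5, [4, 6, 2])
```

Inference runs in the opposite direction: given only the observed words, the topic and word distributions are estimated, which is what the sampling-based learning step does.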

This generative model eventually encodes the corresponding trend and the profiled interests, from which we can infer the unobserved trend topics and user interest topics by learning the model parameters.

The learning process, as mentioned earlier, is done through sampling. Learning the respective distributions (e.g. the set of topics, their associated word probabilities, the topic of each word, and the particular topic combination of each document) is a problem of Bayesian inference [36]. There are various methods for evaluating LDA and estimating the inferred sets [361]. One of the approaches that MALLET [240] currently uses is importance sampling; in this work we used the generative process of the empirical likelihood evaluation method [239]. LDA finds a pre-specified set of |Z| topics within |D| documents. Each term t in a tweet with K_i terms then ends up correlated with a topic z. Z = {z_1, z_2, z_3, . . . , z_n} is the set of n latent topics, whose size reflects the coarseness of the resulting final set of topics.

LDA determines a combination of topic sets for each document in the input data throughout the modeling process outlined earlier. Thus, through importance sampling, P(w|θ^(d)) is generated as follows:


Table 14.1: Sample top 5 words in topics with proportions for tweets presenting Eurozone trend

Terms                                                                   Prob.
economics (6), political (3), reading (2), economic (2), current (2)    0.053
economics (6), make (2), talking (2), politics (2), builders (1)        0.042
economics (9), politics (4), hypnosis (2), year (2), caexpo (2)         0.032
economics (6), pricing (4), effect (3), network (3)                     0.023

$$P(w \mid \theta^{(d)}, \Phi) = \prod_{n} \sum_{z_n} P(w_n, z_n \mid \theta^{(s)}, \Phi) \tag{14.1}$$

where P(w|θ^(d)) is the probability of all multi-grams for a given input document d, z_i is the i-th latent topic, and w_i is the i-th word of the input document d. θ is the document-specific topic distribution. P(w_n|θ^(s)) is estimated from a synthetic document, randomly generated using θ^(s). With prior knowledge of the d documents, Eq. 14.1 simplifies as follows:

$$P(w_i \mid d) = \prod_{n} \sum_{j=1}^{|Z|} P(w_i, z_j \mid d) \tag{14.2}$$

Following this, we can represent latent topics as lists of multi-grams, with a probability for each multi-gram indicating its degree of membership within the topic. Furthermore, for each document in our corpus we can determine which topics it belongs to, again with a degree of membership (the topic probability P(w_i, z_j | d)). An example of two extracted latent topics represented by the top 30 terms is shown in Table 14.1. Besides the terms, the probabilities of the terms belonging to the topic are also shown. For this example we used |Z| = 30 latent topics.
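Reading a topic as a ranked list of terms, as described above, amounts to sorting the topic's term probabilities. The sketch below is a minimal illustration under the assumption that a topic is available as a mapping from term to probability; the sample values are hypothetical and are not taken from Table 14.1.

```python
def top_terms(term_probs, k=5):
    """Return the k most probable (term, probability) pairs of one topic."""
    ranked = sorted(term_probs.items(), key=lambda item: item[1], reverse=True)
    return ranked[:k]


# Hypothetical topic; the probabilities are made up for illustration.
topic = {"economics": 0.4, "politics": 0.3, "pricing": 0.2, "year": 0.1}
summary = top_terms(topic, k=2)
```

The same ranking applied to a document's topic probabilities gives the topics the document most strongly belongs to.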

Divergence Metrics for Trust Modeling

Following the modeling of trend and profiles, two opinion matrices are built: a matrix for trend-topic divergence (TM) and a matrix for user-user divergence (UM). While the former allows us to measure the distances between individual contributing opinions and the collective opinion from the trend model, the latter helps us measure the distances between the individual opinions involved in the stream of trending discussions at hand.

As emphasized earlier, cosine-based metrics, i.e. tf-idf distance, capture vector distances, and no latent information such as saliency, relevancy and polarity, which are of high importance in modern information retrieval [85], can be measured in return. Taking this into account, the task of social network analysis through topic modeling needs better justified metrics. In probabilistic information theory, Kullback-Leibler divergence, or simply relative divergence, is of high importance when evaluating probabilistic information retrieval tasks, as it can measure the distance between two distributions resulting from topic models. The distances between the collective opinions of groups, or of individuals, on a trending subject can therefore be modeled through information divergence. Kullback-Leibler divergence estimates the number of additional bits needed to encode the distribution Q using an optimal code for P, with a combined vocabulary size of |Z′|; in our case, the number of latent topics |Z| times the number of rating classes |R|.

$$D_{KL}(Q \| P) = H(Q; P) - H(Q) = \sum_{i=1}^{|Z'|} q_i \log_2\left(\frac{q_i}{p_i}\right) \tag{14.3}$$
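Equation 14.3 can be sketched in a few lines of Python. This is a minimal illustration for discrete distributions given as equal-length lists; the function name is an assumption, and terms with q_i = 0 are skipped following the usual convention that they contribute nothing.

```python
import math


def kl_divergence(q, p):
    """D_KL(Q || P) in bits for two discrete distributions of equal length.

    Assumes p_i > 0 wherever q_i > 0; terms with q_i == 0 contribute nothing.
    """
    if len(q) != len(p):
        raise ValueError("distributions must have the same length")
    return sum(qi * math.log2(qi / pi) for qi, pi in zip(q, p) if qi > 0)


q = [0.5, 0.5]
p = [0.25, 0.75]
forward = kl_divergence(q, p)   # roughly 0.208 bits
backward = kl_divergence(p, q)  # roughly 0.189 bits
```

Swapping the arguments generally gives a different value, which is why the order of Q and P matters whenever this quantity is used as an edge weight.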

A problem with adopting divergence metrics is the non-symmetric nature of this metric, i.e. D_KL(Q||P) ≠ D_KL(P||Q). This property can be used to model directed social networks: D_KL(Q||P) can model a network of two nodes Q and P, with a directed edge from Q towards P weighted |D_KL(Q||P)|. In addition to the lack of symmetry, a further objection to using Kullback-Leibler as a metric is its lack of normalization, as the resulting values are not bounded to a fixed range.

A weighted graph, however, needs a normalized weight to represent the distances between any set of nodes on the resulting network. Thus, instead of relative divergence, the total divergence of two distributions can be used. We therefore use Jensen-Shannon divergence to model the similarity between two distributions Q and P:

$$D_{JS}(Q \| P) = \frac{1}{2}\left(D_{KL}(Q \| M) + D_{KL}(P \| M)\right) \tag{14.4}$$

M is the average of the two probability distributions, calculated as M = 1/2 (Q + P), and D_KL(Q||P) is the Kullback-Leibler divergence of Q from P calculated using Eq. 14.3. We further normalize the result of Eq. 14.4. We can define the topical distance of two profiled users U and V as the divergent opinion trust between U and V:

$$\mathrm{trust}_{u,v} = 2\sqrt{\frac{D_{JS}(U, V)}{2}} \tag{14.5}$$

where trust_{u,v} is the trust between users U and V, measured through the normalized distance D_JS. Using Jensen-Shannon divergence as a metric gives two benefits over Kullback-Leibler: since the results are normalized, they can be mapped onto the continuous range [0,1]; moreover, since Jensen-Shannon is symmetric, we can model free-form, i.e. undirected, edges on the networks that we model through our framework.


$$\mathrm{trust}_{u,v} = \begin{cases} 0, & D_{JS}(U \| V) = 0 \\ 2\sqrt{\frac{D_{JS}(U, V)}{2}}, & D_{JS}(U \| V) \geq 0 \end{cases}$$

Divergence metrics either find a latent-level similarity between two distributions or not, thus allowing the trust levels over these relations to be uniformly distributed.
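The construction of the user matrix (UM) from per-user topic distributions can be sketched as follows. This is an illustration under stated assumptions, not the implementation used in the experiments: the trust formula is read as 2·sqrt(D_JS/2), which is an interpretation of Eq. 14.5, and the function names and the two-topic user profiles are hypothetical.

```python
import math


def kl_divergence(q, p):
    """D_KL(Q || P) in bits; terms with q_i == 0 are skipped."""
    return sum(qi * math.log2(qi / pi) for qi, pi in zip(q, p) if qi > 0)


def js_divergence(q, p):
    """Jensen-Shannon divergence (Eq. 14.4) using the mixture M = (Q + P) / 2."""
    m = [(qi + pi) / 2 for qi, pi in zip(q, p)]
    return 0.5 * (kl_divergence(q, m) + kl_divergence(p, m))


def opinion_trust(u, v):
    """Assumed reading of Eq. 14.5: 2 * sqrt(D_JS / 2), and 0 for identical profiles."""
    d = js_divergence(u, v)
    return 0.0 if d <= 0 else 2 * math.sqrt(d / 2)


def user_matrix(profiles):
    """Build the symmetric <user, user> matrix as a nested dictionary."""
    names = list(profiles)
    return {a: {b: opinion_trust(profiles[a], profiles[b]) for b in names}
            for a in names}


# Hypothetical topic distributions over two latent topics.
profiles = {"alice": [0.9, 0.1], "bob": [0.2, 0.8], "carol": [0.9, 0.1]}
um = user_matrix(profiles)
```

Because the mixture M is strictly positive wherever either input is, the Jensen-Shannon value stays finite even when one user has zero probability for a topic, which plain Kullback-Leibler divergence does not guarantee.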

14.5 Experiment: Mining Networks of Eurozone Trending News Corpora

As a proof of concept, in this section we present our experiment with a Twitter corpus. We first present our dataset briefly, followed by the evaluation section, where we detail the evaluation results with respect to the corpus and data at hand. Our analysis focuses on the features of the generated networks from the perspective of social network analysis.

Dataset

We have used the Tweets 2011 [263] dataset, which is part of the TREC 2011 Microblog Track. This dataset contains identifiers for more than 16 million tweets and serves as a realistic representation of the Twitterosphere, as the dataset has neither been reprocessed nor normalized, for instance to filter out spam. Since the data was gathered in 2011, we decided to focus on trending subjects from 2011, with prior knowledge of trending events or subjects on Twitter. Among these subjects we decided to focus on a trending news story, namely the Eurozone's economic crisis. Other subjects are under study and are left for future publication. To make sure that we pull out the corresponding tweets, we expanded the queries with hash-tags surrounding the Eurozone. We have used an on-line hash-tag search engine, Hashonomy⁵, which analyzes trends from Twitter, to gather hash-tags related to our work. As a software stack is being developed around our current framework, we plan to use web semantics for query expansion by analyzing the existing Twitter content at hand. After query expansion, the retrieved results amounted to 1600 distinct tweets. After network extraction, a total of 695 user profiles were extracted. We additionally expanded the profiles with related tweets made by the same person, to increase the performance of the algorithm.

Evaluation

We divide the analysis part of the paper into two sections: first we present the evaluation of results generated through the trend matrix (TM), followed by the evaluation of results generated through the user matrix (UM).

⁵ Hashonomy, www.hashonomy.com


Figure 14.4: Average divergence of trend distributions measured against various densities of user clusters.

Trend Analytics

The topic model corresponding to the trend matrix (TM) generates the inferred topics for all pulled-out tweets and their respective distributions, which allows us to digest the collective opinion of the users discussing the corresponding topics. This ultimately lets us set this model as a basis against which to compare each user, or group of users, estimate their divergence from it, and model the ties between them on the graph.

As a result, when generating the trend matrix we initially build a single-dimensional matrix, i.e. one that models <user, trend> pairs, with the divergence value in the cells. As the size of our test corpus is fixed, we decided to group the users into clusters of fixed sizes, based on their common topics (similar interests). The matrix is thus squeezed down to one cluster per trend, <clusternumber, trend>. This will help us see how much the resulting group sizes and their corresponding distributions increase or decrease the gain. To normalize the distances between user clusters per trend, we average the divergence metrics, e.g.

$$\bar{D}_{JS}(Q \| P) = \frac{1}{l} \sum_{i=1}^{l} D_{JS}(Q \| P)$$

Fig. 14.4 plots the variation of the average divergence of each cluster's distributions with respect to the trend distribution. The results show that the larger the group, the lower the distances, and so the higher the similarity between the trend and the corresponding group of users. While the results converge as the size of the user clusters grows, we also see the effect of expanding the user profiles on the determinism of the results. Since the Kullback-Leibler divergences D̄_KL are not normalized, their results visibly converge later than the Jensen-Shannon ones, D̄_JS. Taking into account the size of the corpus at hand, increasing the size of the data appears to have a great impact on measuring the opinions contributing to a trending discussion. With fixed cluster sizes, however, expanding the profiles does not necessarily improve the results. This could be due to using the existing corpus for the expansion of profiles, rather than crawling more content from the related users' timelines on Twitter.

Figure 14.5: Fruchterman-Reingold visualization of the evolution of the learned trust network of Eurozone twitter(er)s: from left to right, networks of user clusters of 2%, 10%, 20% and 30% size. Weights on the edges of the graph are rescaled to reflect the impact of divergence. Larger networks are not presented, as they lose their structural visibility. Weight values are presented using visualized pressures on network links.

Social Network Analytics

The user matrix, which is the latter product and perhaps the most important output of the framework presented so far, is a two-dimensional matrix storing <user, user> pairs with their associated divergences. Its dimensions are based on the number of users extracted from the corpus under focus. As in the previous part, we squeeze the matrix to various sizes based on the size of the networks we are interested in extracting. We follow the same methodology by grouping users into clusters of fixed sizes. Now being capable of generating networks of various diameters, we use two social network analysis metrics to study the resulting networks, on local and global scales [229]. Fig. 14.5 showcases visualizations of four different diameters of resulting networks using Gephi⁶.

With respect to node-level analytics, the average weighted degree of a node is studied. The count of edges attached to a node is an effective measure of its importance: the higher the value, the more important the node is in the graph. The proportion of nodes directly connected in the entire graph, as a result, measures the reachability of nodes. The weighted degree of node i is simply the total of the values w_ij associated with its L links, as follows:

⁶ Gephi, www.gephi.org


Figure 14.6: Node-level analytics on the trust graph: average sum of the weights of the edges (average weighted degree). The horizontal axis presents the percentage of total user groups sampled for the experiment, while the vertical axis plots the respective degree.

$$k_i = C_D(i) = \sum_{j}^{L} w_{ij} \tag{14.7}$$
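Equation 14.7 and its network-wide average can be sketched as below. The sketch assumes the trust network is held as a square weight matrix (a list of lists) in which a zero entry means no link; the function names and the three-node example are hypothetical.

```python
def weighted_degree(weights, i):
    """k_i: sum of the weights w_ij on the links attached to node i."""
    return sum(w for j, w in enumerate(weights[i]) if j != i)


def average_weighted_degree(weights):
    """Mean of k_i over all nodes of the network."""
    n = len(weights)
    return sum(weighted_degree(weights, i) for i in range(n)) / n


# Hypothetical three-user trust network (symmetric weights, zero diagonal).
W = [
    [0.0, 0.2, 0.5],
    [0.2, 0.0, 0.1],
    [0.5, 0.1, 0.0],
]
degrees = [weighted_degree(W, i) for i in range(len(W))]
mean_degree = average_weighted_degree(W)
```

The diagonal is skipped explicitly, so a matrix that stores a self-distance in each diagonal cell gives the same result.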

With respect to network-level analysis, we have used the clustering coefficient, which is a measure of the degree to which nodes tend to cluster together. The clustering coefficient for the whole network is given by Watts and Strogatz [368] as the average of the local clustering coefficients of all n nodes, as follows:

$$\bar{C} = \frac{1}{n} \sum_{i=1}^{n} C_i \tag{14.8}$$

where C_i is the local clustering coefficient, calculated as follows:

$$C_i = \frac{|\{e_{jk}\}|}{k_i (k_i - 1)} \tag{14.9}$$
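Equations 14.8 and 14.9 can be sketched for an unweighted, undirected graph stored as adjacency sets. To match the denominator k_i(k_i − 1), the sketch counts every linked pair of neighbours in both orders; that counting convention, the function names and the four-node example are assumptions made for illustration.

```python
def local_clustering(adj, i):
    """C_i: linked ordered pairs of neighbours of i, divided by k_i * (k_i - 1)."""
    neighbours = adj[i]
    k = len(neighbours)
    if k < 2:
        return 0.0  # fewer than two neighbours: no pair to close a triangle
    links = sum(1 for a in neighbours for b in neighbours
                if a != b and b in adj[a])
    return links / (k * (k - 1))


def average_clustering(adj):
    """Network clustering coefficient: mean of the local coefficients."""
    return sum(local_clustering(adj, i) for i in adj) / len(adj)


# Hypothetical graph: a triangle 0-1-2 with a pendant node 3 attached to 2.
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
coefficient = average_clustering(adj)
```

Nodes 0 and 1 sit in a closed triangle and score 1, node 2 scores 1/3 because only one of its three neighbour pairs is linked, and the pendant node scores 0.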

Following equations 14.7, 14.8 and 14.9, the results of the social network metrics for graphs generated from the user matrix (UM) are plotted in Fig. 14.6 and Fig. 14.7.

It is generally accepted that if a social graph is to reflect the characteristics of real-world networks, the nodes within its structure will tend to create tightly knit groups characterized by a relatively high density of ties. Taking this hypothesis into consideration and focusing on the plotted results for various densities of user clusters, we can observe in the first plot that the average weighted degree increases sharply for network sizes below 50%. This claim is more easily justified through the clustering coefficients, as even a small-diameter network is densely clustered (starting from 0.923). As a result, networks generated via our framework are well suited to modeling real-world social networks. Moreover, the probabilistic generative process of our algorithm never leaves a distance empty unless two profiles are completely distant from each other.

Figure 14.7: Network-level analytics on the trust graph: degree of clustering of nodes (clustering coefficient). The horizontal axis presents the percentage of total user groups sampled for the experiment, while the vertical axis plots the respective coefficient.


In this paper we have proposed a framework for opinion mining from Twitter's content corpora, through which latent topics are acquired and then used for generating opinion trust matrices. These matrices are then used to generate weighted social networks. The analysis presented showed that these networks represent real-world models of profiles and trending news or events, which can be used for business intelligence applications such as advice giving, viral analytics and influence metrics. This being an initial analysis, we plan to study more trending stories and events to better establish the concept set forth in this manuscript. Since it was evident that data size can affect the performance of the algorithm, we plan to use larger data sizes in future work. This will be followed by leveraging the resulting networks for ranking, recommendation and summarization tasks.

Part III

References


Bibliography

[1] Alfarez Abdul-Rahman and Stephen Hailes. A distributed trust model. In NSPW ’97: Proceedings of the 1997 Workshop on New Security Paradigms, pages 48–60, New York, NY, USA, 1997. ACM.

[2] Fabian Abel, Dominikus Heckmann, Eelco Herder, Jan Hidders, Geert-Jan Houben, Daniel Krause, Erwin Leonardi, and Kees van der Slujis. A Framework for Flexible User Profile Mashups. In International Workshop on Adaptation and Personalization for Web 2.0 (AP-WEB 2.0 2009), volume Vol-485, pages 1–10. CEUR-WS.org, June 2009.

[3] Fabian Abel, Nicola Henze, Eelco Herder, Geert-Jan Houben, Daniel Krause, and Erwin Leonardi. Building blocks for user modeling with data from the social web. In Proceeding of the International Workshop on Architectures and Building Blocks of Web-Based User-Adaptive Systems 2010 (WABBWUAS-2010), 2010.

[4] Silvana Aciar, Debbie Zhang, Simeon Simoff, and John Debenham. Informed recommender: Basing recommendations on consumer product reviews. Intelligent Systems, IEEE, 22(3):39–47, 2007.

[5] Rakesh Agrawal, Sreenivas Gollapudi, Alan Halverson, and Samuel Ieong. Diversifying search results. In Proceedings of the Second ACM International Conference on Web Search and Data Mining (WSDM ’09), pages 5–14, New York, USA, 2009. ACM Press.

[6] David W. Aha, Dennis Kibler, and Marc K. Albert. Instance-based learning algorithms. Mach. Learn., 6(1):37–66, 1991.

[7] Gail-Joon Ahn, Mohamed Shehab, and Anna Cinzia Squicciarini. Security and privacy in social networks. IEEE Internet Computing, 15(3):10–12, 2011.

[8] Boanerges Aleman-Meza, Meenakshi Nagarajan, Cartic Ramakrishnan, Li Ding, Pranam Kolari, Amit P. Sheth, I. Budak Arpinar, Anupam Joshi, and Tim Finin. Semantic analytics on social networks: experiences in addressing the problem of conflict of interest detection. In Proceedings of the 15th International Conference on World Wide Web (Edinburgh, 2006). ACM Press, 2006.

[9] Rehab Alnemr, Adrian Paschke, and Christoph Meinel. Enabling reputation interoperability through semantic technologies. I-SEMANTICS ’10. ACM Press, 2010.

[10] Sarabjot S. Anand, Patricia Kearney, and Mary Shapcott. Generating semantically enriched user profiles for web personalization. ACM Trans. Inter. Tech., 7(4), October 2007.

[11] Sarabjot S. Anand and Bamshad Mobasher. Intelligent techniques for web personalization. Lecture Notes in Computer Science, 3169:1–36, 2005.

[12] Alex Sinner Andreas von Hessling, Thomas Kleemann. Semantic user profiles and their applications in a mobile environment. Artificial Intelligence in Mobile Systems 2004, 2004.

[13] Roberto Aringhieri, Ernesto Damiani, Sabrina De Capitani di Vimercati, Stefano Paraboschi, and Pierangela Samarati. Fuzzy techniques for trust and reputation management in anonymous peer-to-peer systems. JASIST, 57(4):528–537, 2006.

[14] Robert M. Arlein, Ben Jai, Markus Jakobsson, Fabian Monrose, and Michael K. Reiter. Privacy-preserving global customization. In ACM Conference on Electronic Commerce, pages 176–184, 2000.

[15] D Artz and Y Gil. A survey of trust in computer science and the semantic web. Web Semantics: Science, Services and Agents on the World Wide Web, 5(2):58–71, 2007.

[16] Paolo Avesani, Paolo Massa, and Roberto Tiella. A trust-enhanced recommender system application: Moleskiing. In SAC ’05: Proceedings of the 2005 ACM Symposium on Applied Computing, pages 1589–1593. ACM Press, 2004.

[17] Paolo Avesani, Paolo Massa, and Roberto Tiella. A trust-enhanced recommender system application: Moleskiing, pages 1589–1593. ACM, 2005.

[18] Jeremy Avnet and Jared Saia. Towards robust and scalable trust metrics, 2003.

[19] Baharum Baharudin, Lam Hong Lee, and Khairullah Khan. A review of machine learning algorithms for text-documents classification. Journal of Advances in Information Technology, 1(1), 2010.

[20] William Sims Bainbridge, Edward E Brent, Kathleen M Carley, David R Heise, Michael W Macy, Barry Markovsky, and John Skvoretz. Artificial social intelligence. Annual Review of Sociology, 20(1):407–436, 1994.


[21] Nilanjan Banerjee, Dipanjan Chakraborty, Koustuv Dasgupta, S. Mittal, A. Joshi, Seema Nagar, A. Rai, and S. Madan. User interests in social media sites: an exploration with micro-blogs, pages 1823–1826. ACM, 2009.

[22] Philip Beineke, Trevor Hastie, Christopher Manning, and Shivakumar Vaithyanathan. An exploration of sentiment summarization, pages 1–4. AAAI, New York, USA, 2003.

[23] France Bélanger and R.E. Crossler. Privacy in the digital age: A review of information privacy research in information systems. MIS Quarterly, 35(4):1017–1041, 2011.

[24] R. Bell and Y. Koren. Scalable collaborative filtering with jointly derived neighborhood interpolation weights. In IEEE International Conference on Data Mining (ICDM’07), pages 175–186. IEEE, 2007.

[25] Izak Benbasat and Weiquan Wang. Trust in and adoption of online recommendation agents. J. AIS, 6(3), 2005.

[26] Shlomo Berkovsky, Paolo Busetta, Yaniv Eytani, Tsvi Kuflik, and Francesco Ricci. Collaborative filtering over distributed environment. In Proc. of the DASUM Workshop, 2005.

[27] Tim Berners-Lee, James Hendler, and Ora Lassila. The semantic web. Scientific American, 284(5):34–43, May 2001.

[28] Marco Berni, Nima Dokoohaki, Elena Fani, Eero Hyvönen, Tomi Kauppinen, Mihhail Matskin, Eetu Mäkelä, and Tuukka Ruotsalo. Smartmuseum: a cultural heritage knowledge exchange platform based on ontology-oriented, context-aware and profiling systems. Proceedings of 2009 Electronic Imaging and the Visual Arts EVA’09, Apr 2009.

[29] Diego Berrueta, Dan Brickley, Stefan Decker, Sergio Fernández, Christoph Görn, Andreas Harth, Tom Heath, Kingsley Idehen, Kjetil Kjernsmo, Alistair Miles, Alexandre Passant, Axel Polleres, Luis Polo, and Michael Sintek. SIOC core ontology specification. W3C Member Submission, W3C, June 2007.

[30] Kamal K. Bharadwaj and Mohammad Yahya H. Al-Shamri. Fuzzy computational models for trust and reputation systems. Electronic Commerce Research and Applications, 8(1):37–47, 2009.

[31] Albert Bifet and Eibe Frank. Sentiment knowledge discovery in twitter streaming data. In Bernhard Pfahringer, Geoffrey Holmes, and Achim G. Hoffmann, editors, Discovery Science, volume 6332 of Lecture Notes in Computer Science, pages 1–15. Springer, 2010.


[32] Daniel. Billsus and Michael J. Pazzani. A hybrid user model for news storyclassification. In 7th International Conference on User Modeling, pages 99–108, Banff, Canada, June 1999.

[33] Daniel Billsus and Michael J. Pazzani. Adaptive news access. In The AdaptiveWeb: Methods and Strategies of Web Personalization, chapter 18, pages 550–570. Springer, 2007.

[34] F. Bimbot, R. Pieraccini, E. Levin, and B. Atal. Variable-length sequencemodeling: multigrams. Signal Processing Letters, IEEE, 2(6):111 –113, June1995.

[35] H. Binali, V. Potdar, and Chen Wu. A state of the art opinion mining andits application domains. In Industrial Technology, 2009. ICIT 2009. IEEEInternational Conference on, pages 1 –6, feb. 2009.

[36] David M. Blei, Andrew Y. Ng, and Michael I. Jordan. Latent dirichlet allo-cation. J. Mach. Learn. Res., 3:993–1022, March 2003.

[37] Eric Bloedorn, Inderjeet Mani, and T. Richard MacMillan. Machine learningof user profiles: Representational issues. CoRR, cmp-lg/9712002, 1997.

[38] Uldis Bojars, John Breslin, and Alexander Passant. Sioc ontology: Applica-tions and implementation status. W3C Member Submission, 12, 2007.

[39] Danah M. Boyd and Nicole B. Ellison. Social network sites: Definition, his-tory, and scholarship. J. Computer-Mediated Communication, 13(1):210–230,2007.

[40] J. Brank, M. Grobelnik, and D. Mladenic. A survey of ontology evaluationtechniques. In Proceedings of the Conference on Data Mining and Data Ware-houses (SiKDD 2005), pages 166–170. Citeseer, 2005.

[41] John S. Breese, David Heckerman, and Carl Myers Kadie. Empirical analysisof predictive algorithms for collaborative filtering. In Gregory F. Cooper andSerafÃn Moral, editors, UAI, pages 43–52. Morgan Kaufmann, 1998.

[42] John G. Breslin, Andreas Harth, Uldis Bojars, and Stefan Decker. Towardssemantically-interlinked online communities. In A. Gomez-Perez and J. Eu-zenat, editors, European Semantic Web Conference (ESWC), volume 3532 ofLecture Notes on Computer Science, pages 500–514. Springer, 2005.

[43] Dan Brickley. Web of trust rdf ontology. http://xmlns.com/wot/0.1/, 2002.

[44] Dan Brickley and Libby Miller. Foaf vocabulary specification. http://xmlns.com/foaf/spec/, 2005.

[45] Dan Brickley and Libby Miller. The Friend Of A Friend (FOAF) vocabulary specification, November 2007. http://xmlns.com/foaf/spec/.

[46] David Brondsema and Andrew Schamp. Konfidi: Trust networks using PGP and RDF. In Tim Finin, Lalana Kagal, and Daniel Olmedilla, editors, MTW, volume 190 of CEUR Workshop Proceedings. CEUR-WS.org, 2006.

[47] Juergen Bross and Heiko Ehrig. Generating a context-aware sentiment lexicon for aspect-based product review mining. In Proceedings of the 2010 IEEE/WIC/ACM International Conference on Web Intelligence and Intelligent Agent Technology (WI-IAT’10), Washington, DC, USA, August 31–September 3 2010. IEEE Computer Society.

[48] P. Brusilovsky, A. Kobsa, and W. Nejdl. The adaptive web: methods and strategies of web personalization, volume 4321. Springer, 2007.

[49] Peter Brusilovsky. From adaptive hypermedia to the adaptive web. In Gerd Szwillus and Jürgen Ziegler, editors, Mensch and Computer, pages 21–24. Teubner, 2003.

[50] Peter Brusilovsky and Carlo Tasso. Preface to special issue on user modeling for web information retrieval. User Modeling and User-Adapted Interaction, 14(2–3):147–157, 2004.

[51] Ramona Bunea, Shahab Mokarizadeh, Nima Dokoohaki, and Mihhail Matskin. Exploiting dynamic privacy in socially regularized recommenders. In Data Mining Workshops (ICDMW), 2012 IEEE 12th International Conference on, pages 539–546, Dec. 2012.

[52] Robin Burke. Hybrid recommender systems: Survey and experiments. User Modeling and User-Adapted Interaction, 12(4):331–370, 2002.

[53] Robin Burke. Hybrid web recommender systems. In Peter Brusilovsky, Alfred Kobsa, and Wolfgang Nejdl, editors, The adaptive web, chapter Hybrid web recommender systems, pages 377–408. Springer-Verlag, Berlin, Heidelberg, 2007.

[54] Robin Burke, Bamshad Mobasher, Roman Zabicki, and Runa Bhaumik. Identifying attack models for secure recommendation. In A Workshop on the Next Generation of Recommender Systems Research, pages 19–25. IUI, 2005.

[55] Luca Cagliero and Alessandro Fiori. Analyzing twitter user behaviors and topic trends by exploiting dynamic rules. In Longbing Cao and Philip S. Yu, editors, Behavior Computing, pages 267–287. Springer London, 2012.

[56] John Canny. Collaborative filtering with privacy. In SP ’02: Proceedings of the 2002 IEEE Symposium on Security and Privacy, page 45, Washington, DC, USA, 2002. IEEE Computer Society.

[57] John Canny. Collaborative filtering with privacy via factor analysis. In SIGIR ’02: Proceedings of the 25th annual international ACM SIGIR conference on Research and development in information retrieval, pages 238–245, New York, NY, USA, 2002. ACM.

[58] Iván Cantador and Pablo Castells. Multilayered semantic social network modeling by ontology-based user profiles clustering: application to collaborative filtering. In Proceedings of the 15th international conference on Managing Knowledge in a World of Networks, EKAW’06, pages 334–349, Berlin, Heidelberg, 2006. Springer-Verlag.

[59] Ivan Cantador, Martin Szomszor, Harith Alani, Miriam Fernandez, and Pablo Castells. Enriching ontological user profiles with tagging history for multi-domain recommendations. In 1st International Workshop on Collective Semantics: Collective Intelligence & the Semantic Web (CISWeb 2008), June 2008.

[60] Jaime Carbonell and Jade Goldstein. The use of MMR, diversity-based reranking for reordering documents and producing summaries. Proceedings of the 21st annual international ACM SIGIR conference on Research and development in information retrieval, pages 335–336, 1998.

[61] Francesca Carmagnola, Federica Cena, Luca Console, Omar Cortassa, Cristina Gena, Anna Goy, Ilaria Torre, Andrea Toso, and Fabiana Vernero. Tag-based user modeling for social multi-device adaptive guides. User Modeling and User-Adapted Interaction, 18(5):497–538, November 2008.

[62] Francesca Carmagnola, Federica Cena, and Cristina Gena. User Modeling in the Social Web, volume 4694, chapter 91, pages 745–752. Springer Berlin / Heidelberg, 2010.

[63] Germano Caronni. Walking the web of trust. In WETICE, pages 153–158. IEEE Computer Society, 2000.

[64] Dustin Cartwright. An iterative method converging to a positive solution of certain systems of polynomial equations. Journal of Algebraic Statistics, 2(1):1–13, 2011.

[65] Sara Casare and Jaime Sichman. Towards a functional ontology of reputation, pages 505–511. ACM, 2005.

[66] Lillian N. Cassel and Ursula Wolz. Client side personalization. In DELOS Workshop: Personalisation and Recommender Systems in Digital Libraries, 2001.

[67] James Caverlee, Ling Liu, and Steve Webb. The socialtrust framework for trusted social information management: Architecture and algorithms. Inf. Sci., 180(1):95–112, 2010.

[68] Ugur Çetintemel, Michael J. Franklin, and C. Lee Giles. Self-adaptive user profiles for large-scale data delivery. In ICDE, pages 622–633, 2000.

[69] Oscar Celma. Foafing the music: Bridging the semantic gap in music recommendation. In Isabel F. Cruz, Stefan Decker, Dean Allemang, Chris Preist, Daniel Schwabe, Peter Mika, Michael Uschold, and Lora Aroyo, editors, International Semantic Web Conference, volume 4273 of Lecture Notes in Computer Science, pages 927–934. Springer, 2006.

[70] Federica Cena, Nima Dokoohaki, and Mihhail Matskin. Forging Trust and Privacy with User Modeling Frameworks: An Ontological Analysis, pages 43–48. ThinkMind, 2011.

[71] Federica Cena, Silvia Likavec, and Francesco Osborne. Propagating user interests in ontology-based user model. In Roberto Pirrone and Filippo Sorbello, editors, AI*IA, volume 6934 of Lecture Notes in Computer Science, pages 299–311. Springer, 2011.

[72] A. Cervini. Network connections: An analysis of social software that turns online introductions into offline interactions. Master’s thesis, Interactive Telecommunications Program, NYU, available at: http://stage.itp.tsoa.nyu.edu/~alc287/thesis/thesis.html, 2003.

[73] Youngchul Cha and Junghoo Cho. Social-network analysis using topic models. In William R. Hersh, Jamie Callan, Yoelle Maarek, and Mark Sanderson, editors, SIGIR, pages 565–574. ACM, 2012.

[74] S. Chakrabarti, M. van den Berg, and B. Dom. Focused crawling: a new approach to topic-specific web resource discovery. In Proceedings of WWW-8, 1999.

[75] Elizabeth Chang, Ernesto Damiani, and Tharam S. Dillon. Fuzzy approaches to trust management. In Bernd Reusch, editor, Computational Intelligence, Theory and Applications, volume 38, pages 425–436. Springer Berlin Heidelberg, 2006.

[76] Elizabeth Chang, F.K. Hussain, and Tharam Dillon. Reputation ontology for reputation systems, pages 957–966. Springer, 2005.

[77] Pimwadee Chaovalit and Lina Zhou. Movie review mining: a comparison between supervised and unsupervised classification approaches. In System Sciences, 2005. Proceedings of the 38th Annual Hawaii International Conference on System Sciences (HICSS’05), pages 112c–112c, Washington DC, USA, 2005. IEEE.

[78] Harr Chen and David R Karger. Less is more: probabilistic models for retrieving fewer relevant documents, pages 429–436. ACM, New York, USA, 2006.

[79] Hsinchun Chen and David Zimbra. AI and opinion mining. IEEE Intelligent Systems, 25(3):74–80, 2010.

[80] Judith A. Chevalier and Dina Mayzlin. The effect of word of mouth on sales: Online book reviews. Journal of Marketing Research, 43(3):345–354, 2006.

[81] Paul-Alexandru Chirita, Wolfgang Nejdl, and Cristian Zamfir. Preventing shilling attacks in online recommender systems, pages 67–74. ACM, 2005.

[82] Konstantinos Christidis, Gregoris Mentzas, and Dimitris Apostolou. Using latent topics to enhance search and recommendation in enterprise social software. Expert Systems with Applications, 39(10):9297–9307, Aug 2012.

[83] Ciao! http://www.ciao.co.uk, 2012.

[84] Charles L. A. Clarke, Nick Craswell, and Ian Soboroff. Overview of the TREC 2009 web track. In Proc. of TREC-2009, pages 1–9, Virginia, USA, 2009. DTIC Document.

[85] Charles L. A. Clarke, Maheedhar Kolla, Gordon V. Cormack, Olga Vechtomova, Azin Ashkan, Stefan Büttcher, and Ian MacKinnon. Novelty and diversity in information retrieval evaluation. In Proceedings of the 31st annual international ACM SIGIR conference on Research and development in information retrieval, SIGIR ’08, pages 659–666, New York, NY, USA, 2008. ACM.

[86] R. Cooley, B. Mobasher, and J. Srivastava. Web mining: Information and pattern discovery on the world wide web. In ICTAI ’97: Proceedings of the 9th International Conference on Tools with Artificial Intelligence, page 558, Washington, DC, USA, 1997. IEEE Computer Society.

[87] Yahoo! Webscope dataset. http://research.yahoo.com/Academic_Relations, 2011.

[88] Ali Daud, Juanzi Li, Lizhu Zhou, and Faqir Muhammad. Conference mining via generalized topic modeling. In Proceedings of the European Conference on Machine Learning and Knowledge Discovery in Databases: Part I, ECML PKDD ’09, pages 244–259, Berlin, Heidelberg, 2009. Springer-Verlag.

[89] John Davies, Rudi Studer, and Paul Warren, editors. Semantic Web Technologies: Trends and Research in Ontology-Based Systems. Wiley, Chichester, UK, 2006.

[90] Gerald F Davis, Mina Yoo, and Wayne E Baker. The small world of the American corporate elite, 1982-2001. Strategic Organization, 1(3):301–326, 2003.

[91] I. Davis and E. Vitiello Jr. Relationship: A vocabulary for describing relationships between people. http://vocab.org/relationship/, Last accessed 2007.

[92] Marco Degemmis, Pasquale Lops, and Giovanni Semeraro. A content-collaborative recommender that exploits wordnet-based user profiles for neighborhood formation. User Modeling and User-Adapted Interaction, 17(3):217–255, July 2007.

[93] S. Deligne and F. Bimbot. Language modeling by variable length sequences: theoretical formulation and evaluation of multigrams. Acoustics, Speech, and Signal Processing, IEEE International Conference on, 1:169–172, 1995.

[94] Elena Demidova, Peter Fankhauser, Xuan Zhou, and Wolfgang Nejdl. Divq: Diversification for keyword search over structured databases. In Structure, pages 331–338, New York, USA, 2010. ACM.

[95] DHT. Last modified March-2009, Available: http://en.wikipedia.org/wiki/Distributed_hash_table.

[96] Tamara Dinev, Heng Xu, Jeff H Smith, and Paul Hart. Information privacy and correlates: an empirical attempt to bridge and distinguish privacy-related concepts. European Journal of Information Systems, May 2012.

[97] Lee Ding and Timothy Finin. Weaving the web of belief into the semantic web. Proceedings, 2004.

[98] Nima Dokoohaki. Modeling and representing trust relations in semantic web-driven social networks: An ontological analysis. Technical report, Royal Institute of Technology (KTH), May 2007.

[99] Nima Dokoohaki, Cihan Kaleli, Huseyin Polat, and Mihhail Matskin. Achieving optimal privacy in trust-aware social recommender systems. In Proceedings of the Second international conference on Social informatics, SocInfo’10, pages 62–79, Berlin, Heidelberg, 2010. Springer-Verlag.

[100] Nima Dokoohaki and Mihhail Matskin. Structural determination of ontology-driven trust networks in semantic social institutions and ecosystems. In Mobile Ubiquitous Computing, Systems, Services and Technologies, 2007. UBICOMM ’07. International Conference on, pages 263–268, Nov. 2007.

[101] Nima Dokoohaki and Mihhail Matskin. Effective Design of Trust Ontologies for Improvement in the Structure of Socio-Semantic Trust Networks. International Journal On Advances in Intelligent Systems, 1(1942-2679):23–42, 2008.

[102] Nima Dokoohaki and Mihhail Matskin. Personalizing human interaction through hybrid ontological profiling: Cultural heritage case study. In Marco Ronchetti, editor, ASWC Workshop on Semantic Web Applications and Human Aspects (SWAHA) 2008, pages 133–140. AIT e-press, 2008.

[103] Nima Dokoohaki and Mihhail Matskin. Quest: An adaptive framework for user profile acquisition from social communities of interest. In Social Network Analysis and Mining, International Conference on Advances in, pages 360–364. IEEE Computer Society, 2010.

[104] Nima Dokoohaki and Mihhail Matskin. Reasoning about weighted semantic user profiles through collective confidence analysis: A fuzzy evaluation. In Vaclav Snášel, Piotr Szczepaniak, Ajith Abraham, and Janusz Kacprzyk, editors, Advances in Intelligent Web Mastering - 2, volume 67 of Advances in Intelligent and Soft Computing, pages 71–81. Springer, Berlin / Heidelberg, 2010.

[105] Nima Dokoohaki and Mihhail Matskin. Mining divergent opinion trust networks through latent dirichlet allocation. Advances in Social Network Analysis and Mining (ASONAM 2012), 2012.

[106] Peter Dolog and Wolfgang Nejdl. Challenges and benefits of the semantic web for user modelling. In AH2003 Workshop at WWW2003, 2003.

[107] Peter Dolog and Wolfgang Nejdl. Semantic web technologies for the adaptive web. The Adaptive Web, pages 697–719, 2007.

[108] Stephen Downes. Semantic networks and social networks. The Learning Organization Journal, 12(5):411–417, May 2005.

[109] Thomas DuBois, Jennifer Golbeck, and Aravind Srinivasan. Predicting trust and distrust in social networks. In SocialCom/PASSAT, pages 418–424. IEEE, 2011.

[110] Catherine Dwyer, Starr Roxanne Hiltz, and Katia Passerini. Trust and privacy concern within social networking sites: A comparison of Facebook and MySpace. In John A. Hoxmeier and Stephen Hayne, editors, AMCIS, page 339. Association for Information Systems, 2007.

[111] Magdalini Eirinaki and Michalis Vazirgiannis. Web mining for web personalization. ACM Trans. Internet Technol., 3(1):1–27, 2003.

[112] Epinions. http://www.epinions.com, Last accessed 2011.

[113] Epinions. http://www.epinions.com/, Last accessed 2012.

[114] M. Ester, M. Groß, and H.P. Kriegel. Focused web crawling: A generic framework for specifying the user interest and for adaptive crawling strategies. In Proceedings of 27th International Conference on Very Large Data Bases, Roma, Italy, 2001.

[115] Facebook. http://www.facebook.com, 2012.

[116] Rino Falcone, Giovanni Pezzulo, and Cristiano Castelfranchi. A fuzzy approach to a belief-based trust computation. In Rino Falcone, K. Suzanne Barber, Larry Korba, and Munindar P. Singh, editors, Trust, Reputation, and Security, volume 2631 of Lecture Notes in Computer Science, pages 73–86. Springer, 2002.

[117] Soude Fazeli, Hendrik Drachsler, Francis Brouns, and Peter Sloep. A trust-based social recommender for teachers. Proceedings, pages 49–60, 2012.

[118] Soude Fazeli, Alireza Zarghami, Nima Dokoohaki, and Mihhail Matskin. Elevating prediction accuracy in trust-aware collaborative filtering recommenders through t-index metric and toptrustee lists. Journal of Emerging Technologies in Web Intelligence, 2(4):300–309, November 2010.

[119] Soude Fazeli, Alireza Zarghami, Nima Dokoohaki, and Mihhail Matskin. Mechanizing social trust-aware recommenders with t-index augmented trustworthiness. In Sokratis Katsikas, Javier Lopez, and Miguel Soriano, editors, TrustBus ’10 Proceedings of the 7th international conference on Trust, privacy and security in digital business, volume 6264, pages 202–213. Springer Berlin / Heidelberg, 2010.

[120] Carsten Felden and Markus Linden. Ontology-based user profiling. In Proceedings of the 10th international conference on Business information systems, BIS’07, pages 314–327, Berlin, Heidelberg, 2007. Springer-Verlag.

[121] Christiane Fellbaum. Wordnet. In Roberto Poli, Michael Healy, and Achilles Kameas, editors, Theory and Applications of Ontology: Computer Applications, pages 231–243. Springer Netherlands, 2010.

[122] Bob Ferris and Kurt Jacobson. The recommendation ontology specification v0.3. http://smiy.sourceforge.net/rec/spec/recommendationontology.html, August 2010.

[123] Caxton C. Foster, Anatol Rapoport, and Carol J. Orwant. A study of a large sociogram ii. elimination of free parameters. Behavioral Science, 8(1):56–65, 1963.

[124] Michael J. Franklin. Challenges in ubiquitous data management. In Reinhard Wilhelm, editor, Informatics, volume 2000 of Lecture Notes in Computer Science, pages 24–33. Springer, 2001.

[125] Chris Fröschl. User modeling and user profiling in adaptive e-learning systems. Technical report, Graz, Austria, 2005.

[126] M. Gamon, A. Aue, S. Corston-Oliver, and E. Ringger. Pulse: Mining customer opinions from free text. Advances in Intelligent Data Analysis VI, 3646(3646):121–132, 2005.

[127] Susan Gauch, Jason Chaffee, and Alexander Pretschner. Ontology-based personalized search and browsing. Web Intelligence and Agent Systems, 1(3-4):219–234, 2003.

[128] Susan Gauch, Mirco Speretta, Aravind Chandramouli, and Alessandro Micarelli. User profiles for personalized information access. In The Adaptive Web: Methods and Strategies of Web Personalization, Lecture Notes in Computer Science, chapter 2, pages 54–89. Springer, 2007.

[129] Susan Gauch, Mirco Speretta, and Alexander Pretschner. Ontology-based user profiles for personalized search. In Raj Sharman, Rajiv Kishore, and Ram Ramesh, editors, Ontologies, volume 14 of Integrated Series in Information Systems, pages 665–694. Springer US, 2007.

[130] David Gefen, Izak Benbasat, and Paul A. Pavlou. A research agenda for trust in online environments. Journal of Management Information Systems, 24(4):275–286, 2008.

[131] Simon Gerber, Michael Fry, Judy Kay, Bob Kummerfeld, Glen Pink, and Rainer Wasinger. Personisj: mobile, client-side user modelling. User Modeling Adaptation and Personalization, 6075:111–122, 2010.

[132] Getty. Art & architecture thesaurus online (AAT). http://www.getty.edu/research/conducting_research/vocabularies/aat, 2000.

[133] Getty. The getty thesaurus of geographic names (TGN). http://www.getty.edu/research/conducting_research/vocabularies/tgn, 2000. Last access on Dec 2008.

[134] Getty. Union list of artist names (ULAN). http://www.getty.edu/research/conducting_research/vocabularies/ulan, 2000.

[135] Anindya Ghose and Panagiotis G. Ipeirotis. Designing novel review ranking systems: predicting the usefulness and impact of reviews. In Proceedings of the ninth international conference on Electronic commerce, ICEC ’07, pages 303–310, New York, NY, USA, 2007. ACM.

[136] Riddhiman Ghosh and Mohamed Dekhil. Discovering user profiles, pages 1233–1234. ACM, 2009.

[137] Kevin Gimpel, Nathan Schneider, Brendan O’Connor, Dipanjan Das, Daniel Mills, Jacob Eisenstein, Michael Heilman, Dani Yogatama, Jeffrey Flanigan, and Noah A. Smith. Part-of-speech tagging for twitter: annotation, features, and experiments. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies: short papers - Volume 2, HLT ’11, pages 42–47, Stroudsburg, PA, USA, 2011. Association for Computational Linguistics.

[138] Daniela Godoy and Analía Amandi. Enabling topic-level trust for collaborative information sharing. Personal Ubiquitous Comput., 16(8):1065–1077, December 2012.

[139] Jennifer Golbeck. Computing and applying trust in web-based social networks. PhD thesis, University of Maryland, 2005.

[140] Jennifer Golbeck. Filmtrust: Movie recommendations from semantic web-based social networks. In ISWC2005 Posters & Demonstrations, pages PID–72, 2005. printed proceedings only.

[141] Jennifer Golbeck. Combining provenance with trust in social networks for semantic web content filtering. In Luc Moreau and Ian T. Foster, editors, International Provenance and Annotation Workshop, volume 4145 of Lecture Notes in Computer Science, pages 101–108. Springer, 2006.

[142] Jennifer Golbeck. Generating predictive movie recommendations from trust in social networks. In Proceedings of the fourth international conference on trust management, 2006.

[143] Jennifer Golbeck. Trust on the world wide web: A survey. Foundations and Trends in Web Science, 1(2):131–197, January 2006.

[144] Jennifer Golbeck. Trust and nuanced profile similarity in online social networks. TWEB, 3(4), 2009.

[145] Jennifer Golbeck and James Hendler. Inferring binary trust relationships in web-based social networks. ACM Trans. Internet Technol., 6(4):497–529, November 2006.

[146] Jennifer Golbeck and James A. Hendler. Accuracy of metrics for inferring trust and reputation in semantic web-based social networks. In Enrico Motta, Nigel Shadbolt, Arthur Stutt, and Nicholas Gibbins, editors, EKAW, volume 3257 of Lecture Notes in Computer Science, pages 116–131. Springer, 2004.

[147] Jennifer Golbeck, Bijan Parsia, and James A. Hendler. Trust networks on the semantic web. In Matthias Klusch, Sascha Ossowski, Andrea Omicini, and Heimo Laamanen, editors, CIA, volume 2782 of Lecture Notes in Computer Science, pages 238–249. Springer, 2003.

[148] David Goldberg, David A. Nichols, Brian M. Oki, and Douglas B. Terry. Using collaborative filtering to weave an information tapestry. Commun. ACM, 35(12):61–70, 1992.

[149] Sreenivas Gollapudi and Aneesh Sharma. An axiomatic approach for result diversification. In Proceedings of the 18th International Conference on World Wide Web (WWW ’09), page 381, New York, USA, 2009. ACM Press.

[150] Marco Gori and Augusto Pucci. Itemrank: A random-walk based scoring algorithm for recommender engines. In Manuela M. Veloso, editor, IJCAI, pages 2766–2771, 2007.

[151] Tyrone Grandison. Trust management for internet applications. PhD thesis, Imperial College, 2003.

[152] Tyrone Grandison and Morris Sloman. A survey of trust in internet applications. IEEE Communications Surveys and Tutorials, 3(4):2–16, 2000.

[153] Peijun Guo, Hideo Tanaka, and Masahiro Inuiguchi. Self-organizing fuzzy aggregation models to rank the objects with multiple attributes. IEEE Transactions on Systems, Man, and Cybernetics, Part A, 30(5):573–580, 2000.

[154] Peng Han, Bo Xie, Fan Yang, and Ruimin Shen. A scalable p2p recommender system based on distributed collaborative filtering. Expert Syst. Appl., 27(2):203–210, 2004.

[155] F.M. Harper, D. Moy, and J.A. Konstan. Facts or friends?: distinguishing informational and conversational questions in social q&a sites. In Proceedings of the 27th international conference on Human factors in computing systems, pages 759–768, New York, USA, 2009. ACM.

[156] J. Hartigan. Clustering Algorithms. John Wiley and Sons, New York, 1975.

[157] T. Heath, J. Domingue, and P. Shabajee. User interaction and uptake challenges to successfully deploying semantic web technologies. In Third International Semantic Web User Interaction Workshop (SWUI 2006), Athens, GA, USA, 2006.

[158] Dominik Heckmann. Gumo - the general user modeling ontology. http://www.ubisworld.org/ubisworld/documents/gumo/2.0/gumo.owl.

[159] Dominik Heckmann, Tim Schwartz, Boris Brandherm, and Alexander Kröner. Decentralized user modeling with UserML and GUMO. In Peter Dolog and Julita Vassileva, editors, Proceedings of the Workshop on Decentralized, Agent Based and Social Approaches to User Modeling, DASUM-05, at UM2005, pages 61–66, Edinburgh, Scotland, July 2005.

[160] Dominik Heckmann, Tim Schwartz, Boris Brandherm, Michael Schmitz, and Margeritta von Wilamowitz-Moellendorff. Gumo - The General User Model Ontology. In User Modeling 2005, volume 3538, pages 428–432. Springer Berlin / Heidelberg, 2005.

[161] Dominik Heckmann, Eric Schwarzkopf, Junichiro Mori, Dietmar Dengler, and Alexander Kröner. The user model and context ontology gumo revisited for future web 2.0 extensions. In Paolo Bouquet, Jérôme Euzenat, Chiara Ghidini, Deborah L. McGuinness, Luciano Serafini, Pavel Shvaiko, and Holger Wache, editors, CO:RR, volume 298 of CEUR Workshop Proceedings. CEUR-WS.org, 2007.

[162] Sandy Heleou, Hendrik Drachsler, and Dennis Gillet. Evaluation of recommender systems for technology-enhanced learning: challenges and possible solutions. 1st workshop on Context-aware Recommender Systems for Learning, pages 3–5, 2009.

[163] Jonathan L. Herlocker, Joseph A. Konstan, Al Borchers, and John Riedl. An algorithmic framework for performing collaborative filtering. In SIGIR, pages 230–237. ACM, 1999.

[164] Jonathan L. Herlocker, Joseph A. Konstan, Loren G. Terveen, and John T. Riedl. Evaluating collaborative filtering recommender systems. ACM Transactions on Information Systems (TOIS), 22:5–53, January 2004.

[165] J. E. Hirsch. An index to quantify an individual’s scientific research output. PNAS, 102(46):16569–16572, November 2005.

[166] Thomas Hofmann. Probabilistic latent semantic indexing. In SIGIR, pages 50–57. ACM, 1999.

[167] J. Hradesky and B. Acrement. Elements for building trust, 1994.

[168] M. Hu and B. Liu. Mining and summarizing customer reviews. In Proceedings of the tenth ACM SIGKDD international conference on Knowledge discovery and data mining, pages 168–177, New York, USA, 2004. ACM.

[169] Jingwei Huang and Mark S. Fox. An ontology of trust: formal semantics and transitivity. In Mark S. Fox and Bruce Spencer, editors, ICEC, volume 156 of ACM International Conference Proceeding Series, pages 259–270. ACM, 2006.

[170] Zhisheng Huang, Frank van Harmelen, and Annette ten Teije. Reasoning with inconsistent ontologies. In Proceedings of the Nineteenth International Joint Conference on Artificial Intelligence (IJCAI’05), page xxx, Edinburgh, Scotland, August 2005.

[171] Norman P. Hummon and Patrick Doreian. Computational methods for social network analysis. Social Networks, 12(4):273–288, 1990.

[172] Ivan Ivanov, Peter Vajda, Jong Seok Lee, and Touradj Ebrahimi. In Tags We Trust: Trust modeling in social tagging of multimedia content. IEEE Signal Processing Magazine, Special Issue on Signal and Information Processing for Social Learning and Networking, 29(2):98–107, 2012.

[173] Mohsen Jamali and Martin Ester. Trustwalker: a random walk model for combining trust-based and item-based recommendation. In John F. Elder IV, Françoise Fogelman-Soulié, Peter A. Flach, and Mohammed Zaki, editors, KDD, pages 397–406. ACM, 2009.

[174] Márk Jelasity and Ozalp Babaoglu. T-man: Gossip-based overlay topology management. In The Fourth International Workshop on Engineering Self-Organizing Applications (ESOA’06), Hakodate, Japan, May 2006. Springer.

[175] Márk Jelasity, Alberto Montresor, and Ozalp Babaoglu. Gossip-based aggregation in large dynamic networks. ACM Trans. Comput. Syst., 23(3):219–252, 2005.

[176] David Jensen and Jennifer Neville. Data mining in social networks. In National Academy of Sciences Symposium on Dynamic Social Network Modeling and Analysis, page 2002, 2002.

[177] George H. John and Pat Langley. Estimating continuous distributions in Bayesian classifiers. In Philippe Besnard and Steve Hanks, editors, UAI, pages 338–345. Morgan Kaufmann, 1995.

[178] Audun Jøsang. Fission of opinions in subjective logic. In Information Fusion, 2009. FUSION ’09. 12th International Conference on, pages 1911–1918, July 2009.

[179] Audun Jøsang, Ross Hayward, and Simon Pope. Trust network analysis with subjective logic, pages 85–94. Australian Computer Society, Inc., 2006.

[180] Audun Jøsang and Svein Knapskog. A metric for trusted systems. In Proceedings of the 21st National Security Conference, NSA. CiteSeerX - Online, 1998.

[181] L. Kagal, T. Finin, M. Paolucci, Naveen Srinivasan, K. Sycara, and G. Denker. Authorization and privacy for semantic web services. Intelligent Systems, IEEE, 19(4):50–56, Jul.–Aug. 2004.

[182] Cihan Kaleli and Huseyin Polat. Providing private recommendations using naïve bayesian classifier. In Katarzyna Wegrzyn-Wolska and Piotr S. Szczepaniak, editors, AWIC, volume 43 of Advances in Soft Computing, pages 168–173. Springer, 2007.

[183] Cihan Kaleli and Huseyin Polat. P2p collaborative filtering with privacy. Turkish Journal of Electrical Engineering and Computer Sciences, 8(1):101–116, 2010.

[184] Cihan Kaleli and Huseyin Polat. Providing private recommendations on personal social networks. In Vaclav Snášel, Piotr S. Szczepaniak, Ajith Abraham, and Janusz Kacprzyk, editors, Advances in Intelligent Web Mastering - 2, volume 67 of Advances in Intelligent and Soft Computing, pages 117–125. Springer Berlin Heidelberg, 2010.

[185] Tapas Kanungo, David M. Mount, Nathan S. Netanyahu, Christine D. Piatko, Ruth Silverman, and Angela Y. Wu. An efficient k-means clustering algorithm: Analysis and implementation. IEEE Trans. Pattern Anal. Mach. Intell., 24(7):881–892, 2002.

[186] Philipp Kärger and Wolf Siberski. Guarding a Walled Garden – Semantic Privacy Preferences for the Social Web. In Lora Aroyo, Grigoris Antoniou, Eero Hyvönen, Annette ten Teije, Heiner Stuckenschmidt, Liliana Cabral, and Tania Tudorache, editors, The Semantic Web: Research and Applications, volume 6089 of Lecture Notes in Computer Science, chapter 11, pages 151–165. Springer Berlin / Heidelberg, Berlin, Heidelberg, 2010.

[187] Said Kashoob, James Caverlee, and Krishna Kamath. Community-based ranking of the social web, page 141. ACM Press, 2010.

[188] J Kay, B Kummerfeld, and P Lauder. Managing private user models and shared personas. Workshop on user modelling for ubiquitous computing, 9th international conference on user modeling, 2003.

[189] Judy Kay. Scrutable adaptation: Because we can and must. In Vincent P. Wade, Helen Ashman, and Barry Smyth, editors, AH, volume 4018 of Lecture Notes in Computer Science, pages 11–19. Springer, 2006.

[190] A. Kim, L.J. Hoffman, and C.D. Martin. Position paper: building privacy into the semantic web: An ontology needed now. In Semantic Web Workshop 2002, Hawaii USA, Berlin, Heidelberg, 2002.

[191] H.D. Kim and C.X. Zhai. Generating comparative summaries of contradictory opinions in text, pages 385–394. ACM, New York, USA, 2009.

[192] Soo-Min Kim, Patrick Pantel, Tim Chklovski, and Marco Pennacchiotti. Automatically assessing review helpfulness. In Proceedings of the 2006 Conference on Empirical Methods in Natural Language Processing, pages 423–430, Morristown, NJ, USA, 2006. Association for Computational Linguistics.

[193] Jon M. Kleinberg. Challenges in mining social network data: processes, privacy, and paradoxes, pages 4–5. ACM, 2007.

[194] George J. Klir and Bo Yuan. Fuzzy Sets, Fuzzy Logic, And Fuzzy Systems: Selected papers by Lotfi A. Zadeh. World Scientific, 1996.

[195] Tomas Knap and Irena Mlynkova. Towards topic-based trust in social networks. In Zhiwen Yu, Ramiro Liscano, Guanling Chen, Daqing Zhang, and Xingshe Zhou, editors, UIC, volume 6406 of Lecture Notes in Computer Science, pages 635–649. Springer, 2010.

[196] Bart P. Knijnenburg, Martijn C. Willemsen, Zeno Gantner, Hakan Soncu, and Chris Newell. Explaining the user experience of recommender systems. User Model. User-Adapt. Interact., 22(4-5):441–504, 2012.

[197] Moo Nam Ko, G.P. Cheek, M. Shehab, and R. Sandhu. Social-networks connect services. Computer, 43(8):37–43, Aug. 2010.

[198] Alfred Kobsa. User modeling: Recent work, prospects and hazards. In M. Schneider-Hufschmidt, T. Kuhme, and U. Malinowski, editors, Adaptive User Interfaces: Principles and Practice. Springer, 1993.

[199] Alfred Kobsa and Jörg Schreck. Privacy through pseudonymity in user-adaptive systems. ACM Trans. Internet Technol., 3(2):149–183, May 2003.

[200] Nora Koch. Software Engineering for Adaptive Hypermedia Systems: Reference Model, Modeling Techniques and Development Process. PhD thesis, Ludwig-Maximilians-University Munich, 2000.

[201] Reto Kohlas and Ueli M. Maurer. Confidence valuation in a public-key infrastructure based on uncertain evidence. In Hideki Imai and Yuliang Zheng, editors, Public Key Cryptography, volume 1751 of Lecture Notes in Computer Science, pages 93–112. Springer, 2000.

[202] Konfidi. http://www.konfidi.org, 2007.

[203] Ralf Krestel and Ling Chen. The Art of Tagging: Measuring the Quality of Tags, volume 5367, chapter 18, pages 257–271. Springer Berlin Heidelberg, 2008.

[204] Ralf Krestel and Nima Dokoohaki. Diversifying product review rankings: Getting the full picture. In 2011 IEEE/WIC/ACM International Conferences on Web Intelligence and Intelligent Agent Technology, pages 138–145, Washington, DC, USA, Aug 2011. IEEE.

[205] Ralf Krestel and Peter Fankhauser. Tag recommendation using probabilistic topic models. In Folke Eisterlehner, Andreas Hotho, and Robert Jäschke, editors, ECML PKDD Discovery Challenge 2009 (DC09), volume 497, pages 131–141, Bled, Slovenia, September 2009. CEUR Workshop Proceedings.

[206] John Krumm, Nigel Davies, and Chandra Narayanaswami. User-generated content. Pervasive Computing, IEEE, 7(4):10–11, 2008.

[207] Tsvi Kuflik, Adriano Albertini, Paolo Busetta, Cesare Rocchi, Oliviero Stock, and Massimo Zancanaro. An agent-based architecture for museum visitors’ guide systems. In Martin Hitz, Marianna Sigala, and Jamie Murphy, editors, ENTER, page 57. Springer, 2006.

[208] Jerome Kunegis, Alan Said, and Winfried Umbrath. The universal recommender, 2009. arXiv:0909.3472.

[209] Ugur Kuter and Jennifer Golbeck. Sunny: A new algorithm for trust inference in social networks using probabilistic confidence models. In AAAI, pages 1377–1382. AAAI Press, 2007.

[210] Knowledge Systems AI Laboratory. Inference web. http://iw.stanford.edu/, 2007.

[211] Shyong K. Lam, Dan Frankowski, and John Riedl. Do you trust your recommendations? an exploration of security and privacy issues in recommender systems. In Günter Müller, editor, ETRICS, volume 3995 of Lecture Notes in Computer Science, pages 14–29. Springer, 2006.

[212] Shyong K. Lam and John Riedl. Shilling recommender systems for fun and profit. In Stuart I. Feldman, Mike Uretsky, Marc Najork, and Craig E. Wills, editors, WWW, pages 393–402. ACM, 2004.

[213] T Landauer, D McNamara, S Dennis, and W Kintsch. Handbook of Latent Semantic Analysis, chapter Semantics, pages 121–141. Erlbaum, New York, NY, USA, 2007.

[214] Neal Lathia, Stephen Hailes, and Licia Capra. Trust-based collaborative filtering. Joint iTrust and PST Conferences on Privacy Trust Management and Security IFIPTM Trondheim Norway, 263:119–134, 2008.

[215] David Lazer, Alex Pentland, Lada Adamic, Sinan Aral, Albert-László Barabási, Devon Brewer, Nicholas Christakis, Noshir Contractor, James Fowler, Myron Gutmann, et al. Computational social science. Social Science Magazine, 323(February):721–723, 2009.

[216] Raph Levien. Attack-resistant trust metrics. In Jennifer Golbeck, editor, Computing with Social Trust, Human-Computer Interaction Series, pages 121–132. Springer, 2009.

[217] C. Li. Profiles: The real value of social networks. Forrester. July, 15, 2004.

[218] Y. Li, B. Xu, J. Lu, and D. Kang. Reasoning with fuzzy ontologies. Contexts and Ontologies: Theory, Practice and Applications, page 71, 2006.

[219] Yuefeng Li and Ning Zhong. Mining ontology for automatically acquiring web user information needs. IEEE Trans. Knowl. Data Eng., 18(4):554–568, 2006.

[220] LinkedIn. http://www.linkedin.com/, Last accessed 2012.

[221] Christina Lioma, Birger Larsen, Hinrich Schuetze, and Peter Ingwersen. A subjective logic formalisation of the principle of polyrepresentation for information needs, pages 125–134. ACM, 2010.

[222] Bing Liu. Sentiment Analysis and Subjectivity, chapter 28, pages 1–38. CRC Press, Taylor and Francis Group, Boca Raton, FL, USA, 2010.

[223] Hugo Liu, Pattie Maes, and Glorianna Davenport. Unraveling the taste fabric of social networks. Int. J. Semantic Web Inf. Syst., 2(1):42–71, 2006.

[224] Kaipeng Liu and Binxing Fang. Integrating social relations into personalized tag recommendation. In Intelligent Human-Machine Systems and Cybernetics (IHMSC), 2010 2nd International Conference on, volume 1, pages 292–295, Aug. 2010.

[225] Yang Liu, Xiangji Huang, Aijun An, and Xiaohui Yu. Modeling and predicting the helpfulness of online reviews. In 2008 Eighth IEEE International, pages 443–452, Washington DC, USA, 2008. IEEE Computer Society.

[226] LiveJournal. http://www.livejournal.com/, 2010.

[227] Junhai Luo, Xue Liu, and Mingyu Fan. A trust model based on fuzzy recommendation for mobile ad-hoc networks. Computer Networks, 53(14):2396–2407, 2009.

[228] Saranya Maneeroj and Atsuhiro Takasu. Hybrid Recommender System Using Latent Features, pages 661–666. IEEE, May 2009.

[229] A Marin and B Wellman. Handbook of Social Network Analysis, pages 1–23. Sage, 2010.

[230] Humberto Mariotti. Autopoiesis, culture and society. Oikos (Itália) www.oikos.org/maten.htm, 1999.

[231] Benjamin Marlin. Collaborative Filtering: A Machine Learning Perspective. Master’s thesis, University of Toronto, 2004.

[232] Maria J. Martín-Bautista, Donald H. Kraft, María Amparo Vila Miranda, Jianhua Chen, and J. Cruz. User profiles and fuzzy logic for web retrieval issues. Soft Comput., 6(5):365–372, 2002.

[233] A. Marwick. Livejournal users: Passionate, prolific and private. URL http://www.livejournalinc.com/LJ_Research_Report.pdf. Accessed January, 18(2010):7, 2008.

[234] Paolo Massa and Paolo Avesani. Trust-aware collaborative filtering for recommender systems. Lecture Notes in Computer Science, 3290:492–508, 2004.

[235] Paolo Massa and Paolo Avesani. Trust-aware collaborative filtering for recommender systems. In Proc. of Federated Int. Conference On The Move to Meaningful Internet: CoopIS, DOA, ODBASE, pages 492–508, 2004.

[236] Paolo Massa and Paolo Avesani. Trust-aware recommender systems. In Proceedings of the 2007 ACM conference on Recommender systems, RecSys ’07, pages 17–24, New York, NY, USA, 2007. ACM.

[237] Paolo Massa and Paolo Avesani. Trust metrics in recommender systems. In Jennifer Golbeck, editor, Computing with Social Trust, Human-Computer Interaction Series, pages 259–285. Springer London, 2009.

[238] J Mayer, A Narayanan, and S Stamm. Do not track: A universal third-party web tracking opt out. IETF Request for Comments, pages 1–12, 2011.

[239] Andrew McCallum, Xuerui Wang, and Andrés Corrada-Emmanuel. Topic and role discovery in social networks with experiments on enron and academic email. Journal of Artificial Intelligence Research (JAIR), 30:249–272, 2007.

[240] Andrew Kachites McCallum. Mallet: A machine learning for language toolkit, 2002. http://mallet.cs.umass.edu.

[241] Deborah McGuinness, Paulo Pinheiro da Silva, and Lee Ding. Proof markup language (pml) primer. http://inference-web.org/2007/primer/, 2007.

[242] Deborah L McGuinness and Frank Van Harmelen. OWL Web Ontology Language overview. W3C Recommendation, 10:1–19, 2004.

[243] Bhaskar Mehta and Thomas Hofmann. Cross system personalization and collaborative filtering by learning manifold alignments. In Proceedings of the 29th annual German conference on Artificial intelligence, KI’06, pages 244–259, Berlin, Heidelberg, 2007. Springer-Verlag.

[244] Bhaskar Mehta, Claudia Niederée, Avare Stewart, Marco Degemmis, Pasquale Lops, and Giovanni Semeraro. Ontologically-enriched unified user modeling for cross-system personalization. In Liliana Ardissono, Paul Brna, and Antonija Mitrovic, editors, User Modeling, volume 3538 of Lecture Notes in Computer Science, pages 119–123. Springer, 2005.

[245] Matthew Michelson and Sofus A. Macskassy. Discovering users’ topics of interest on twitter: a first look. In Proceedings of the fourth workshop on Analytics for noisy unstructured text data, AND ’10, pages 73–80, New York, NY, USA, 2010. ACM.

[246] Stuart E. Middleton, David C. De Roure, and Nigel R. Shadbolt. Capturing knowledge of user preferences: ontologies in recommender systems. In K-CAP ’01: Proceedings of the 1st international conference on Knowledge capture, pages 100–107, New York, NY, USA, 2001. ACM Press.

[247] Stuart E. Middleton, Nigel Shadbolt, and David De Roure. Ontological user profiling in recommender systems. ACM Trans. Inf. Syst., 22(1):54–88, 2004.

[248] Stuart E. Middleton, Nigel R. Shadbolt, and David C. De Roure. Ontological user profiling in recommender systems. ACM Trans. Inf. Syst., 22(1):54–88, 2004.

[249] Peter Mika. Flink: Semantic web technology for the extraction and analysis of social networks. Journal of Web Semantics, 3:211–223, 2005.

[250] Stanley Milgram. The small world problem. Psychology Today, 61:60–67, 1967.

[251] Bradley N. Miller, Joseph A. Konstan, and John Riedl. Pocketlens: Toward a personal recommender system. ACM Trans. Inf. Syst., 22(3):437–476, July 2004.

[252] B. Mobasher. A web personalization engine based on user transaction clustering. In Proceedings of the 9th Workshop on Information Technologies and Systems, volume 18, 1999.

[253] Bamshad Mobasher. Data mining for web personalization. In The Adaptive Web: Methods and Strategies of Web Personalization, chapter 3, pages 90–135. Springer, 2007.

[254] Bamshad Mobasher, Robin D. Burke, Runa Bhaumik, and Chad Williams. Toward trustworthy recommender systems: An analysis of attack models and algorithm robustness. ACM Trans. Internet Techn., 7(4), 2007.

[255] Mikolaj Morzy. New algorithms for mining the reputation of participants of online auctions. Algorithmica, 52(1):95–112, 2008.

[256] Lik Mui, Mjdeh Mohtashemi, and Ari Halberstadt. A computational model of trust and reputation. In 35th Hawaii International Conference on System Science (HICSS), 2002.

[257] Myspace. http://www.myspace.com, 2012.

[258] Miklos Nagy, Maria Vargas-Vera, and Enrico Motta. Introducing fuzzy trust for managing belief conflict over semantic web data. In Fernando Bobillo, Paulo Cesar G. da Costa, Claudia d’Amato, Nicola Fanizzi, Kathryn B. Laskey, Kenneth J. Laskey, Thomas Lukasiewicz, Trevor P. Martin, Matthias Nickles, Michael Pool, and Pavel Smrz, editors, URSW, volume 423 of CEUR Workshop Proceedings. CEUR-WS.org, 2008.

[259] O Nasraoui, H Frigui, R Krishnapuram, and A Joshi. Extracting web user profiles using relational competitive fuzzy clustering. International Journal on Artificial Intelligence Tools, 9(4):509–526, 2000.

[260] Samia Nefti, Farid Meziane, and Mohd Khairudin Kasiran. A fuzzy trust model for e-commerce. In CEC, pages 401–404. IEEE Computer Society, 2005.

[261] M E J Newman. The structure of scientific collaboration networks. Proceedings of the National Academy of Sciences of the United States of America, 98(2):7, 2000.

[262] Claudia Niederée, Avaré Stewart, Bhaskar Mehta, and Matthias Hemmje. A Multi-Dimensional, Unified User Model for Cross-System Personalization. In Proceedings of the AVI 2004 Workshop On Environments For Personalized Information Access, 2004.

[263] NIST. Tweets2011: Trec 2011 microblog dataset. http://trec.nist.gov/data/tweets/, 2011.

[264] Rishab Nithyanand and Karthik Raman. Fuzzy privacy preserving peer-to-peer reputation management. IACR Cryptology ePrint Archive, 2009:442, 2009.

[265] Blaz Novak. A survey of focused web crawling algorithms. Conference on Data Mining and Warehouses (SiKDD 2004), 2004.

[266] Brendan O’Connor, Michel Krieger, and David Ahn. Tweetmotif: Exploratory search and topic summarization for twitter. In William W. Cohen and Samuel Gosling, editors, ICWSM. The AAAI Press, 2010.

[267] M. O’Connor and J. Herlocker. Clustering items for collaborative filtering. In the Proceedings of SIGIR-2001. Citeseer, 2001.

[268] Martin J. O’Connor, Samson W. Tu, Csongor Nyulas, Amar K. Das, and Mark A. Musen. Querying the semantic web with swrl. In Adrian Paschke and Yevgen Biletskiy, editors, RuleML, volume 4824 of Lecture Notes in Computer Science, pages 155–159. Springer, 2007.

[269] John O’Donovan. Capturing trust in social web applications. In Jennifer Golbeck, editor, Computing with Social Trust, Human-Computer Interaction Series, pages 213–257. Springer, 2009.

[270] John O’Donovan and Barry Smyth. Trust in recommender systems. In Proceedings of the 10th international conference on Intelligent user interfaces, IUI ’05, pages 167–174, New York, NY, USA, 2005. ACM.

[271] John O’Donovan and Barry Smyth. Is trust robust?: an analysis of trust-based recommendation. In Cécile Paris and Candace L. Sidner, editors, IUI, pages 101–108. ACM, 2006.

[272] Web of Trust Vocabulary. http://xmlns.com/wot/0.1/, Last accessed 2007.

[273] Kieron O’Hara and Wendy Hall. Trust on the web: Some web science research challenges. UoC Papers: E-Journal on the Knowledge Society, October 2008.

[274] Michael P. O’Mahony, Pádraig Cunningham, and Barry Smyth. An assessment of machine learning techniques for review recommendation. In L. Coyle, D. Dunnion, and J. Freyne, editors, Proceeding of the 20th Irish Conference on Artificial Intelligence and Cognitive Science, pages 241–250, Heidelberg/Berlin, 2009. Springer-Verlag.

[275] Michael P O’Mahony and Barry Smyth. Learning to recommend helpful hotel reviews. In Proceedings of the third ACM conference on Recommender systems (RecSys ’09), page 305, New York, USA, 2009. ACM Press.

[276] Orkut. http://www.orkut.com/, Last accessed 2007.

[277] Róbert Ormandi, István Hegedüs, and Márk Jelasity. Overlay management for fully distributed user-based collaborative filtering. In Pasqua D’Ambra, Mario Rosario Guarracino, and Domenico Talia, editors, Euro-Par (1), volume 6271 of Lecture Notes in Computer Science, pages 446–457. Springer, 2010.

[278] Mourad Ouziri. Accessing the distributed learner profile in the semantic web. In Ignac Lovrek, Robert J. Howlett, and Lakhmi C. Jain, editors, KES (1), volume 5177 of Lecture Notes in Computer Science, pages 464–472. Springer, 2008.

[279] Bo Pang and Lillian Lee. Opinion mining and sentiment analysis. Foundations and Trends in Information Retrieval, 2(2):1–135, 2008.

[280] Rupa Parameswaran. A robust data obfuscation approach for privacy preserving collaborative filtering. PhD thesis, Georgia Institute of Technology, Atlanta, GA, USA, 2006. AAI3233574.

[281] Eli Pariser. The filter bubble: what the Internet is hiding from you. Viking, London, 2011.

[282] Sung Hyuk Park, Sang Pil Han, Soon Young Huh, and Hojin Lee. Preprocessing uncertain user profile data: Inferring user’s actual age from ages of the user’s neighbors. In Proceedings of the 2009 IEEE International Conference on Data Engineering, ICDE ’09, pages 1619–1624, Washington, DC, USA, 2009. IEEE Computer Society.

[283] Michael Pazzani and Daniel Billsus. Learning and revising user profiles: The identification of interesting web sites. In Machine Learning, volume 27, pages 313–331. Kluwer Academic Publishers, Jun 1997.

[284] Michael J. Pazzani and Daniel Billsus. Content-based recommendation systems. In The Adaptive Web: Methods and Strategies of Web Personalization, chapter 10, pages 325–341. Springer, 2007.

[285] Georgios Pitsilis and Lindsay Marshall. Trust as a key to improving recommendation systems. In Peter Herrmann, Valérie Issarny, and Simon Shiu, editors, iTrust, volume 3477 of Lecture Notes in Computer Science, pages 210–223. Springer, 2005.

[286] Huseyin Polat and Wenliang Du. Privacy-preserving collaborative filtering using randomized perturbation techniques. In ICDM, pages 625–628. IEEE Computer Society, 2003.

[287] Huseyin Polat and Wenliang Du. Privacy-preserving collaborative filtering. International Journal of Electronic Commerce, 9(4):9–35, 2005.

[288] Jay M. Ponte and W. Bruce Croft. A language modeling approach to information retrieval. In Proceedings of the 21st annual international ACM SIGIR conference on Research and development in information retrieval, SIGIR ’98, pages 275–281, New York, NY, USA, 1998. ACM.

[289] Danny Chiang Choon Poo, Brian Chng, and Jie-Mein Goh. A hybrid approach for user profiling. In HICSS, page 103, 2003.

[290] Ana-Maria Popescu and Oren Etzioni. Extracting product features and opinions from reviews. In Proceedings of the conference on Human Language Technology and Empirical Methods in Natural Language Processing (HLT ’05), volume 06, pages 339–346, Morristown, NJ, USA, 2005. Association for Computational Linguistics.

[291] M. F. Porter. An algorithm for suffix stripping. Program, 14(3):130–137, July 1980.

[292] EU FP7 Smartmuseum Project. http://www.smartmuseum.eu, 2010.

[293] Grapple Project. Gumf ontology. http://www.kbs.uni-hannover.de/gumf.owl, 2011.

[294] Protégé. Copyright ©2009 Stanford Center for Biomedical Informatics Research, Available: http://protege.stanford.edu/.

[295] W.V.O. Quine, J.S. Ullian, and R.M. Ohmann. The web of belief, volume 2. Random House New York, 1978.

[296] Ross Quinlan. C4.5: Programs for Machine Learning. Morgan Kaufmann Publishers, San Mateo, CA, 1993.

[297] F Radlinski and S Dumais. Improving personalized web search using result diversification. In Proceedings of the 29th annual international ACM SIGIR conference on Research and development in information retrieval, pages 691–692, New York, USA, 2006. ACM.

[298] Davood Rafiei, Krishna Bharat, and A Shukla. Diversifying web search results. In Proceedings of the 19th international conference on World wide web (WWW ’10), page 781, New York, USA, 2010. ACM Press.

[299] P. Anand Raj and D. Nagesh Kumar. Ranking alternatives with fuzzy weights using maximizing set and minimizing set. Fuzzy Sets and Systems, 105(3):365–375, 1999.

[300] Daniel Ramage, Evan Rosen, Jason Chuang, Christopher D. Manning, and Daniel A. McFarland. Topic modeling for the social sciences. In NIPS 2009 Workshop on Applications for Topic Models: Text and Beyond, Whistler, Canada, December 2009.

[301] Karthik Raman, P. Shivaswamy, and Thorsten Joachims. Learning to diversify from implicit feedback. In DDR-2012: Diversity in Document Retrieval, co-located with International Conference on Web Search and Data Mining (WSDM ’2012), New York, USA, 2012. ACM.

[302] Sanjog Ray and Ambuj Mahanti. Strategies for effective shilling attacks against recommender systems. In Francesco Bonchi, Elena Ferrari, Wei Jiang, and Bradley Malin, editors, Privacy, Security, and Trust in KDD, volume 5456 of Lecture Notes in Computer Science, pages 111–125. Springer Berlin Heidelberg, 2009.

[303] Liana Razmerita, Albert A. Angehrn, and Alexander Maedche. Ontology-based user modeling for knowledge management systems. In Peter Brusilovsky, Albert T. Corbett, and Fiorella de Rosis, editors, User Modeling, volume 2702 of Lecture Notes in Computer Science, pages 213–217. Springer, 2003.

[304] Tim Reichling and Volker Wulf. Expert recommender systems in practice: evaluating semi-automatic profile generation. In Saul Greenberg, Scott E Hudson, Ken Hinckley, Meredith Ringel, and Dan R Olsen, editors, Proceedings of the 27th international conference on Human factors in computing systems CHI ’09, pages 59–68. ACM, 2009.

[305] P Resnick, R Zeckhauser, J Swanson, and K Lockwood. The value of reputation on ebay: A controlled experiment. Experimental Economics, 9(2):79–101, 2006.

[306] Paul Resnick, Neophytos Iacovou, Mitesh Suchak, Peter Bergstrom, and John Riedl. GroupLens: An open architecture for collaborative filtering of netnews. In Proceedings of ACM 1994 Conference on Computer Supported Cooperative Work, pages 175–186, 1994.

[307] Seungmin Rho, Seheon Song, Yunyoung Nam, Eenjun Hwang, and Minkoo Kim. Implementing situation-aware and user-adaptive music recommendation service in semantic web and real-time multimedia computing environment. Multimedia Tools and Applications, pages 1–24, 2011.

[308] John Riedl and Joseph Konstan. Movielens dataset, 1998.

[309] Tuukka Ruotsalo. Methods and applications for ontology-based recommender systems. PhD thesis, Aalto University, 2010.

[310] Tuukka Ruotsalo, Eetu Mäkelä, Tomi Kauppinen, Eero Hyvönen, Krister Haav, Ville Rantala, Matias Frosterus, Nima Dokoohaki, and Mihhail Matskin. Smartmuseum – personalized context-aware access to digital cultural heritage, 2009.

[311] Elie Sanchez, editor. Fuzzy Logic and the Semantic Web. Capturing Intelligence. Elsevier, Amsterdam, 2006.

[312] Mark Sanderson, Jiayu Tang, Thomas Arni, and Paul Clough. What else is there? search diversity examined. In Proceedings of the 31st European Conference on IR Research on Advances in Information Retrieval, ECIR ’09, pages 562–569, Berlin, Heidelberg, 2009. Springer-Verlag.

[313] Badrul M. Sarwar, George Karypis, Joseph Konstan, and John Riedl. Recommender systems for large-scale e-commerce: Scalable neighborhood formation using clustering. In Proceedings of the 5th International Conference on Computer and Information Technology (ICCIT), 2002.

[314] B.M. Sarwar, G. Karypis, J. Konstan, and J. Riedl. Recommender systems for large-scale e-commerce: Scalable neighborhood formation using clustering. In Proceedings of the Fifth International Conference on Computer and Information Technology (ICCIT’02), December 27–28 2002.

[315] J. Ben Schafer, Dan Frankowski, Jon Herlocker, and Shilad Sen. Collaborative filtering recommender systems. In The Adaptive Web: Methods and Strategies of Web Personalization, chapter 9, pages 291–324. Springer, 2007.

[316] Stefan Schmidt, Robert Steele, Tharam S. Dillon, and Elizabeth Chang. Fuzzy trust evaluation and credibility development in multi-agent systems. Appl. Soft Comput., 7(2):492–505, 2007.

[317] B. Schneier. A taxonomy of social networking data. Security Privacy, IEEE, 8(4):88, July-Aug. 2010.

[318] Dominik Schnitzer, Arthur Flexer, and Gerhard Widmer. A filter-and-refine indexing method for fast similarity search in millions of music tracks. In Proceedings of the 10th International Conference on Music Information Retrieval (ISMIR’09), pages 537–542, online, 2009. online.

[319] J. Scott. Social network analysis: A handbook. Sage, 2000.

[320] Fabrizio Sebastiani. Machine learning in automated text categorization. ACM Comput. Surv., 34(1):1–47, March 2002.

[321] Dongmahn Seo, Suhyun Kim, Hogun Park, Geun Young Lee, and Heedong Ko. Overlay SNS: Next generation social network service. IEEE, 2012.

[322] Ahu Sieg, Bamshad Mobasher, and Robin Burke. Web search personalization with ontological user profiles. In CIKM ’07: Proceedings of the sixteenth ACM conference on Conference on information and knowledge management, pages 525–534, New York, NY, USA, 2007. ACM.

[323] Ahu Sieg, Bamshad Mobasher, and Robin D. Burke. Learning ontology-based user profiles: A semantic approach to personalized web search. IEEE Intelligent Informatics Bulletin, 8(1):7–18, 2007.

[324] Aameek Singh and Ling Liu. Trustme: anonymous management of trust relationships in decentralized p2p systems. In Peer-to-Peer Computing, 2003. (P2P 2003). Proceedings. Third International Conference on, pages 142–149, Sept. 2003.

[325] H. Jeff Smith, Tamara Dinev, and Heng Xu. Information privacy research: An interdisciplinary review. MIS Quarterly, 35(4):989–1015, 2011.

[326] S. J. Soltysiak and I. B. Crabtree. Automatic learning of user profiles - towards the personalisation of agent services. In BT Technology Journal, volume 16, pages 110–117, Jul 1998.

[327] Flavia Sparacino. Museum Intelligence: Using interactive technologies for effective communication and storytelling in the Puccini Set Designer exhibit. In International Cultural Heritage Informatics Meeting (ICHIM 2004), Berlin, August 2004. ICHIM.

[328] A. Squicciarini, E. Bertino, Elena Ferrari, F. Paci, and B. Thuraisingham. Pp-trust-x: A system for privacy preserving trust negotiations. ACM Trans. Inf. Syst. Secur., 10(3), July 2007.

[329] Jaideep Srivastava, Robert Cooley, Mukund Deshpande, and Pang-Ning Tan. Web usage mining: discovery and applications of usage patterns from web data. SIGKDD Explor. Newsl., 1(2):12–23, January 2000.

[330] Mark Steyvers, Padhraic Smyth, Michal Rosen-Zvi, and Thomas Griffiths. Probabilistic author-topic models for information discovery. In KDD ’04: Proceedings of the tenth ACM SIGKDD international conference on Knowledge discovery and data mining, pages 306–315, New York, NY, USA, 2004. ACM Press.

[331] Veselin Stoyanov and Claire Cardie. Topic identification for fine-grained opinion analysis. Proceedings of the 22nd International Conference on Computational Linguistics (COLING ’08), 1(August):817–824, 2008.

[332] Xiaoyuan Su and Taghi M. Khoshgoftaar. A survey of collaborative filtering techniques. Advances in Artificial Intelligence, 2009:1–19, Jan 2009.

[333] Charles Sutton and Andrew McCallum. An introduction to conditional random fields. In Foundations and Trends in Machine Learning, volume 4, pages 267–373. Citeseer, 2012.

[334] Knowledge Systems. Inference web portal. http://inference-web.org, Last accessed 2012.

[335] Martin Szomszor, Harith Alani, Ivan Cantador, Kieron O’Hara, and Nigel Shadbolt. Semantic modelling of user interests based on cross-folksonomy analysis. In 7th International Semantic Web Conference (ISWC), October 2008. Event Dates: October 26th - 30th.

[336] Jiliang Tang, Huiji Gao, and Huan Liu. mTrust: discerning multi-faceted trust in a connected world, pages 93–102. ACM Press, Feb 2012.

[337] Technorati. Technorati state of blogosphere 2010. Technical report, Technorati, 2010.

[338] Jaime Teevan, Susan T. Dumais, and Eric Horvitz. Personalizing search via automated analysis of interests and activities. In SIGIR ’05: Proceedings of the 28th annual international ACM SIGIR conference on Research and development in information retrieval, pages 449–456, New York, NY, USA, 2005. ACM Press.

[339] N. Tintarev and J. Masthoff. A survey of explanations in recommender systems. In Data Engineering Workshop, 2007 IEEE 23rd International Conference on, pages 801–810, Washington DC, USA, 2007. IEEE Computer Society.

[340] N. Tintarev and J. Masthoff. The effectiveness of personalized movie explanations: An experiment using commercial meta-data. In Wolfgang Nejdl, Judy Kay, Pearl Pu, and Eelco Herder, editors, Adaptive Hypermedia and Adaptive Web-Based Systems, pages 204–213, Heidelberg, 2008. Springer-Verlag.

[341] Ivan Titov and Ryan McDonald. Modeling online reviews with multi-grain topic models. In Proceeding of the 17th international conference on World Wide Web (WWW ’08), pages 111–120, New York, USA, 2008. ACM Press.

[342] Eran Toch, Yang Wang, and Lorrie Faith Cranor. Personalization and privacy: a survey of privacy risks and remedies in personalization-based systems. User Modeling and User-Adapted Interaction, 22(1-2):203–220, Mar 2012.

[343] Santtu Toivonen and Grit Denker. The impact of context on the trustworthiness of communication: An ontological approach. In Jennifer Golbeck, Piero A. Bonatti, Wolfgang Nejdl, Daniel Olmedilla, and Marianne Winslett, editors, ISWC Workshop on Trust, Security, and Reputation on the Semantic Web, volume 127 of CEUR Workshop Proceedings. CEUR-WS.org, 2004.

[344] Santtu Toivonen and Grit Denker. The impact of context on the trustworthiness of communication: An ontological approach. ISWC Workshop on Trust, Security, and Reputation on the Semantic Web, pages 1–10, 2004.

[345] H. Tong, J. He, Z. Wen, Ravi Konuru, and C.Y. Lin. Diversified ranking on large graphs: an optimization viewpoint. In Proceedings of the 17th ACM SIGKDD international conference on Knowledge discovery and data mining, pages 1028–1036, New York, USA, 2011. ACM.

[346] Stephanie Tom Tong, Brandon Van Der Heide, Lindsey Langwell, and Joseph B. Walther. Too much of a good thing? the relationship between number of friends and interpersonal impressions on facebook. Journal of Computer-Mediated Communication, 13(3):531–549, 2008.

[347] Ilaria Torre. Adaptive systems in the era of the semantic and social web, a survey. User Modeling and User-Adapted Interaction, 19(5):433–486, 2009.

[348] Kristina Toutanova, Dan Klein, Christopher D Manning, and Yoram Singer. Feature-rich part-of-speech tagging with a cyclic dependency network. In Marti Hearst and Mari Ostendorf, editors, Proceedings of the 2003 Conference of the North American Chapter of the Association for Computational Linguistics on Human Language Technology (NAACL’03), volume 1, pages 173–180, Morristown, NJ, USA, 2003. Association for Computational Linguistics.

[349] Joana Trajkova and Susan Gauch. Improving ontology-based user profiles. In Christian Fluhr, Gregory Grefenstette, and W. Bruce Croft, editors, RIAO, pages 380–390. CID, 2004.

[350] Ya Fen Tseng, Tzai-Zang Lee, Shu-Chen Kao, and ChienHsing Wu. An extension of trust and privacy in the initial adoption of online shopping: An empirical study, pages 159–164. IEEE, 2011.

[351] Mark Van Setten, Sean McNee, and Joseph Konstan, editors. Interestmap: An Identity and Taste-Based Recommender. 2005 International Conference on Intelligent User Interfaces (IUI 2005), 2005.

[352] Erik Vee, Utkarsh Srivastava, Jayavel Shanmugasundaram, Prashant Bhat, and Sihem Amer Yahia. Efficient computation of diverse query results. IEEE 24th International Conference on Data Engineering (ICDE), 00:228–236, 2008.

[353] Nele Verbiest, Chris Cornelis, Patricia Victor, and Enrique Herrera-Viedma. Trust and distrust aggregation enhanced with path length incorporation. Fuzzy Sets and Systems, 202:61–74, 2012.

[354] Patricia Victor, Chris Cornelis, Martine De Cock, and Paulo Pinheiro da Silva. Gradual trust and distrust in recommender systems. Fuzzy Sets and Systems, 160(10):1367–1382, 2009.

[355] Patricia Victor, Chris Cornelis, Martine De Cock, and Paulo Pinheiro da Silva. Gradual trust and distrust in recommender systems. Fuzzy Sets Syst., 160(10):1367–1382, May 2009.

[356] Patricia Victor, Chris Cornelis, Ankur M. Teredesai, and Martine De Cock. Whom should i trust?: the impact of key figures on cold start recommendations. In Proceedings of the 2008 ACM symposium on Applied computing, SAC ’08, pages 2014–2018, New York, NY, USA, 2008. ACM.

[357] Spyros Voulgaris, Daniela Gavidia, and Maarten van Steen. Cyclon: Inexpensive membership management for unstructured p2p overlays. J. Network Syst. Manage., 13(2):197–217, 2005.

[358] VRA. The VRA core categories, version 4.0. http://www.vraweb.org/projects/vracore4, 2002.

[359] Wolfgang Wahlster and Alfred Kobsa. User models in dialog systems. In A. Kobsa and W. Wahlster, editors, User Models in Dialog Systems, pages 4–34. Springer, Berlin, Heidelberg, 1989.

[360] Hanna M. Wallach. Topic modeling: beyond bag-of-words. In Proceedings of the 23rd international conference on Machine learning, ICML ’06, pages 977–984, New York, NY, USA, 2006. ACM.

[361] Hanna M. Wallach, Iain Murray, Ruslan Salakhutdinov, and David Mimno. Evaluation methods for topic models. In Proceedings of the 26th Annual International Conference on Machine Learning, ICML ’09, pages 1105–1112, New York, NY, USA, 2009. ACM.

[362] Eric Wang, Jorge Silva, Rebecca Willett, and Lawrence Carin. Dynamic relational topic model for social network analysis with noisy links, volume 22. IEEE, 2011.

[363] Jun Wang and Jianhan Zhu. Portfolio theory of information retrieval. In James Allan, Javed A Aslam, Mark Sanderson, Cheng-Xiang Zhai, and Justin Zobel, editors, Proceedings of the 32nd international ACM Conference on Research and Development in Information Retrieval, page 115, New York, USA, 2009. ACM Press.

[364] Yang Wang and Alfred Kobsa. Respecting users’ individual privacy constraints in web personalization. In User Modeling, pages 157–166, 2007.

[365] Yiwen Wang, Lora Aroyo, Natalia Stash, and Lloyd Rutledge. Interactive user modeling for personalized access to museum collections: The Rijksmuseum case study. In Cristina Conati, Kathleen F. McCoy, and Georgios Paliouras, editors, User Modeling, volume 4511 of Lecture Notes in Computer Science, pages 385–389. Springer, 2007.

[366] Zhe Wang, Yongji Wang, and Hu Wu. Tags Meet Ratings: Improving Collaborative Filtering with Tag-Based Neighborhood Method. Proceedings of Social Recommender Systems Workshop (SRS’10), 2010.

[367] Stanley Wasserman and Katherine Faust. Social Network Analysis: Methods and Applications. Number 8 in Structural analysis in the social sciences. Cambridge University Press, 1st edition, 1994.

[368] D.J. Watts and S.H. Strogatz. Collective dynamics of "small-world" networks. Nature, 393(6684):440–442, Jun 1998.

[369] Duncan J Watts. Small Worlds: The Dynamics of Networks between Order and Randomness, volume 107. Princeton University Press, 1999.

[370] Inference Web. Proof markup language (pml) ontology. http://iw.stanford.edu/2006/06/pml-trust.owl, Last accessed 2006.

[371] W. Weerkamp and M. De Rijke. Credibility improves topical blog post retrieval. In Proceedings of Association for Computational Linguistics (ACL) 2008, pages 923–931, Morristown, NJ, USA, 2008. Association for Computational Linguistics.

[372] Xing Wei and W. Bruce Croft. LDA-based document models for ad hoc retrieval. In Proc. SIGIR, 2006.

[373] Stephan Weibelzahl. Evaluation of adaptive systems, 2001.

[374] Welkin. Copyright ©2004-2008 Massachusetts Institute of Technology, Available: http://simile.mit.edu/welkin/.

[375] Jianshu Weng, E.P. Lim, Jing Jiang, and Q. He. Twitterrank: finding topic-sensitive influential twitterers, pages 261–270. ACM, 2010.

[376] Jianshu Weng, Chunyan Miao, and Angela Goh. Improving collaborative filtering with trust-based metrics. In SAC ’06: Proceedings of the 2006 ACM symposium on Applied computing, pages 1860–1864, New York, NY, USA, 2006. ACM.

[377] P.O. Wennerberg. Ontology based knowledge discovery in social networks. JRC Joint Research Center, 2005.

[378] A. Westin. Privacy and Freedom. New York: Atheneum, 1967.

[379] Janyce Wiebe and Ellen Riloff. Creating subjective and objective sentence classifiers from unannotated texts, pages 486–497. Number 3406 in Lecture Notes in Computer Science. Springer, Heidelberg/Berlin, 2005.

[380] Bo Xiao and Izak Benbasat. The asymmetric effects of trust and distrust: An empirical investigation in a deception detection context. SIGHCI 2010 Proceedings, 2010.

[381] Xueke Xu, Tao Meng, and Xueqi Cheng. Aspect-based extractive summarization of online reviews. In Proceedings of the 2011 ACM Symposium on Applied Computing, pages 968–975, New York, USA, 2011. ACM.

[382] Jian-hua Yeh and Meng-lun Wu. Recommendation Based on Latent Topics and Social Network Analysis, pages 209–213. IEEE, 2010.

[383] Jianxing Yu, Z.J. Zha, Meng Wang, and T.S. Chua. Aspect ranking: identifying important product aspects from online consumer reviews. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies-Volume 1, pages 1496–1505, Morristown, NJ, USA, 2011. Association for Computational Linguistics.

[384] Kuifei Yu, Baoxian Zhang, Hengshu Zhu, Huanhuan Cao, and Jilei Tian. Towards personalized context-aware recommendation by mining context logs through topic models. In Pang-Ning Tan, Sanjay Chawla, Chin Kuan Ho, and James Bailey, editors, PAKDD (1), volume 7301 of Lecture Notes in Computer Science, pages 431–443. Springer, 2012.

[385] Weiwei Yuan, Donghai Guan, Young-Koo Lee, Sungyoung Lee, and Sung Jin Hur. Improved trust-aware recommender system using small-worldness of trust networks. Knowledge-Based Systems, 23(3):232–238, 2010.

[386] I. Zaihrayeu, P. da Silva, and D. McGuinness. Iwtrust: Improving user trust in answers from the web. Trust Management, pages 179–188, 2005.

[387] Mohamed Ramzy Zakaria, Adam Moore, Helen Ashman, Craig D. Stewart, and Tim J. Brailsford. The hybrid model for adaptive educational hypermedia. In Paul De Bra, Peter Brusilovsky, and Ricardo Conejo, editors, AH, volume 2347 of Lecture Notes in Computer Science, pages 580–585. Springer, 2002.

[388] Alireza Zarghami, Soude Fazeli, Nima Dokoohaki, and Mihhail Matskin. Social trust-aware recommendation system: A t-index approach. In IEEE/WIC/ACM International Conference on Web Intelligence and Intelligent Agent Technology, volume 3, pages 85–90. IEEE Computer Society, 2009.

[389] Cheng X. Zhai, William W. Cohen, and John Lafferty. Beyond independent relevance: methods and evaluation metrics for subtopic retrieval. In SIGIR ’03: Proceedings of the 26th annual international ACM SIGIR conference on Research and development in information retrieval, pages 10–17, New York, NY, USA, 2003. ACM.

[390] Zhongwu Zhai, Bing Liu, Hua Xu, and P. Jia. Constrained LDA for grouping product features in opinion mining. In Advances in Knowledge Discovery and Data Mining, pages 448–459, Heidelberg/Berlin, 2011. Springer.

[391] Chenyi Zhang and Jianling Sun. Large scale microblog mining using distributed MB-LDA, page 1035. ACM Press, Apr 2012.

[392] Fuguo Zhang. Average shilling attack against trust-based recommender systems. In Information Management, Innovation Management and Industrial Engineering, 2009 International Conference on, volume 4, pages 588–591, Dec. 2009.

[393] Qingsheng Zhang, Yong Qi, Jizhong Zhao, Di Hou, and Yujie Niu. Fuzzy privacy decision for context-aware access personal information. Wuhan University Journal of Natural Sciences, 12:941–945, 2007.

[394] Qingsheng Zhang, Yong Qi, Jizhong Zhao, Di Hou, Tianhai Zhao, and Liang Li. A study on context-aware privacy protection for personal information, pages 1351–1358. IEEE, 2007.

[395] Lixin Zhou. Trust based recommendation system with social network analysis. In Information Engineering and Computer Science, 2009. ICIECS 2009. International Conference on, pages 1–4, Dec. 2009.

[396] Xujuan Zhou, Sheng-Tang Wu, Yuefeng Li, Yue Xu, Raymond Y. K. Lau, and Peter D. Bruza. Utilizing search intent in topic ontology-based user profile for web mining. In Web Intelligence, pages 558–564. IEEE Computer Society, 2006.

[397] Xujuan Zhou, Yue Xu, Yuefeng Li, Audun Josang, and Clive Cox. The state-of-the-art in personalized recommender systems for social networking. Artificial Intelligence Review, 37(2):119–132, May 2011.

[398] Li Zhuang, Feng Jing, and Xiao-Yan Zhu. Movie review mining and summarization. In Proceedings of the 15th ACM international conference on Information and knowledge management - CIKM’06, page 43, New York, USA, 2006. ACM Press.

[399] Leyla Zhuhadar, Olfa Nasraoui, and Robert Wyatt. Dual representation of the semantic user profile for personalized web search in an evolving domain. In AAAI Spring Symposium: Social Semantic Web: Where Web 2.0 Meets Web 3.0, pages 84–. AAAI, 2009.

[400] Cai-Nicolas Ziegler. Towards Decentralized Recommender Systems. PhD thesis, Albert-Ludwigs-Universität Freiburg, Germany, 2005.

[401] Cai-Nicolas Ziegler and Jennifer Golbeck. Investigating interactions of trust and interest similarity. Decision Support Systems, 43(2):460–475, 2007. Emerging Issues in Collaborative Commerce.

[402] Cai-Nicolas Ziegler and Georg Lausen. Spreading activation models for trust propagation. In e-Technology, e-Commerce and e-Service, 2004. EEE ’04. 2004 IEEE International Conference on, pages 83–97, March 2004.

[403] Cai-Nicolas Ziegler, Sean M. McNee, Joseph A. Konstan, and Georg Lausen. Improving recommendation lists through topic diversification. In Proceedings of the 14th international conference on World Wide Web, WWW ’05, pages 22–32, New York, NY, USA, 2005. ACM.

[404] Philip R. Zimmermann. The official PGP user’s guide. MIT Press, Cambridge, MA, USA, 1995.
