Online Social Networks: Measurement, Analysis, and Applications to

Post on 31-Dec-2016

<ul><li><p>RICE UNIVERSITY</p><p>Online Social Networks:</p><p>Measurement, Analysis, and</p><p>Applications to Distributed Information Systems</p><p>by</p><p>Alan E. Mislove</p><p>A Thesis Submitted</p><p>in Partial Fulfillment of the</p><p>Requirements for the Degree</p><p>Doctor of Philosophy</p><p>Approved, Thesis Committee:</p><p>Peter Druschel, Chair, Professor of Computer Science</p><p>T. S. Eugene Ng, Assistant Professor of Computer Science</p><p>Krishna P. Gummadi, Assistant Professor of Computer Science</p><p>Houston, Texas</p><p>April, 2009</p></li><li><p>Online Social Networks:</p><p>Measurement, Analysis, and</p><p>Applications to Distributed Information Systems</p><p>Alan E. Mislove</p><p>Abstract</p>
<p>Recently, online social networking sites have exploded in popularity. Numerous sites are dedicated to finding and maintaining contacts and to locating and sharing different types of content. Online social networks represent a new kind of information network that differs significantly from existing networks like the Web. For example, in the Web, hyperlinks between content form a graph that is used to organize, navigate, and rank information. The properties of the Web graph have been studied extensively, and have led to useful algorithms such as PageRank. In contrast, few links exist between content in online social networks; instead, the links exist between content and users, and between users themselves. However, little is known in the research community about the properties of online social network graphs at scale, the factors that shape their structure, or the ways they can be leveraged in information systems.</p>
<p>In this thesis, we use novel measurement techniques to study online social networks at scale, and use the resulting insights to design innovative new information systems.
First, we examine the structure and growth patterns of online social networks, focusing on how users are connecting to one another. We conduct the first large-scale measurement study of multiple online social networks at scale, capturing information about over 50 million users and 400 million links. Our analysis identifies a common structure across multiple networks, characterizes the underlying processes that are shaping the network structure, and exposes the rich community structure.</p>
<p>Second, we leverage our understanding of the properties of online social networks to design new information systems. Specifically, we build two distinct applications that leverage different properties of online social networks. We present and evaluate Ostra, a novel system for preventing unwanted communication that leverages the difficulty in establishing and maintaining relationships in social networks. We also present, deploy, and evaluate PeerSpective, a system for enhancing Web search using the natural community structure in social networks. Each of these systems has been evaluated on data from real online social networks or in a deployment with real users.</p></li><li><p>Acknowledgments</p>
<p>First and foremost, I would like to thank my advisors, Peter Druschel and Krishna P. Gummadi, for their help, advice, and mentoring during my graduate career. Without their support and guidance, none of the work presented in this thesis would have been possible. Moreover, I am deeply indebted to them both for showing me how to do successful research, how to mentor students, and how to communicate research results effectively.
I suspect that this debt will only grow over time, as I use these skills in my own research career.</p>
<p>I would also like to thank Eugene Ng for his service on my thesis committee. His insight and advice proved very useful during the preparation of this thesis, and in my search for a tenure-track job. I am also grateful to have worked with Bobby Bhattacharjee; his advice and enthusiasm played no small part in my decision to continue a career in academia.</p>
<p>I am extremely grateful to have worked with and mentored numerous talented students during my research career. Working with Bimal, Malveeka, and Hema was a pleasure, and the excitement and energy they each brought to their research were both refreshing and invigorating. I hope that I am lucky enough to work with students of a similar caliber in the future.</p></li><li>
<p>I am deeply indebted to Brigitta Hansen, Claudia Richter, and Belia Martinez, whose assistance with many administrative matters proved invaluable. They all made living in Germany while finishing a Ph.D. at Rice a much easier experience.</p>
<p>I would also like to thank my colleagues and friends in Saarbrücken: Ansley, Animesh, Atul, Andreas, Jeff, Jim, Rodrigo, Andrey, Derek, Rose, Marcel, Max, Nuno, Pedro, Mia, and Ashu. They all made MPI-SWS a wonderful place to be, and being in Germany is an experience that I will always treasure.</p>
<p>I am also grateful for my close friendship with Rebecca. Her contagious excitement and enthusiasm were always refreshing, and I benefited greatly from her insight and advice. Additionally, I am grateful for my friendship with Stephanie; our travels and adventures often provided a needed break from research.</p>
<p>Finally, I would like to express my deep gratitude to my family, and especially my parents, for their love and support during the ups and downs of graduate school.
I am grateful beyond words for all that they have given me.</p></li><li><p>Contents</p>
<p>Abstract</p>
<p>Acknowledgments</p>
<p>List of Illustrations</p>
<p>List of Tables</p>
<p>1 Introduction</p>
<p>1.1 Background, related work, and methodology</p>
<p>1.2 Network structure and growth</p>
<p>1.3 Communities in online social networks</p>
<p>1.4 Ostra: Leveraging relationships</p>
<p>1.5 Peerspective: Leveraging shared interest</p>
<p>2 Background</p>
<p>2.1 What are online social networks?</p>
<p>2.1.1 Definition and purpose</p>
<p>2.1.2 A brief history</p>
<p>2.1.3 Mechanisms and policies</p>
<p>2.1.4 A new form of information exchange</p></li><li>
<p>2.2 Why study online social networks?</p>
<p>2.2.1 Trust</p>
<p>2.2.2 Shared interest</p>
<p>2.2.3 Content exchange</p>
<p>2.2.4 Other disciplines</p>
<p>2.3 How do we analyze complex networks?</p>
<p>2.3.1 Preliminaries</p>
<p>2.3.2 Radius and diameter</p>
<p>2.3.3 Degree distribution</p>
<p>2.3.4 Joint degree distribution</p>
<p>2.3.5 Scale-free behavior</p>
<p>2.3.6 Assortativity</p>
<p>2.3.7 Clustering coefficient</p>
<p>2.3.8 Betweenness centrality</p>
<p>2.3.9 Modularity</p>
<p>2.3.10 Connected components</p>
<p>2.3.11 Classes of studied networks</p>
<p>2.3.12 Preferential attachment</p>
<p>3 Related Work</p>
<p>3.1 Complex network structure</p>
<p>3.1.1 Social networks</p></li><li>
<p>3.1.2 Other information networks</p>
<p>3.2 Complex network growth</p>
<p>3.2.1 Growth models</p>
<p>3.2.2 Observations of network growth</p>
<p>3.3 Detecting communities</p>
<p>3.3.1 Classical community detection</p>
<p>3.3.2 Global community detection</p>
<p>3.3.3 Local community detection</p>
<p>3.3.4 Observations of communities</p>
<p>3.4 Preventing unwanted communication</p>
<p>3.4.1 Content-based filtering</p>
<p>3.4.2 Originator-based filtering</p>
<p>3.4.3 Imposing a cost on the sender</p>
<p>3.4.4 Content rating</p>
<p>3.4.5 Leveraging relationships</p>
<p>3.5 Personalized web search</p>
<p>4 Measurement Methodology</p>
<p>4.1 Challenges in crawling large graphs</p>
<p>4.1.1 Crawling the entire large WCC</p>
<p>4.1.2 Using only forward links</p>
<p>4.2 Capturing social networks structure</p></li><li>
<p>4.2.1 Flickr</p>
<p>4.2.2 LiveJournal</p>
<p>4.2.3 Orkut</p>
<p>4.2.4 YouTube</p>
<p>4.2.5 Web graph</p>
<p>4.2.6 Summary</p>
<p>4.3 Capturing group membership</p>
<p>4.4 Capturing social networks growth</p>
<p>4.4.1 Flickr</p>
<p>4.4.2 YouTube</p>
<p>4.4.3 Wikipedia</p>
<p>4.4.4 Internet topology</p>
<p>4.5 Capturing communities</p>
<p>4.5.1 Measurement methodology</p>
<p>4.5.2 Collected data</p>
<p>4.5.3 Limitations</p>
<p>4.6 Data availability</p>
<p>5 Network Structure</p>
<p>5.1 High-level data statistics</p>
<p>5.2 Link symmetry</p>
<p>5.3 Power-law node degrees</p></li><li>
<p>5.4 Correlation of indegree and outdegree</p>
<p>5.5 Path lengths and diameter</p>
<p>5.6 Link degree correlations</p>
<p>5.6.1 Joint degree distribution</p>
<p>5.6.2 Scale-free behavior</p>
<p>5.6.3 Assortativity</p>
<p>5.7 Densely connected core</p>
<p>5.8 Tightly clustered fringe</p>
<p>5.9 Groups</p>
<p>5.10 Discussion</p>
<p>5.10.1 Information dissemination and search</p>
<p>5.10.2 Trust</p>
<p>5.11 Summary</p>
<p>6 Network Growth</p>
<p>6.1 High-level data characteristics</p>
<p>6.2 Growth dominates network evolution</p>
<p>6.3 Reciprocation</p>
<p>6.4 Preferential attachment</p>
<p>6.4.1 Undirected networks</p>
<p>6.4.2 Directed networks</p>
<p>6.4.3 Discussion</p></li><li>
<p>6.5 Proximity bias in link creation</p>
<p>6.6 Mechanisms causing proximity bias</p>
<p>6.7 Discussion</p>
<p>6.7.1 Is proximity fundamental?</p>
<p>6.7.2 Proximity mechanisms</p>
<p>6.8 Summary</p>
<p>7 Network Communities</p>
<p>7.1 Data sets used</p>
<p>7.2 Attributes in the network</p>
<p>7.2.1 Friends with common attributes</p>
<p>7.2.2 Attribute-based communities</p>
<p>7.2.3 Summary</p>
<p>7.3 Detecting communities</p>
<p>7.3.1 Global community detection</p>
<p>7.3.2 Local community detection</p>
<p>7.4 Summary</p>
<p>8 Ostra: Leveraging Relationships</p>
<p>8.1 Ostra strawman</p>
<p>8.1.1 Assumptions</p>
<p>8.1.2 System model</p></li><li>
<p>8.1.3 User credit</p>
<p>8.1.4 Credit adjustments</p>
<p>8.1.5 Properties</p>
<p>8.1.6 Multi-party communication</p>
<p>8.2 Ostra design</p>
<p>8.2.1 Trust networks</p>
<p>8.2.2 Link credit</p>
<p>8.2.3 Security properties</p>
<p>8.3 Discussion</p>
<p>8.3.1 Joining Ostra</p>
<p>8.3.2 Content classification</p>
<p>8.3.3 Parameter settings</p>
<p>8.3.4 Compromised user accounts</p>
<p>8.4 Evaluation ....</p></li></ul>