Analyzing Online Social Networks


14 Communications of the ACM | November 2008 | Vol. 51 | No. 11

News

Visualization by Gustavo Glusman

A detail from a painting of a Flickr network, consisting only of people with at least 50 mutual contacts, which reveals four distinct clusters.

Technology | DOI:10.1145/1400214.1400220

Social network analysis explains why some sites succeed and others fail, how physical and online social networks differ and are alike, and attempts to predict how they will evolve.

Bill Howard

The online social network seems like a new kid on the online block. In fact, the online social network goes back years before the dot-com bust. The first major social network site, SixDegrees.com, launched in 1997. The rapid growth came more recently (MySpace in 2003, Facebook in 2004, and Twitter in 2006). It was propelled by the ubiquity of broadband and cellular-messaging connections, plus the golden touch of yet another Harvard dropout (Mark Zuckerberg of Facebook). Their expansion set off a secondary growth market in analyzing social network sites. Social network analysis (or social networking analysis, take your pick) helps us understand why Facebook and Flickr succeeded while Friendster didn’t. It also shows how physical and online social networks can be alike and different, and it attempts to predict how they’ll evolve and, for beneficiaries of the research, how someone might get rich off the next wave. There’s also a good deal of research about how honest people are in describing themselves online.

The sites differ in who can join, who can see your profile and how much of it is visible, and how open they are to Web crawlers and other applications. The sites also differ in how well they work on a cell phone and whether they can be reached across the multitude of telecom companies. For instance, Twitter, the what-are-you-doing-now site, wouldn’t be a big hit without the mobile Web.

Online social networks also differ in size. Facebook’s magnitude, with 132 million unique visitors in June 2008, seems to fly in the face of the conventional wisdom that too much size makes a social networking site both impersonal and undesirable. (As Yogi Berra quipped, “Nobody goes there anymore; it’s too crowded.”) More than a few sites have evolved in unpredictable ways. Sometimes their infrastructure couldn’t handle geometric growth, and sometimes their rules annoyed existing members. Some died and others took on second lives. In 2002, Friendster was a dating service competing against Match.com in the U.S., but it crashed and burned. Friendster has since re-emerged as a social network site, but its strongest markets are in Indonesia, Malaysia, the Philippines, and Singapore. Orkut started in the U.S. as a social network site but flared out; today, 80% of its users reside in Brazil or India.

Social Networking Goes Online

Social network analysis, of course, predates online social networks. Some trace the roots of social network analysis to the early 20th century. At that time sociologist Georg Simmel distinguished between social groups (a group with a specific focus such as a family, neighborhood, or job) and a social network (a looser, larger collection of people and groups with connections among groups). Later, psychologist Abraham Maslow’s hierarchy of needs (physiological, safety, love/belonging, esteem, and self-actualization) was used to understand social networks. Research accelerated in the two decades after World War II, as computers allowed the study of social networks with thousands of nodes. It took the Internet to provide networks with millions of nodes. As networks grew, it became harder to display a network as a plot of dots joined by relationship lines, and the visual description gave way to data points and formulas.

Psychologist Stanley Milgram’s small-world, or six-degrees-of-separation, experiments in the 1960s helped explain some aspects of social networks. One finding was that a chain between two people passed through 5.5 nodes on average to reach the targeted individual. (Don’t look for the phrase “six degrees of separation” in Milgram’s papers; it was coined by playwright John Guare in his 1992 book of the same name.) While six degrees of separation may be true offline, fewer than three degrees is more likely online.
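“Degrees of separation” is a shortest-path count: the number of hops between two people in a friendship graph. The minimal sketch below assumes the network is already available as a Python dict that maps each person to a list of friends (the names are hypothetical). It computes hop counts with breadth-first search and averages them over all connected pairs. It shows only how the measurement works on a known graph. Milgram’s experiment traced letters forwarded by hand, not a stored network.

```python
from collections import deque


def bfs_hops(graph, source):
    """Breadth-first search: hop count from source to every reachable person."""
    hops = {source: 0}
    queue = deque([source])
    while queue:
        person = queue.popleft()
        for friend in graph[person]:
            if friend not in hops:  # in BFS the first visit is along a shortest path
                hops[friend] = hops[person] + 1
                queue.append(friend)
    return hops


def average_separation(graph):
    """Mean hop count over all ordered pairs of distinct, connected people.

    Each unordered pair is counted twice (once from each end), which leaves
    the mean unchanged. Pairs with no path between them are skipped.
    """
    total, pairs = 0, 0
    for source in graph:
        for target, h in bfs_hops(graph, source).items():
            if target != source:
                total += h
                pairs += 1
    return total / pairs if pairs else 0.0


# Hypothetical four-person chain: alice - bob - carol - dave
friends = {
    "alice": ["bob"],
    "bob": ["alice", "carol"],
    "carol": ["bob", "dave"],
    "dave": ["carol"],
}
print(bfs_hops(friends, "alice"))  # {'alice': 0, 'bob': 1, 'carol': 2, 'dave': 3}
print(average_separation(friends))  # 10 hops over 6 pairs
```

On a real site the same loop would be too slow to run from every member. Studies of large networks typically estimate the average by running BFS from a random sample of starting nodes.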

The Erdős-Rényi models for generating random graphs, which place connections between pairs of nodes with equal probability, help explain some social networks, but later research indicates that random graph models may not scale to larger online networks.
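Here is a minimal sketch of the G(n, p) variant of the Erdős-Rényi model in plain Python. The parameter values in the usage lines are hypothetical. Every pair of nodes has the same chance of a link, so every node has the same expected degree, p(n - 1), and degrees cluster tightly around that value (the distribution is binomial). The model therefore produces no heavily connected hubs, which is one plausible reason it fits large online networks poorly.

```python
import random


def erdos_renyi(n, p, seed=None):
    """G(n, p) random graph on nodes 0..n-1.

    Each of the n*(n-1)/2 possible undirected pairs is linked
    independently with the same probability p.
    """
    rng = random.Random(seed)
    return {
        (u, v)
        for u in range(n)
        for v in range(u + 1, n)
        if rng.random() < p
    }


def degrees(n, edges):
    """Number of links touching each node."""
    deg = [0] * n
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return deg


g = erdos_renyi(1000, 0.01, seed=42)
deg = degrees(1000, g)
# Expected edge count: p * n * (n - 1) / 2 = 4995; expected degree: p * (n - 1) = 9.99
print(len(g), max(deg))
```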

Recent work finds intriguing similarities among social network sites, and between them and traditional social networks. In the Barabási-Albert model, networks grow with a scale-free, power-law degree distribution and exhibit preferential attachment. Albert-László Barabási, a physics professor at the University of Notre Dame, has applied the preferential attachment model to online social networks. He found that future gains more often accrue to nodes that already have more connections. In other words, a rising tide lifts all yachts: oft-cited academic papers are cited even more often, and a newcomer to an online community connects more often to a well-known member.
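The following is a sketch of Barabási-Albert growth in plain Python. Two details are assumptions for illustration and are not taken from the research described above: the seeding choice (a small complete graph) and the parameter values in the usage lines. The list of edge endpoints is a common implementation trick. A node appears in it once per link, so drawing uniformly from the list picks nodes in proportion to their degree, which is the preferential-attachment rule.

```python
import random
from collections import Counter


def barabasi_albert(n, m, seed=None):
    """Preferential-attachment growth on nodes 0..n-1.

    Assumption: growth starts from a complete graph on m + 1 nodes (one
    common seeding choice). Each later node links to m distinct existing
    nodes, picked with probability proportional to their current degree.
    """
    if not 1 <= m < n:
        raise ValueError("need 1 <= m < n")
    rng = random.Random(seed)
    edges = [(u, v) for u in range(m + 1) for v in range(u + 1, m + 1)]
    # Every node appears here once per link it has, so a uniform pick from
    # this list is a degree-proportional pick.
    endpoints = [node for edge in edges for node in edge]
    for newcomer in range(m + 1, n):
        targets = set()
        while len(targets) < m:  # m distinct targets, no duplicate links
            targets.add(rng.choice(endpoints))
        for t in targets:
            edges.append((t, newcomer))
            endpoints.extend((t, newcomer))
    return edges


links = barabasi_albert(2000, 2, seed=7)
deg = Counter(node for edge in links for node in edge)
print(deg.most_common(5))  # the five best-connected nodes and their degrees
```

In runs like this the best-connected nodes are typically early arrivals that kept attracting links. That is the network counterpart of the oft-cited paper getting cited even more often.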

Ravi Kumar, Jasmine Novak, and Andrew Tomkins at Yahoo! Research studied growth patterns at the Flickr photo-sharing site and the Yahoo! 360 social networking site. In both, network density (the interconnections per person) followed a similar pattern. It first grew rapidly, driven by early adopters. It then declined as friendships formed more slowly than membership grew. Finally it settled into slow, steady growth in both members and connections. The trio divided each network into three regions: “singletons” who don’t take part; a large core of connected users; and a middle region of isolated communities that keep to themselves and display a star structure. The stars make up a third of Flickr users and 10% of Yahoo! 360 users. These communities may have a single charismatic activist linked to other users who have few connections outside the star.
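The two measurements in this study can be sketched in a few lines of plain Python. Density here follows the article’s definition, links per member, rather than the graph-theory ratio of actual to possible links. The three-way split only approximates the trio’s segmentation. It assumes that “core” means the largest connected component and that the middle region is every other connected component. The toy network is hypothetical.

```python
from collections import defaultdict


def density(members, links):
    """Interconnections per person: number of links divided by number of members."""
    return len(links) / len(members) if members else 0.0


def segment(members, links):
    """Split members into singletons (no links), the core (largest connected
    component), and the middle region (all other components)."""
    adj = defaultdict(set)
    for u, v in links:
        adj[u].add(v)
        adj[v].add(u)
    singletons = {p for p in members if not adj[p]}
    seen, components = set(), []
    for start in members:
        if start in seen or start in singletons:
            continue
        # Iterative depth-first search gathers one connected component.
        component, stack = set(), [start]
        while stack:
            node = stack.pop()
            if node not in component:
                component.add(node)
                stack.extend(adj[node] - component)
        seen |= component
        components.append(component)
    components.sort(key=len, reverse=True)
    core = components[0] if components else set()
    middle = set().union(*components[1:])
    return singletons, core, middle


# Hypothetical network: a four-person core, a three-person star, two singletons.
members = list("abcdefghi")
links = [("a", "b"), ("b", "c"), ("c", "d"), ("a", "c"), ("e", "f"), ("e", "g")]
print(density(members, links))  # 6 links / 9 members
print(segment(members, links))
```

To follow density over time, as the study did, you would apply `density` to successive snapshots of the member and link lists.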

Jure Leskovec of Carnegie Mellon, Lars Backstrom of Cornell, and Ravi Kumar and Andrew Tomkins of Yahoo! Research studied large datasets from Flickr, Delicious (social bookmarking), Answers (reference), and LinkedIn (business contacts). From them they built a model of network evolution based on preferential attachment. In all four networks, the number of connections among members drops off exponentially as the degrees of separation increase, particularly beyond two hops. Two people with a common friend (two hops away) close a triangle and become friends themselves. Membership growth differed notably: Flickr grows exponentially, LinkedIn grows quadratically, Delicious grows superlinearly, and Answers grows sublinearly.

Anthropologist Robin Dunbar has argued that a person can sustain about 150 social relationships. He also argued that this was often the comfortable size of settlements, farming villages, and the Roman legion’s tactical unit, the maniple. Online social networks with millions of users also try to keep human scale in mind.

At Facebook, the immense number of nodes is masked by privacy settings, by filters such as People You May Know, and by the News Feed. The News Feed shows on your page what your friends are doing and posting, so you don’t have to search dozens or hundreds of individual pages. It initially set off howls of protest over privacy, but it turned out to be a key element in making Facebook more manageable and fueling its explosive growth. Size and density make cities vibrant and attractive up to a point, and the same seems to hold online. Facebook research scientist Jeff Hammerbacher says, “We’ve noticed that people are more likely to become active users if they enter a dense, active network.”

The Facebook network now comprises more than 10,000 servers on a Web tier, about 2,000 servers on a MySQL tier, and about 1,000 servers on a MemCache tier. Every second, the site gets 10 million requests, about 500,000 of which are MySQL queries. Data volume was in the tens of gigabytes per day in early 2006, hit 1TB per day by mid-2007, and continues to grow.

“I (Almost) Look Like Brad Pitt”

What man doesn’t suck in his gut when a good-looking woman walks by? Online, a user posts his or her best picture, usually in a setting that evokes how the user wants to be perceived, with the Newport Yacht Club or a funky bar in the background, for example. Some users resort to deception. Catalina Toma and Jeffrey Hancock of Cornell University and Nicole Ellison of Michigan State studied online profiles on Match.com, Yahoo! Personals, Webdate, and American Singles. They found that 81% of a survey group provided information that deviated from reality. “Deviations tended to be ubiquitous but small in magnitude. Men lied more about their height, and women lied more about their weight, with participants farther from the mean lying more,” they noted. “Overall, participants reported being the least accurate about their photographs and the most accurate about their relationship information.” Being able to update a profile if a misstatement becomes too pronounced may encourage deception, although “a record of the presentation is preserved.” Because social networking sites are asynchronous, “[Users] can plan, create, and edit their self-presentation, including deceptive elements, much more deliberately than they would in face-to-face first encounters,” they noted. “The reduction of communication cues, especially nonverbal and visual cues (with the exception of photographs), spares online daters some of the common predicaments faced by traditional daters trying to make a good first impression.”

According to Hancock, similar misstatements appear in email, too, and they may share similar phrasing. “We’re looking to see if there are any verbal features that might identify these lies,” he says. That raises the question: Could a future social networking applet be a profile lie detector?

Toma, Hancock, and Ellison found that the online photograph is the profile element most likely to be inaccurate. The more accurate the photo, the more honest the person is in the rest of his or her profile. And the more friends who know about the online dater’s profile, the more accurate the photo. But beware of escalation once the first lie is told. Hancock says, “There will be elevated lying if people suspect others are, too. Lying will still be constrained even in a ‘high-lie environment’: most people do not feel comfortable stating big lies.”

Social networks can even make you a fitter, healthier person. Sometimes. Nicole Ellison of Michigan State, Rebecca Heino of Georgetown University, and Jennifer Gibbs of Rutgers University surveyed users of social network and dating sites. They found that some respondents underreported their weight, then realized they’d better start losing weight to match their ideal self. One woman lost 44 pounds and said, “I can thank online dating for that.” Take that, Jenny Craig.

Bill Howard writes about science and technology from Westfield, NJ.

Computer Science

Increasing Network Efficiency

Computer scientists at the University of California, San Diego (UCSD) have developed an algorithm that promises to significantly increase the efficiency of network routing. Known as XL, for approximate link state, the algorithm suppresses updates that would otherwise force connected networks to keep recalculating their paths through the Internet.

“Being able to adapt to hardware failures is one of the fundamental characteristics of the Internet,” says Kirill Levchenko, a student member of the UCSD team. “Our routing algorithm reduces the overhead of route recomputation after a network change, making it possible to support larger networks. The benefits are especially significant when networks are made up of low-power devices or slow links.”

Computer Security

Virus Cinch

Researchers at Tel Aviv University have developed Korset, an open source program designed to halt malware on Linux, the operating system used by the majority of the world’s Web and email servers. Korset does not wait for viruses and other malware to begin operating. It models the normal behavior of legitimate programs and instantly shuts down any program that veers away from expected activity. Korset was created by Avishai Wool, a professor of computer engineering at Tel Aviv University, and his graduate student, Ohad Ben-Cohen. Its code has been released at www.korset.org to promote further development of the program. “It is our hope that this becomes mainstream and that this approach is adopted in standard distributions of operating systems,” Wool said in an interview with MSNBC.

Theoretical Astrophysics

The Universe’s First Star

For the first time, Japanese and U.S. cosmologists have reliably reproduced the formation of the universe’s first star in supercomputer experiments, Science reports. The protostar they produced was the catalyst for a primordial sun that rapidly expanded to 100 times the mass of our sun.

The supercomputer simulations of the first primordial stars’ formation, led by astrophysicist Naoki Yoshida of Nagoya University and a team of colleagues, are partly based on data from NASA’s Wilkinson Microwave Anisotropy Probe. The NASA probe is analyzing the universe’s oldest light, which has been traveling across the universe for 13.7 billion years.

Yoshida’s team spent nearly eight years on the experiment, and each simulation took a month of computer time. The theoretical universe exists only as a set of equations running on a supercomputer. Even so, it has provided critical information about the origins of early stars and may help scientists better understand how they formed.