11 class 2012 socialnetworks

Upload: spudddd

Post on 05-Apr-2018

219 views

Category:

Documents


0 download

TRANSCRIPT

  • 7/31/2019 11 CLASS 2012 SocialNetworks

    1/40

    Systems challenges inonline social media

    Alan Mislove

    College of Computer and Information ScienceNortheastern University

    February 22nd, 2012, Networks Class

  • 7/31/2019 11 CLASS 2012 SocialNetworks

    2/40

    22.02.12 Networks Class Alan Mislove

    Social networking sites (Web 2.0)

    Popular way toconnect and share contentPhotos, videos, blogs, profiles, news, status...

    MySpace (275 M), Facebook (500 M)

    Growing exponentially

    Incredible amounts of content being sharedFacebook (7.5 B photos/month)

    YouTube (48 hours of video/min)

    2

    0

    125

    250

    375

    500

    04 06 08 10FacebookU

    sers(millions)

  • 7/31/2019 11 CLASS 2012 SocialNetworks

    3/40

    22.02.12 Networks Class Alan Mislove

    Whats new in Web 2.0?

    3

    Web 2.0Web 1.0

  • 7/31/2019 11 CLASS 2012 SocialNetworks

    4/40

    22.02.12 Networks Class Alan Mislove

    My groups research

    Thesis: OSNs (Web 2.0) fundamentally different from Web 1.0

    Introducing new and unforeseen challengesNeed new approaches to address these challenges

    Leveraging social networks results in better systemsDue to the increasing integration of systems and social networks

    But, must be backed by measurement and analysis

    My groups research is motivated by effects of this changeWill give three examples today

    4

  • 7/31/2019 11 CLASS 2012 SocialNetworks

    5/40

    Alan Mislove12.06.10 University of Massachusetts, Boston

    Effect 1:

    Changing patterns of content creation + exchange

    submitted tousenix12

    5

  • 7/31/2019 11 CLASS 2012 SocialNetworks

    6/40

    22.02.12 Networks Class Alan Mislove 6

    Pre-2005 Web (a.k.a. Web 1.0)

    Telecom ItaliaTelvia

    Fastweb NGI

  • 7/31/2019 11 CLASS 2012 SocialNetworks

    7/40

    22.02.12 Networks Class Alan Mislove

    Difference 1: Content popularity

    7

    Fraction of documents(ranked from most to least popular)

    Fractionof

    requests

    facebook photos [2]

    classic web [1]

    0

    0.2

    0.4

    0.6

    0.8

    1

    0 0.2 0.4 0.6 0.8 1

    [1] Breslau et al., INFOCOM, 1999, [2] Mislove et al., WSDM, 2010

    even distribution

  • 7/31/2019 11 CLASS 2012 SocialNetworks

    8/40

    22.02.12 Networks Class Alan Mislove

    Implication: Caches less effective

    Popularity distribution much more evenObjects have more narrow scope

    In classic Web:Caching top 10% serves between 55% [1] and 95% [2] of requests

    Success of CDNs, web caches, ...

    In online social media:Caching top 10% would only serve 27% [3] of requests

    8

    [1] Breslau et al., INFOCOM, 1999, [2] Arlitt et al. IEEE Network, 2000, [3] Mislove et al., WSDM, 2010

  • 7/31/2019 11 CLASS 2012 SocialNetworks

    9/40

    22.02.12 Networks Class Alan Mislove 9

    Difference 2: Content generation

    Telecom ItaliaTelvia

    Fastweb NGI

  • 7/31/2019 11 CLASS 2012 SocialNetworks

    10/40

    22.02.12 Networks Class Alan Mislove

    Implication: Workload change

    Significant content creation at networks edgeEase of digital content creation (photos, video)

    Ubiquity of Internet access (cell phone, iPad)

    In classic Web:Workload was center-to-edge

    Caching, CDNs take load off origin server

    In online social media:Workload is edge-to-edge

    Significant geographic locality

    10

  • 7/31/2019 11 CLASS 2012 SocialNetworks

    11/40

    22.02.12 Networks Class Alan Mislove

    How is OSN content being delivered?

    Web 1.0 centralized architectures dominateAkamai, Limelight, Clearway, ...

    Facebook serves much of its own content

    Mismatch between infrastructure, workload

    Workload is naturally decentralizedEvery Facebook upload goes via CA

    Can we build a workload-matching distribution system?Avoid unnecessary, expensive transfers

    11

  • 7/31/2019 11 CLASS 2012 SocialNetworks

    12/40

    22.02.12 Networks Class Alan Mislove

    WebCloud: Decentralized delivery

    First step towards decentralized Web content deliveryChallenge: Web doesnt support decentralization

    Browsers distinct from Web servers

    Use novel techniques to allow browser to serve contentNo client-side changes

    Users help serve content they upload

    Result: Scalable, workload-matching architecture

    Next: Brief technical discussion

    12

  • 7/31/2019 11 CLASS 2012 SocialNetworks

    13/40

    22.02.12 Networks Class Alan Mislove

    WebCloud design overview

    Goal: Move towards more decentralized content exchangeKeep content exchange at the edge

    Want to make it work with todays sites, browsers

    Reason: Users wont install anythingCant require users to do anything different

    Idea: Introduce a middlebox to allow browsers to communicate

    To build WebCloud, need to makeClient-side changes

    Deploy middleboxes

    13

  • 7/31/2019 11 CLASS 2012 SocialNetworks

    14/40

    22.02.12 Networks Class Alan Mislove

    Client-side changes

    Want to turn web browser into web serverImplement WebCloud in Javascript

    Add it to the sites pages

    Use LocalStorage to storage browsed contentPersistent cache, up to 5MB/site

    Easily programmatically accessed

    Treated like LRU cache

    Use WebSockets/XHR to communicate with middleboxAllows bi-directional communication

    Onlineclient is always connected to middlebox

    14

    +

  • 7/31/2019 11 CLASS 2012 SocialNetworks

    15/40

    22.02.12 Networks Class Alan Mislove

    Middleboxes

    Add redirector proxies in each ISPLike Akamai proxy, but doesnt store any content

    Maintains open connect to online web visitors

    Run by OSN provider

    Clients connect to proxyInform proxy of locally stored content

    Clients request content from proxyProxy checks for other local clientsFound: fetches content, forwards to requestor

    Not found: fetches content from origin site

    15

    Redirector

    proxy

  • 7/31/2019 11 CLASS 2012 SocialNetworks

    16/40

    22.02.12 Networks Class Alan Mislove

    Putting it all together

    Overall, WebCloud serves as a distributed cache

    Use content-hashes to ensure integrity

    Privacy implicationsk-anonymity for viewers

    16

    Redirector

    proxy

    Client A

    Client B

    Internet

  • 7/31/2019 11 CLASS 2012 SocialNetworks

    17/40

    22.02.12 Networks Class Alan Mislove

    WebCloud applied to real-world site

    Top-50 U.S. web siteSimulation based on Akamai logs

    Would dramatically reduce bandwidth requiredSavings for both site and ISP

    17

    00:00

    Fri

    00:00

    Sat

    00:00

    Sun

    00:00

    Mon

    00:00

    Tue

    00:00

    Wed

    00:00

    Thu

    20

    40

    60

    80

    100

    120

    00:00

    Fri

    Time(one week)

    B

    andwidth

    (MB/s)

    76% reduction in 95th percentile bandwidth

  • 7/31/2019 11 CLASS 2012 SocialNetworks

    18/40

    22.02.12 Networks Class Alan Mislove

    Summary

    Beginnings of shift in patterns of content creation + exchangePatterns changing from center to edge to edge to edge

    Less biased popularity distribution

    But, still using centralized delivery architectures

    WebCloud: Step towards decentralized Web content deliveryUsers help serve content they create

    Implemented using existing browser features; no client changes

    Evaluation demonstrated practicality, efficacy

    18

  • 7/31/2019 11 CLASS 2012 SocialNetworks

    19/40

    Alan Mislove12.06.10 University of Massachusetts, Boston

    Effect 2:

    Changing meaning of accounts/identity

    nsdi11

    19

  • 7/31/2019 11 CLASS 2012 SocialNetworks

    20/40

  • 7/31/2019 11 CLASS 2012 SocialNetworks

    21/40

    22.02.12 Networks Class Alan Mislove

    Sybils

    Free accounts with privileges leading to Sybil attacks [IPTPS 2002]Single person creates many accounts

    Why?

    Natural: Gain extra privilegesIncentives set up to encourage this

    Examples in the wildMaze [ICDCS 2007]

    Digg [NSDI 2009]

    TripAdvisor [NYT, 10/2011]

    Facebook, Gmail [me, others]

    21

  • 7/31/2019 11 CLASS 2012 SocialNetworks

    22/40

    22.02.12 Networks Class Alan Mislove 22

    Example: Online marketplaces

    Among most successful Web siteseBay alone: $62 b in 2010

    But, known to suffer from fraud

    Auctions

    Marketplace

  • 7/31/2019 11 CLASS 2012 SocialNetworks

    23/40

    22.02.12 Networks Class Alan Mislove

    Identities and reputations

    23

    $40

    $25

    $90

    $2

    $5

    $1

    $300

    $90

    $50

    Feedback profile

    $300

    $90 $90

    Significant monetary lossesRecent arrest of user who stole $717 k from 5,000 users

    Used >250 accounts

  • 7/31/2019 11 CLASS 2012 SocialNetworks

    24/40

    22.02.12 Networks Class Alan Mislove

    Bazaar: A new approach

    New approach to strengthening user reputations

    Leverages an (existing) risk network

    Focuses on protecting buyers from malicious sellers

    Works in conjunction with existing marketplaceAssumes same feedback system as today

    No additional monetary cost

    No strong identities

    Insight: Successful transactions represent shared riskBuyer and seller more likely to enter into future transactions

    24

  • 7/31/2019 11 CLASS 2012 SocialNetworks

    25/40

    22.02.12 Networks Class Alan Mislove

    Bazaars risk network

    Successful transaction two identities linked

    Weighted by amount of transaction

    Risk network automatically generatedUsers need not even know about it

    25

    $5

    $25

    $1

    $7

    $45

    $4$10

    $3

    $50$10

  • 7/31/2019 11 CLASS 2012 SocialNetworks

    26/40

    22.02.12 Networks Class Alan Mislove

    Estimating risk

    26

    Bazaar calculates max-flow between buyer and sellerIf max-flow lower than potential transaction, flag as fraudulent

    $5

    $300

    $4000

    $50

    $200

    $100

    Max-flow: $5

    Buyer

    Seller

  • 7/31/2019 11 CLASS 2012 SocialNetworks

    27/40

    22.02.12 Networks Class Alan Mislove

    Summary

    Increasing trend of online services with free accountsOpens new vector for attack

    Focused on reputation manipulation in online marketplacesBazaar: A new approach to strengthening reputations

    Evaluated on 10 m auctions from eBay UKWould have prevented 164 k of negative feedback

    Only in five categories over 90 days

    Currently looking to apply techniques to other domains

    27

  • 7/31/2019 11 CLASS 2012 SocialNetworks

    28/40

    Alan Mislove12.06.10 University of Massachusetts, Boston

    Effect 3:

    Changing requirements of end users

    imc11

    28

  • 7/31/2019 11 CLASS 2012 SocialNetworks

    29/40

    16.10.2009 CCIS/COE Retreat Alan Mislove 29

    Privacy on OSNs

    Privacy is a significant issue on OSNs

    Received recent press, research attention

    What is underlying privacy debate?

    1. Sites control personal information of millions of users

    2. Users are expected to manage their privacy

    5,830 word privacy policyOver 100 different settings

    Default is open-to-the-world (over 800 million users)

  • 7/31/2019 11 CLASS 2012 SocialNetworks

    30/40

    16.10.2009 CCIS/COE Retreat Alan Mislove

    A fundamental shift for users

    Prior to OSNs

    Users were largely content consumers

    Now, with sites like Facebook

    Users expected to be content creators and managersMust enumerate who is able to access every uploaded content

    Avg. 130 friends, 90 pieces of content/month...

    Whats the extent of privacy problem?

    So far, most studies anecdotal

    Can we quantify the extent of the privacy problem on Facebook?

    30

  • 7/31/2019 11 CLASS 2012 SocialNetworks

    31/40

    16.10.2009 CCIS/COE Retreat Alan Mislove

    Facebook privacy model

    Consider Facebook-supported content:

    Photos, Videos, Statuses, Links and Notes

    Five sharing granularities:

    Only Me (Me)Some Friends (SF)

    All Friends (AF)

    Friends of Friends (FoF)

    Everyone (All)

    31

  • 7/31/2019 11 CLASS 2012 SocialNetworks

    32/40

    16.10.2009 CCIS/COE Retreat Alan Mislove

    Measuring desired and actual settings

    32

    Design a Facebook survey application

    Collects actual setting for all content

    Selects up to 10 photos

    Asks user about desired privacy setting

    Recruit using Amazon Mechanical Turk

    Total of 200 Facebook users

    Pay them each $1

    116,553 actual settings

    1,675 desired settings

    Study was conducted under Northeastern IRB protocol #10-10-04

  • 7/31/2019 11 CLASS 2012 SocialNetworks

    33/40

    16.10.2009 CCIS/COE Retreat Alan Mislove

    What are the existing privacy settings?

    36% of all content shared with the default (visible to all users)

    Photos have the most privacy-conscious settings

    33

    0

    0.1

    0.2

    0.3

    0.4

    0.5

    0.6

    Photo Video Status Link Note

    Fraction

    of

    Content

    Only Me

    Some Friends

    All Friends

    Friends of Friends

    Everyone

    Default

  • 7/31/2019 11 CLASS 2012 SocialNetworks

    34/40

  • 7/31/2019 11 CLASS 2012 SocialNetworks

    35/40

    16.10.2009 CCIS/COE Retreat Alan Mislove

    What about photos with modified settings?

    Settings match only for 39% of privacy-modified photosEven when user has explicitly changed setting

    Take-away: Not just poor defaults

    Users have significant trouble managing their privacy35

    Actual

    Setting

    De ired Setting

    Me SF AF FoF All

    Me

    SF

    AF

    FoF

    All

    2 6 4 0 4

    2 12 29 8 11

    40 8 237 40 69 218 (28%)

    39 17 148 45 47

    0 0 0 0 0

    Total 54 (33% 296 (39%)

    Additional 768 photos with non-default privacy settings

  • 7/31/2019 11 CLASS 2012 SocialNetworks

    36/40

    16.10.2009 CCIS/COE Retreat Alan Mislove

    Can we improve sharing mechanisms?

    Can we provide better management tools?

    Ease users role as content manager

    Idea: Leverage the structure of the social network

    Create privacy groups from users friends

    Update the groups as the user forms or breaks friendships

    36

  • 7/31/2019 11 CLASS 2012 SocialNetworks

    37/40

    16.10.2009 CCIS/COE Retreat Alan Mislove

    Automatically detecting friendlists

    Friendlists: Facebook feature similar to Google+ Circles

    Ground truth; Meaningful groupings of users for privacy

    Collected 233 friendlists from our 200 AMT users

    Do friendlists correspond with the social network?Normalized conductance [WSDM10] rates the quality of community

    Strongly positive values indicate significant community structure

    Results on 233 friendlists:Over 48% friendlists correspond to strong communities

    May be able to be inferred from social network

    37

  • 7/31/2019 11 CLASS 2012 SocialNetworks

    38/40

    16.10.2009 CCIS/COE Retreat Alan Mislove

    Summary

    Privacy an important issue on OSNs

    But, to date, no quantification of privacy problem

    Develop methodology to measure actual, desired privacy settings

    Deployed to 200 Facebook users from AMT

    Findings:

    36% of all content shared with the default settings

    Privacy settings match expectations less than 40% of the timeEven when users has already modified setting

    But, potential to aid users by providing better mechanisms

    38

  • 7/31/2019 11 CLASS 2012 SocialNetworks

    39/40

    22.02.12 Networks Class Alan Mislove

    Conclusion

    Social networks and computer systems increasingly integratedNew way of organizing information

    Leading to new opportunities, challenges

    My groups goal: Leverage social networks in systems design

    WebCloud: Addresses challenges with emerging workloads

    Bazaar: Addresses challenges with free accounts

    Privacy: Addresses difference between privacy perception andreality

    39

  • 7/31/2019 11 CLASS 2012 SocialNetworks

    40/40

    Questions?

    Work done in collaboration with

    Ben Adams (MPI-I), Bobby Bhattacharjee (University of Maryland), Meeyoung Cha (KAIST),

    Peter Druschel (MPI-SWS), Krishna P. Gummadi (MPI-SWS),

    Andreas Haeberlen (University of Pennsylvania), Ancsa Hannk (Northeastern University),

    Jonathan Katz (University of Maryland), Hema Swetha Koppula (Yahoo Research India),

    Sune Lehmann (TU Copenhagen), Yabing Liu (Northeastern University),

    Arash Molavi (Northeastern University), Jukka-Pekka Onnela (Harvard University),

    Ansley Post (Google), J. Niels Rosenquist (Harvard Medical School),

    Neil Spring (University of Maryland), Ravi Sundaram (Northeastern University),

    Malveeka Tewari (University of California, San Diego), Bimal Viswanath (MPI-SWS),

    Liang Zhang (Northeastern University), Fangfei Zhou (Northeastern University)