Architectures and Algorithms for Internet-Scale (P2P) Data Management by Joe Hellerstein

Upload: dhruv-jain

Post on 08-Apr-2018

TRANSCRIPT

  • 8/7/2019 Architectures and Algorithms for Internet-Scale (P2P) Data Management by Joe Hellerstein

    1/173

    Architectures and Algorithms for Internet-Scale (P2P) Data Management

    Joe Hellerstein
    Intel Research & UC Berkeley

    2/173

    PowerPoint Compatibility Note

    This file was generated using MS PowerPoint 2004 for Mac. It may not display correctly in other versions of PowerPoint. In particular, animations are often a problem.

    3/173

    Overview

    Preliminaries
      What, Why
      The Platform
      Upleveling: Network Data Independence
    Early P2P architectures
      Client-Server
      Flooding
      Hierarchies
      A Little Gossip
      Commercial Offerings
      Lessons and Limitations
    Ongoing Research
      Structured Overlays: DHTs
      Query Processing on Overlays
      Storage Models & Systems
      Security and Trust
    Joining the fun
      Tools and Platforms
    Closing thoughts

    4/173

    Acknowledgments

    For specific content in these slides
      Frans Kaashoek, Petros Maniatis, Sylvia Ratnasamy, Timothy Roscoe, Scott Shenker

    Additional Collaborators
      Brent Chun, Tyson Condie, Ryan Huebsch, David Karger, Ankur Jain, Jinyang Li, Boon Thau Loo, Robert Morris, Sriram Ramabhadran, Sean Rhea, Ion Stoica, David Wetherall

    5/173

    Preliminaries

    6/173

    7/173

    Scoping the Tutorial

    Architectures and Algorithms for Data Management
    The perils of overviews
      Can't cover everything. So much here!
    Some interesting things we'll skip
      Semantic Mediation: data integration on steroids
        E.g., Hyperion (Toronto), Piazza (UWash), etc.
      High-Throughput Computing
        I.e. The Grid
      Complex data analysis/reduction/mining
        E.g. p2p distributed inference, wavelets, regression, matrix computations, etc.

    8/173

    Moving Past the P2P Moniker: The Platform

    The P2P name has lots of connotations
      Simple filestealing systems
      Very end-user-centric
    Our focus here is on:
      Many participating machines, symmetric in function
      Very Large Scale (MegaNodes, not PetaBytes)
      Minimal (or non-existent) management
    Note: user model is flexible
      Could be embedded (e.g. in OS, HW, firewall, etc.)
      Large-scale hosted services a la Akamai or Google
        A key to achieving autonomic computing?

    9/173

    Overlay Networks

    P2P applications need to:
      Track identities & (IP) addresses of peers
        May be many!
        May have significant churn
        Best not to have n^2 ID references
      Route messages among peers
        If you don't keep track of all peers, this is multi-hop
    This is an overlay network
      Peers are doing both naming and routing
      IP becomes just the low-level transport
        All the IP routing is opaque
    Control over naming and routing is powerful
      And as we'll see, brings networks into the database era

    10/173

    Many New Challenges

    Relative to other parallel/distributed systems
      Partial failure
      Churn
      Few guarantees on transport, storage, etc.
      Huge optimization space
      Network bottlenecks & other resource constraints
      No administrative organizations
      Trust issues: security, privacy, incentives
    Relative to IP networking
      Much higher function, more flexible
      Much less controllable/predictable

    11/173

    12/173

    Why Bother II: Positive Lessons from Filestealing

    P2P enables organic scaling
      Vs. the top few killer services -- no VCs required!
      Can afford to place more bets, try wacky ideas
    Centralized services engender scrutiny
      Tracking users is trivial
      Provider is liable (for misuse, for downtime, for local laws, etc.)
    Centralized means business
      Need to pay off startup & maintenance expenses
      Need to protect against liability
      Business requirements drive to particular short-term goals
        Tragedy of the commons

    13/173

    Why Bother III? Intellectual motivation

    Heady mix of theory and systems
      Great community of researchers have gathered
      Algorithms, Networking, Distributed Systems, Databases
      Healthy set of publication venues
        IPTPS workshop as a catalyst
    Surprising degree of collaboration across areas
      In part supported by NSF Large ITR (project IRIS)
        UC Berkeley, ICSI, MIT, NYU, and Rice

    14/173

    Infecting the Network, Peer-to-Peer

    The Internet is hard to change. But Overlay Nets are easy!
      P2P is a wonderful host for infecting network designs
      The next Internet is likely to be very different
        Naming is a key design issue today
        Querying and data independence key tomorrow?
    Don't forget:
      The Internet was originally an overlay on the telephone network
      There is no money to be made in the bit-shipping business
    A modest goal for DB research:
      Don't query the Internet.

    15/173

    Infecting the Network, Peer-to-Peer

    A modest goal for DB research:
      Don't query the Internet.

    Be the Internet.

    16/173

    Some Guiding Applications

    N
      Intel Research & UC Berkeley
    LOCKSS
      Stanford, HP Labs, Sun, Harvard, Intel Research
    Liberation Ware

    17/173

    N: Public Health for the Internet

    Security tools focused on medicine
      Vaccines for Viruses
      Improving the world one patient at a time
    Weakness/opportunity in the Public Health arena
      Public Health: population-focused, community-oriented
      Epidemiology: incidence, distribution, and control in a population
    N: A New Approach
      Perform population-wide measurement
      Enable massive sharing of data and query results
        The Internet Screensaver
      Engage end users: education and prevention
      Understand risky behaviors, at-risk populations.
    Prototype running over PIER

    18/173

    19/173

    20/173

    N Vision: Network Oracle

    Suppose there existed a Network Oracle
      Answering questions about current Internet state
        Routing tables, link loads, latencies, firewall events, etc.
    How would this change things?
      Social change (Public Health, safe computing)
      Medium term change in distributed application design
        Currently distributed apps do some of this on their own
      Long term change in network protocols
        App-specific custom routing
        Fault diagnosis
        Etc.

    21/173

    LOCKSS: Lots Of Copies Keep Stuff Safe

    Digital Preservation of Academic Materials
    Librarians are scared with good reason
      Access depends on the fate of the publisher
      Time is unkind to bits after decades
      Plenty of enemies (ideologies, governments, corporations)
    Goal: Archival storage and access

    22/173

    LOCKSS Approach

    Challenges:
      Very low-cost hardware, operation and administration
      No central control
      Respect for access controls
      A long-term horizon
    Must anticipate and degrade gracefully with
      Undetected bit rot
      Sustained attacks
        Esp. stealth modification
    Solution: P2P auditing and repair system for replicated docs

    23/173

    Liberation Ware

    Take your favorite Internet application
      Web hosting, search, IM, file sharing, VoIP, email, etc.
    Consider using centralized versions in a country with a repressive government
      Trackability and liability will prevent this being used for free speech
    Now consider p2p
      Enhanced with appropriate security/privacy protections
      Could be the medium of the next Tom Paines
    Examples: FreeNet, Publius, FreeHaven
      p2p storage to avoid censorship & guarantee privacy
      PKI-encrypted storage
      Mix-net privacy-preserving routing

    24/173

    Upleveling: Network Data Independence

    SIGMOD Record, Sep. 2003

    25/173

    Recall Codd's Data Independence

    Decouple app-level API from data organization
      Can make changes to data layout without modifying applications
      Simple version: location-independent names
      Fancier: declarative queries

    As clear a paradigm shift as we can hope to find in computer science - C. Papadimitriou

    26/173

    The Pillars of Data Independence

    Indexes
      Value-based lookups have to compete with direct access
      Must adapt to shifting data distributions
      Must guarantee performance
      DBMS: B-Tree
    Query Optimization
      Support declarative queries beyond lookup/search
      Must adapt to shifting data distributions
      Must adapt to changes in environment
      DBMS: Join Ordering, AM Selection, etc.

    27/173

    Generalizing Data Independence

    A classic level of indirection scheme
      Indexes are exactly that
      Complex queries are a richer indirection
    The key for data independence: It's all about rates of change
    Hellerstein's Data Independence Inequality:
      Data independence matters when
        d(environment)/dt >> d(app)/dt

    28/173

    Data Independence in Networks

    d(environment)/dt >> d(app)/dt

    In databases, the RHS is unusually small
      This drove the relational database revolution
    In extreme networked systems, LHS is unusually high
      And the applications increasingly complex and data-driven
      Simple indirections (e.g. local lookaside tables) insufficient

    29/173

    The Pillars of Data Independence

    Indexes
      Value-based lookups have to compete with direct access
      Must adapt to shifting data distributions
      Must guarantee performance
      DBMS: B-Tree
      P2P: Content-Addressable Overlay Networks (DHTs)
    Query Optimization
      Support declarative queries beyond lookup/search
      Must adapt to shifting data distributions
      Must adapt to changes in environment
      DBMS: Join Ordering, AM Selection, etc.
      P2P: Multiquery dataflow sharing?

    30/173

    Early P2P

    31/173

    32/173

    Early P2P I: Client-Server

    Napster
      C-S search

    xyz.mp3

    33/173

    Early P2P I: Client-Server

    Napster
      C-S search

    xyz.mp3 ?

    xyz.mp3

    34/173

    Early P2P I: Client-Server

    Napster
      C-S search
      pt2pt file xfer

    xyz.mp3 ?

    xyz.mp3

    36/173

    Early P2P I: Client-Server

    SETI@Home
      Server assigns work units

    My machine info

    37/173

    Early P2P I: Client-Server

    SETI@Home
      Server assigns work units

    Task: f(x)

    38/173

    Early P2P I: Client-Server

    SETI@Home
      Server assigns work units

    Result: f(x)

    60 TeraFLOPS!

    39/173

    Early P2P II: Flooding on Overlays

    xyz.mp3 ?

    xyz.mp3

    An overlay network. Unstructured.

    40/173

    Early P2P II: Flooding on Overlays

    xyz.mp3 ?

    xyz.mp3

    Flooding

    42/173

    Early P2P II: Flooding on Overlays

    xyz.mp3
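The flooding search pictured on the preceding slides can be sketched in a few lines. This is only an illustration under stated assumptions: the overlay is an in-memory dictionary, the peer names and the `ttl` hop limit are hypothetical, and duplicate suppression is modeled by a shared `seen` set rather than per-peer message IDs.

```python
from collections import deque

def flood_search(overlay, start, wanted, ttl):
    """Flood a query from `start`; return (peers holding `wanted`, messages sent)."""
    seen = {start}
    hits = set()
    messages = 0
    frontier = deque([(start, ttl)])
    while frontier:
        peer, hops_left = frontier.popleft()
        if wanted in overlay[peer]["files"]:
            hits.add(peer)
        if hops_left == 0:
            continue                          # hop budget exhausted: stop forwarding
        for neighbor in overlay[peer]["neighbors"]:
            messages += 1                     # every forwarded copy costs a message
            if neighbor not in seen:          # duplicate copies are dropped on arrival
                seen.add(neighbor)
                frontier.append((neighbor, hops_left - 1))
    return hits, messages

# Hypothetical five-peer unstructured overlay; only peer "e" holds the file.
overlay = {
    "a": {"neighbors": ["b", "c"], "files": set()},
    "b": {"neighbors": ["a", "c"], "files": set()},
    "c": {"neighbors": ["a", "b", "d"], "files": set()},
    "d": {"neighbors": ["c", "e"], "files": set()},
    "e": {"neighbors": ["d"], "files": {"xyz.mp3"}},
}
```

With a hop limit of 2 the query from "a" never reaches "e"; with a limit of 3 it does. That is the trade-off between message cost and recall that makes flooding weak at finding rare items.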

    43/173

    Early P2P II.v: Ultrapeers

    Ultrapeers can be installed (KaZaA) or self-promoted (Gnutella)

    44/173

    Hierarchical Networks (& Queries)

    IP
      Hierarchical name space (www.vldb.org, 141.12.12.51)
      Hierarchical routing
        Autonomous Systems correlate with name space (though not perfectly)
      Astrolabe [Birman, et al. TOCS 03]
        OLAP-style aggregate queries down the IP hierarchy
    DNS
      Hierarchical name space (clients + hierarchy of servers)
      Hierarchical routing w/aggressive caching
        13 managed root servers
      IrisNet [Deshpande, et al. SIGMOD 03]
        Xpath queries over (selected) DNS (sub)-trees.
    Traditional pros/cons of Hierarchical data mgmt
      Works well for things aligned with the hierarchy
        Esp. physical locality a la Astrolabe
      Inflexible
        No data independence!

    45/173

    Commercial Offerings

    JXTA
      Java/XML Framework for p2p applications
      Name resolution and routing is done with floods & superpeers
        Can always add your own if you like
    MS WinXP p2p networking
      An unstructured overlay, flooded publication and caching
      does not yet support distributed searches
    Both have some security support
      Authentication via signatures (assumes a trusted authority)
      Encryption of traffic
    Groove
      Platform for p2p experience. IM and asynch collab tools.
      Client-serverish name resolution, backup services, etc.

    46/173

    Lessons and Limitations

    Client-Server performs well
      But not always feasible
        Ideal performance is often not the key issue!
    Things that flood-based systems do well
      Organic scaling
      Decentralization of visibility and liability
      Finding popular stuff
      Fancy local queries
    Things that flood-based systems do poorly
      Finding unpopular stuff [Loo, et al VLDB 04]
      Fancy distributed queries
      Vulnerabilities: data poisoning, tracking, etc.
      Guarantees about anything (answer quality, privacy, etc.)

    47/173

    A Little Gossip

    48/173

    Gossip Protocols (Epidemic Algorithms)

    Originally targeted at database replication [Demers, et al. PODC 87]
      Especially nice for unstructured networks
      Rumor-mongering: propagate newly-received update to k random neighbors
    Extended to routing
      Point-to-point routing [Vahdat/Becker TR, 00]
      Rumor-mongering of queries instead of flooding [Haas, et al Infocom 02]
    Extended to aggregate computation [Kempe, et al, FOCS 03]
    Mostly theoretical analyses
      Usually of two forms:
        What is the tipping point where an epidemic infects the whole population? (Percolation theory)
        What is the expected # of messages for infection?
    A Cornell specialty
      Demers, Kleinberg, Gehrke, Halpern,
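Rumor-mongering as described above (forward a newly received update to k random neighbors) can be simulated with a short sketch. The assumptions here are not from the slides: every node can reach every other node, each node forwards only once (when it first hears the update), and the function name and `seed` parameter are illustrative.

```python
import random

def rumor_monger(num_nodes, k, seed=0):
    """Push-style rumor mongering on a fully connected population.

    Each round, every node that first heard the update in the previous
    round forwards it to k peers chosen uniformly at random.
    Returns (number of informed nodes, messages sent).
    """
    rng = random.Random(seed)
    informed = {0}                  # node 0 originates the update
    newly_informed = [0]
    messages = 0
    while newly_informed:
        next_round = []
        for node in newly_informed:
            others = [p for p in range(num_nodes) if p != node]
            for peer in rng.sample(others, min(k, len(others))):
                messages += 1
                if peer not in informed:
                    informed.add(peer)
                    next_round.append(peer)
        newly_informed = next_round
    return len(informed), messages
```

Running this for several values of k shows the behavior the analyses study: small k tends to leave part of the population uninformed, while larger k reaches nearly everyone at a cost of k messages per informed node.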

    49/173

    Structured Overlays: Distributed Hash Tables (DHTs)

    50/173

    DHT Outline

    High-level overview
    Fundamentals of structured network topologies
      And examples
    One concrete DHT
      Chord
    Some systems issues
      Storage models & soft state
      Locality
      Churn management

    51/173

    High-Level Idea: Indirection

    Indirection in space
      Logical (content-based) IDs, routing to those IDs
        Content-addressable network
      Tolerant of churn
        nodes joining and leaving the network

    [Figure: a message addressed "to h" is delivered to node y (h=y); node z is also shown]

    52/173

    High-Level Idea: Indirection

    Indirection in space
      Logical (content-based) IDs, routing to those IDs
        Content-addressable network
      Tolerant of churn
        nodes joining and leaving the network
    Indirection in time
      Want some scheme to temporally decouple send and receive
      Persistence required. Typical Internet solution: soft state
        Combo of persistence via storage and via retry
          Publisher requests TTL on storage
          Republishes as needed
    Metaphor: Distributed Hash Table

    [Figure: a message addressed "to h" is now delivered to node z (h=z)]
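The soft-state idea (the publisher requests a TTL and republishes as needed) can be sketched as a store whose entries silently expire. This is a minimal illustration, not any particular system's API: the class and method names are hypothetical, and time is passed in explicitly as `now` so the behavior is easy to follow.

```python
class SoftStateStore:
    """Key-value store whose entries expire unless they are republished."""

    def __init__(self):
        self._entries = {}          # key -> (value, expiry time)

    def publish(self, key, value, ttl, now):
        # Publishing again simply pushes the expiry time forward.
        self._entries[key] = (value, now + ttl)

    def lookup(self, key, now):
        entry = self._entries.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if now >= expires_at:       # lease ran out: forget the item
            del self._entries[key]
            return None
        return value
```

A publisher that keeps republishing before each TTL runs out keeps its item alive; a publisher that has left the network simply stops, and the item disappears without any explicit delete.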

    53/173

    What is a DHT?

    Hash Table
      data structure that maps keys to values
      essential building block in software systems
    Distributed Hash Table (DHT)
      similar, but spread across the Internet
    Interface
      insert(key, value)
      lookup(key)
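The insert/lookup interface above can be illustrated with a toy, single-process stand-in. Everything beyond the two method names is an assumption made for the sketch: the `ToyDHT` class, the fixed node list, and the modulo placement of keys (real DHTs use placement schemes that tolerate nodes joining and leaving).

```python
import hashlib

class ToyDHT:
    """insert/lookup over a fixed set of nodes, all inside one process."""

    def __init__(self, node_names):
        self.nodes = sorted(node_names)
        self.tables = {name: {} for name in self.nodes}   # one hash table per node

    def _home(self, key):
        # Hash the key and use the digest to choose the node that stores it.
        digest = hashlib.sha1(key.encode("utf-8")).digest()
        return self.nodes[int.from_bytes(digest, "big") % len(self.nodes)]

    def insert(self, key, value):
        self.tables[self._home(key)][key] = value

    def lookup(self, key):
        return self.tables[self._home(key)].get(key)
```

The caller never names a node: the key alone determines where the value lives, which is the location independence the earlier slides call indirection in space.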

    54/173

    How?

    Every DHT node supports a single operation:
      Given key as input; route messages toward node holding key

    55/173

    DHT in action

    [Figure: DHT nodes, each holding key-value (K V) pairs]

    57/173

    DHT in action

    [Figure: DHT nodes, each holding key-value (K V) pairs]

    Operation: take key as input; route messages to node holding key

    58/173

    DHT in action: put()

    [Figure: DHT nodes, each holding key-value (K V) pairs; a client issues insert(K1,V1)]

    Operation: take key as input; route messages to node holding key

    60/173

    DHT in action: put()

    [Figure: DHT nodes, each holding key-value (K V) pairs; the pair (K1,V1) is shown]

    Operation: take key as input; route messages to node holding key

    61/173

    62/173

    Iterative vs. Recursive Routing

    [Figure: DHT nodes, each holding key-value (K V) pairs; a client issues retrieve(K1)]

    Operation: take key as input; route messages to node holding key

    Previously showed recursive. Another option: iterative
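The difference between the two routing styles is who drives the next hop. The sketch below assumes a ring in which each node knows only its successor and owns the keys between its predecessor (exclusive) and itself (inclusive); those conventions, and all names, are illustrative choices rather than details from the slides.

```python
class Node:
    def __init__(self, node_id):
        self.node_id = node_id
        self.successor = None       # filled in by build_ring
        self.predecessor_id = None

    def owns(self, key):
        # Assumption: a node owns keys in (predecessor, self], wrapping at zero.
        lo, hi = self.predecessor_id, self.node_id
        return lo < key <= hi if lo < hi else key > lo or key <= hi

    def next_hop(self, key):
        return self if self.owns(key) else self.successor

    def route_recursive(self, key):
        # Each node forwards the request itself; the querier just waits.
        hop = self.next_hop(key)
        return self if hop is self else hop.route_recursive(key)

def route_iterative(start, key):
    # The querier asks each node for the next hop and contacts that node directly.
    node = start
    while True:
        hop = node.next_hop(key)
        if hop is node:
            return node
        node = hop

def build_ring(ids):
    nodes = [Node(i) for i in sorted(ids)]
    for i, node in enumerate(nodes):
        node.successor = nodes[(i + 1) % len(nodes)]
        node.predecessor_id = nodes[i - 1].node_id
    return nodes
```

Both functions reach the same node. Recursive routing keeps the querier out of the loop at every hop, while iterative routing lets the querier observe progress and retry around a failed hop itself.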

    63/173

    DHT Design Goals

    An overlay network with:
      Flexible mapping of keys to physical nodes
      Small network diameter
      Small degree (fanout)
      Local routing decisions
      Robustness to churn
      Routing flexibility
      Decent locality (low stretch)
    A storage or memory mechanism with
      No guarantees on persistence
      Maintenance via soft state

    64/173

    Peers vs Infrastructure

    Peer:
      Application users provide nodes for DHT
      Examples: filesharing, etc
    Infrastructure:
      Set of managed nodes provide DHT service
      Perhaps serve many applications
        A p2p incubator?
        We'll discuss this at the end of the tutorial

    65/173

    Library or Service

    Library: DHT code bundled into application
      Runs on each node running application
      Each application requires own routing infrastructure
    Service: single DHT shared by applications
      Requires common infrastructure
      But eliminates duplicate routing systems

    66/173

    DHT Outline

    High-level overview
    Fundamentals of structured network topologies
      And examples
    One concrete DHT
      Chord
    Some systems issues
      Storage models & soft state
      Locality
      Churn management

    67/173

    68/173

    An Example DHT: Chord

    70/173

    An Example DHT: Chord

    Overlayed 2^k-Gons

    71/173

    Routing in Chord

    At most one of each Gon
    E.g. 1-to-0

    73/173

    76/173

    Routing in Chord

    At most one of each Gon
    E.g. 1-to-0
    What happened?
      We constructed the binary number 15!
      Routing from x to y is like computing y - x mod n by summing powers of 2

    [Figure: hops labeled 8, 4, 2 and 1]

    Diameter: log n (1 hop per gon type)
    Degree: log n (one outlink per gon type)
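The "summing powers of 2" view of Chord routing can be written out directly. The sketch assumes a complete ring of n = 2^k nodes with IDs 0..n-1 and a finger of length 2^i at every node for each i below k; the function name and return shape are illustrative.

```python
def chord_route(x, y, k):
    """Route from node x to node y on a complete Chord ring of n = 2**k nodes.

    Greedy rule: take the largest finger (a hop of length 2**i) that does not
    overshoot the remaining clockwise distance, then repeat with smaller ones.
    Returns (list of nodes visited, list of hop lengths).
    """
    n = 2 ** k
    path = [x]
    hops = []
    remaining = (y - x) % n         # clockwise distance still to cover
    for i in reversed(range(k)):
        if remaining >= 2 ** i:
            hops.append(2 ** i)
            remaining -= 2 ** i
            path.append((path[-1] + 2 ** i) % n)
    return path, hops
```

For the slide's 1-to-0 example with k = 4, the clockwise distance is 15 = 8 + 4 + 2 + 1, so `chord_route(1, 0, 4)` visits 1, 9, 13, 15, 0 using one hop of each length. No route ever needs more than k hops, which is the log n diameter.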

    77/173

    What is happening here? Algebra!

    Underlying group-theoretic structure
      Recall a group is a set S and an operator • such that:
        S is closed under •
        Associativity: (AB)C = A(BC)
        There is an identity element I ∈ S s.t. IX = XI = X for all X ∈ S
        There is an inverse X^-1 ∈ S for each element X ∈ S s.t. XX^-1 = X^-1X = I
    The generators of a group
      Elements {g1, ..., gn} s.t. application of the operator on the generators produces all the members of the group.
    Canonical example: (Zn, +)
      Identity is 0
      A set of generators: {1}
      A different set of generators: {2, 3}
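Whether a set of elements generates (Zn, +) can be checked mechanically by repeatedly adding generators, starting from the identity. The helper names below are illustrative; the check relies on the fact that in a finite group, repeated application of a generator also reaches its inverse.

```python
def generated_subset(generators, n):
    """Return every element of (Z_n, +) reachable by adding generators to 0."""
    reached = {0}                   # start from the identity element
    frontier = [0]
    while frontier:
        element = frontier.pop()
        for g in generators:
            nxt = (element + g) % n
            if nxt not in reached:
                reached.add(nxt)
                frontier.append(nxt)
    return reached

def generates_group(generators, n):
    """True when the generators produce all n members of (Z_n, +)."""
    return generated_subset(generators, n) == set(range(n))
```

For n = 16, both `{1}` and `{2, 3}` generate the whole group, matching the slide, whereas `{2, 4}` reaches only the even residues.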

    78/173

    Cayley Graphs

    The Cayley Graph (S, E) of a group:
      Vertices corresponding to the underlying set S
      Edges corresponding to the actions of the generators
    (Complete) Chord is a Cayley graph for (Zn, +)
      S = Z mod n (n = 2^k).
      Generators {1, 2, 4, ..., 2^(k-1)}
      That's what the gons are all about!
    Fact: Most (complete) DHTs are Cayley graphs
      And they didn't even know it!
      Follows from parallel InterConnect Networks (ICNs)
        Shown to be group-theoretic [Akers/Krishnamurthy 89]

    Note: the ones that aren't Cayley Graphs are coset graphs, a related group-theoretic structure
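The claim that complete Chord is a Cayley graph of (Zn, +) can be made concrete by building the graph from the generators and measuring it. This is a sketch with illustrative function names; edges are directed, one per generator, and the diameter is found by breadth-first search from every vertex.

```python
from collections import deque

def cayley_graph(n, generators):
    """Directed Cayley graph of (Z_n, +): an edge x -> (x + g) mod n per generator g."""
    return {x: [(x + g) % n for g in generators] for x in range(n)}

def diameter(graph):
    """Longest shortest-path length over all ordered vertex pairs."""
    worst = 0
    for source in graph:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in graph[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        worst = max(worst, max(dist.values()))
    return worst

k = 4
chord = cayley_graph(2 ** k, [2 ** i for i in range(k)])   # generators 1, 2, 4, 8
```

For k = 4 every vertex has out-degree 4 and the diameter is 4, which agrees with the log n degree and diameter quoted for Chord.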

    79/173

    80/173

    81/173

    Upshot

    Good DHT topologies will be Cayley/Coset graphs
      A replay of ICN Design
      But DHTs can use funky wiring that was infeasible in ICNs
      All the group-theoretic analysis becomes suggestive
    Clean math describing the topology helps crisply analyze efficiency
      E.g. degree/diameter tradeoffs
      E.g. shapes of trees we'll see later for aggregation or join
    Really no excuse to be sloppy
      ISAM vs. B-trees

    82/173

    Pastry/Bamboo

    Based on Plaxton Mesh [Plaxton, et al SPAA 97]
    Names are fixed bit strings
    Topology: Prefix Hypercube
      For each bit from left to right, pick a neighbor ID with common flipped bit and common prefix
      log n degree & diameter
    Plus a ring
      For reliability (with k pred/succ)
    Suffix Routing from A to B
      Fix bits from left to right
      E.g. 1010 to 0001:
        1010 -> 0101 -> 0010 -> 0000 -> 0001

    [Figure: nodes labeled 1010, 0101, 1100, 1000 and 1011]
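The "fix bits from left to right" rule can be sketched as follows. Assumptions made for the illustration: IDs are 4-bit strings, every ID is occupied by a node, and each routing-table slot holds the lexicographically smallest qualifying neighbor (a real system would pick by proximity or liveness), so the intermediate hops differ from the slide's example while following the same rule.

```python
def shared_prefix_len(a, b):
    """Number of leading bits on which bit strings a and b agree."""
    count = 0
    for bit_a, bit_b in zip(a, b):
        if bit_a != bit_b:
            break
        count += 1
    return count

def build_table(node, all_nodes):
    """For each bit position i, remember one node that shares the first i bits
    with `node` and has the opposite bit at position i."""
    table = {}
    for other in sorted(all_nodes):
        i = shared_prefix_len(node, other)
        if i < len(node):               # skip the node itself
            table.setdefault(i, other)
    return table

def route(source, target, tables):
    path = [source]
    while path[-1] != target:
        here = path[-1]
        i = shared_prefix_len(here, target)
        path.append(tables[here][i])    # next hop matches target on at least one more bit
    return path

nodes = [format(v, "04b") for v in range(16)]   # assumption: all 16 IDs are in use
tables = {n: build_table(n, nodes) for n in nodes}
```

Each hop strictly lengthens the prefix shared with the target, so a route over b-bit IDs takes at most b hops; each node keeps at most b table entries.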

    83/173

    CAN: Content Addressable Network

    Exploit multiple dimensions
    Each node is assigned a zone
    Nodes are identified by zone boundaries
    Join: choose random point, split its zone

    [Figure: unit square with corners (0,0), (1,0) and (0,1), divided into zones (0,0.5, 0.5,1), (0.5,0.5, 1,1), (0,0, 0.5,0.5), (0.5,0.25, 0.75,0.5) and (0.75,0, 1,0.5)]

    84/173

    Routing in 2-dimensions

    Routing is navigating a d-dimensional ID space
      Route to closest neighbor in direction of destination
      Routing table contains O(d) neighbors
    Number of hops is O(dN^(1/d))

    [Figure: the same unit square and zones as on the previous slide]
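The O(dN^(1/d)) hop count can be checked on a simplified model. The sketch assumes equal-sized zones laid out as a d-dimensional grid that wraps around (a torus), so N = side^d; real CAN zones are unequal because they come from random splits. Function names are illustrative.

```python
import itertools

def can_route(source, target, side):
    """Greedy routing between zones on a d-dimensional torus grid.

    Zones are identified by integer coordinates in range(side) per dimension.
    Each hop moves to the neighbouring zone one step closer to the target,
    finishing one dimension before starting the next.
    """
    current = list(source)
    path = [tuple(current)]
    for dim, goal in enumerate(target):
        while current[dim] != goal:
            forward = (goal - current[dim]) % side
            step = 1 if forward <= side - forward else -1   # take the shorter way round
            current[dim] = (current[dim] + step) % side
            path.append(tuple(current))
    return path

def average_hops(side, d):
    """Mean hop count from one zone to every zone (the torus is symmetric)."""
    zones = list(itertools.product(range(side), repeat=d))
    origin = zones[0]
    return sum(len(can_route(origin, z, side)) - 1 for z in zones) / len(zones)
```

With side = 8 and d = 2 (N = 64 zones), each zone has 2d = 4 neighbors and the average route is 4 hops; raising d for the same N shortens routes at the cost of more neighbors per node.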

    85/173

    K oorde

    DeBr ui jn gr aphs Link f ro m node x t o

    node s 2x an d 2x+1 De gree 2, d iame t er lo g n

    Optimal !

    K oorde is Chord -b as ed Basically Chord, b ut with

    DeBr ui jn finger s

    No t e: No t ver t ex-symme t r ic!No t a Cayley gr aph . But a co se t gr aph o f the b utt er fly t o po lo gy.
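
Routing on a DeBruijn graph amounts to shifting the destination's bits into the current ID, one bit per hop. The sketch below assumes a complete ring of 2^m nodes (real Koorde has to cope with missing nodes, which is not shown); `debruijn_route` is a hypothetical helper name.

```python
def debruijn_route(src: int, dst: int, m: int) -> list[int]:
    """Follow links x -> (2x + b) mod 2^m, feeding in dst's bits from the
    most significant to the least significant."""
    n = 1 << m
    path = [src]
    cur = src
    for k in range(m - 1, -1, -1):
        bit = (dst >> k) & 1
        cur = (2 * cur + bit) % n  # either the 2x link or the 2x+1 link
        path.append(cur)
    return path


print(debruijn_route(0b1010, 0b0001, 4))
```

After m hops all of the source's bits have been shifted out, so the walk always ends at the destination in exactly log n steps with only two outgoing links per node.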

    Topologies of Other Oft-cited DHTs

    Tapestry
      Very similar to Pastry/Bamboo topology
      No ring
    Kademlia
      Also similar to Pastry/Bamboo
      But the "ring" is ordered by the XOR metric
      Used by the Overnet/eDonkey filesharing system
    Viceroy
      An emulated Butterfly network
    Symphony
      A randomized "small-world" network

    Incomplete Graphs: Emulation

    For Chord, we assumed 2^m nodes. What if not?
      Need to "emulate" a complete graph even when incomplete.
      Note: you've seen this problem before!
        Litwin's Linear Hashing emulates hashtables of length 2^m!
    DHT-specific schemes used
      In Chord, node x is responsible for the range [x, succ(x))
      The "holes" on the ring should be randomly distributed due to hashing
        Consistent Hashing [Karger, et al. STOC 97]
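
The responsibility rule stated on this slide, node x owns [x, succ(x)), can be sketched with a sorted list of node IDs. This is an illustration under assumptions: the 16-bit ID space and the helper names `ring_id` and `owner` are made up for the example.

```python
import bisect
import hashlib


def ring_id(name: str, m: int = 16) -> int:
    """Hash a name onto a ring of 2^m IDs (SHA-1, truncated by modulo)."""
    digest = hashlib.sha1(name.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % (1 << m)


def owner(node_ids: list[int], key_id: int) -> int:
    """Return the node x with key_id in [x, succ(x)); node_ids must be sorted.

    A key smaller than every node ID wraps around to the largest node,
    whose range crosses the top of the ring.
    """
    i = bisect.bisect_right(node_ids, key_id) - 1
    return node_ids[i]  # i == -1 selects the last (largest) node


nodes = sorted(ring_id(f"node-{i}") for i in range(8))
print(owner(nodes, ring_id("some-key")))
```

Because node IDs are hash outputs, the gaps between consecutive nodes are roughly evenly spread, which is the property the consistent hashing analysis relies on.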

    Joining the Chord Ring

    Need IP of some node
    Pick a random ID (e.g. SHA-1(IP))
    Send msg to current owner of that ID
      That's your predecessor
    Update pred/succ links
      Once the ring is in place, all is well!
    Inform app to move data appropriately
    Search to install fingers of varying powers of 2
      Or just copy from pred/succ and check!
    Inbound fingers fixed lazily

    Theorem: If consistency is reached before network doubles, lookups remain log n

    ICN Emulation

    At least 3 generic emulation schemes have been proposed
      [Naor/Wieder SPAA 03]
      [Abraham, et al. IPDPS 03]
      [Manku PODC 03]
    As an exercise, funky ICN + emulation scheme = new DHT
      IHOP: Internet Hashing on Pancake graphs [Ratajczak/Hellerstein 04]
        Pancake graph ICN + Abraham, et al. emulation.
        Based on Bill Gates' only paper.
        Trivia question: who was his advisor/co-author?

    Storage Models for DHTs

    Up to now we focused on routing
      DHTs as "content-addressable network"
    Implicit in the name "DHT" is some kind of storage
      Or perhaps a better word is "memory"
      Enables indirection in time
      But also can be viewed as a place to store things
    Soft state is the name of the game in Internet systems

    A Note on Soft State

    A hybrid persistence scheme
      Persistence via storage & retry
    Joint responsibility of publisher and storage node
      Item published with a Time-To-Live (TTL)
      Storage node attempts to preserve it for that time
        Best effort
      Publisher wants it to last longer?
        Must republish it (or renew it)
    Must balance reliability and republishing overhead
      Longer TTL = longer potential outage but less republishing
    On failure of a storage node
      Publisher eventually republishes elsewhere
    On failure of a publisher
      Storage node eventually garbage collects
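
The storage-node side of soft state can be sketched as a table of (value, expiry) pairs. This is a minimal sketch, not any particular DHT's API: the class name `SoftStateStore` and its methods are hypothetical, and time is passed in explicitly so the behavior is easy to follow.

```python
class SoftStateStore:
    """Best-effort store in which every item expires unless republished."""

    def __init__(self):
        self._items = {}  # key -> (value, expiry time)

    def publish(self, key, value, ttl, now):
        """Store or renew an item; republishing simply pushes the expiry out."""
        self._items[key] = (value, now + ttl)

    def get(self, key, now):
        """Return the value if it has not expired, else None."""
        entry = self._items.get(key)
        if entry is None or entry[1] <= now:
            return None
        return entry[0]

    def garbage_collect(self, now):
        """Drop expired items (e.g. those whose publisher has failed)."""
        expired = [k for k, (_, expiry) in self._items.items() if expiry <= now]
        for k in expired:
            del self._items[k]
        return len(expired)


store = SoftStateStore()
store.publish("k", "v", ttl=10, now=0)
store.publish("k", "v", ttl=10, now=8)  # renewal by the publisher
print(store.get("k", now=15))
```

The TTL tradeoff from the slide is visible here: a longer `ttl` means fewer `publish` calls, but an item whose storage node dies stays unavailable for longer before the publisher's next renewal lands elsewhere.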

    Locality-Centric Neighbor Selection

    Much recent work
      [Gummadi, et al. SIGCOMM 03, Abraham, et al. SODA 04, Dabek, et al. NSDI 04, Rhea, et al. USENIX 04, etc.]
    We saw flexibility in neighbor selection in Pastry/Bamboo
      Can also introduce some randomization into Chord, CAN, etc.
    How to pick
      Analogous to ad-hoc networks
      1. Ping random nodes
      2. Swap neighbor sets with neighbors
         Combine with random pings to explore
      3. Provably-good algorithm to find nearby neighbors based on sampling [Karger and Ruhl 02]

    Geometry and its effects

    Some topologies allow more choices
      Choice of neighbors in the neighbor tables (e.g. Pastry)
      Choice of routes to send a packet (e.g. Chord)
      Cast in terms of "geometry"
        But really a group-theoretic type of analysis
    Having a ring is very helpful for resilience
      Especially with a decent-sized "leaf set" (successors/predecessors)
        Say ~log n

    [Gummadi, et al. SIGCOMM 03]

    Timeouts

    Recall Iterative vs. Recursive Routing
      Iterative: Originator requests IP address of each hop
        Message transport is actually done via direct IP
      Recursive: Message transferred hop-by-hop
    Effect on timeout mechanism
      Need to track latency of communication channels
      Iterative results in direct n x n communication
        Can't keep timeout stats at that scale
        Solution: virtual coordinate schemes [Dabek et al. NSDI 04]
      With recursive can do TCP-like tracking of latency
        Exponentially weighted mean and variance
    Upshot: Both work OK up to a point
      TCP-style does somewhat better than virtual coords at modest churn rates (23 min. or more mean session time)
      Virtual coords begin to fail at higher churn rates
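
The "TCP-like tracking" can be sketched as an exponentially weighted estimator kept per neighbor. The constants below (gains of 1/8 and 1/4, timeout = mean + 4 x deviation) are the ones standard TCP uses for its retransmission timer; that a given DHT uses the same values is an assumption of this sketch. Note that TCP tracks a mean deviation rather than a true variance. The class name `LatencyEstimator` is hypothetical.

```python
class LatencyEstimator:
    """Per-neighbor round-trip estimator in the style of TCP's timer."""

    def __init__(self, alpha=0.125, beta=0.25, k=4.0):
        self.alpha = alpha  # gain for the smoothed mean
        self.beta = beta    # gain for the smoothed deviation
        self.k = k          # deviations of slack added to the mean
        self.mean = None
        self.dev = 0.0

    def observe(self, rtt):
        """Fold one measured round-trip time into the running estimates."""
        if self.mean is None:
            self.mean = rtt
            self.dev = rtt / 2
        else:
            self.dev = (1 - self.beta) * self.dev + self.beta * abs(rtt - self.mean)
            self.mean = (1 - self.alpha) * self.mean + self.alpha * rtt

    def timeout(self):
        """Timeout to use for the next message to this neighbor."""
        return self.mean + self.k * self.dev


est = LatencyEstimator()
for sample in (100.0, 100.0, 140.0):
    est.observe(sample)
print(est.timeout())
```

This only works when a node talks to the same small set of neighbors repeatedly, which is the recursive-routing case; with iterative routing there are too many distinct peers to keep one estimator each.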

    Complex Query Processing

    DHTs Gave Us Equality Lookups

    What else might we want?
      Range Search
      Aggregation
      Group By
      Join
      Intelligent Query Dissemination
    Theme
      All can be built elegantly on DHTs!
        This is the approach we take in PIER
      But in some instances other schemes are also reasonable
        I will try to be sure to call this out
        The flooding/gossip strawman is always available

    [Figure: PHT; a logical trie alongside the DHT contents]

    [Figure: PHT search example; search for 011101?]

    PHT Search

    Observe: The DHT allows direct addressing of PHT nodes
      Can jump into the PHT at any node
        Internal, leaf, or below a leaf!
      So, can find leaf by binary search
        log log |D| search cost!
      If you knew (roughly) the data distribution, even better
    Moreover, consider a failed machine in the system
      Equals a failed node of the trie
      Can hop over failed nodes directly!
    And consider concurrency control
      A link-free data structure: simple!
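
The binary search over prefix lengths can be sketched with a plain dictionary standing in for the DHT. Everything here is illustrative: the dictionary maps a trie-node label (a bit-string prefix) to "internal" or "leaf", and `pht_lookup` is a hypothetical helper name.

```python
def pht_lookup(dht: dict, key: str):
    """Find the leaf whose label is a prefix of `key` by binary search on
    the prefix length, probing the DHT directly at each guess."""
    lo, hi = 0, len(key)
    while lo <= hi:
        mid = (lo + hi) // 2
        node = dht.get(key[:mid])
        if node is None:        # landed below a leaf: prefix is too long
            hi = mid - 1
        elif node == "leaf":    # found the leaf covering this key
            return key[:mid]
        else:                   # internal node: prefix is too short
            lo = mid + 1
    return None


# A small trie: "" and "0" and "01" are internal, the rest are leaves.
dht = {"": "internal", "0": "internal", "1": "leaf",
       "00": "leaf", "01": "internal", "010": "leaf", "011": "leaf"}
print(pht_lookup(dht, "011101"))
```

The number of probes is logarithmic in the key length, and the key length is itself logarithmic in the domain size, which gives the log log |D| cost on the slide. No probe follows a pointer from a parent, so a failed internal node does not block the search.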

    Reusable Lessons from PHTs

    Direct-addressing a lovely way to emulate robust, efficient "linked" data structures in the network
    Direct-addressing requires regularity in the data space partitioning
      E.g. works for regular space-partitioning indexes (tries, quad trees)
      Not so simple for data-partitioning (B-trees, R-trees) or irregular space partitioning (kd-trees)

    Aggregation

    Two key observations for DHTs
      DHTs are multi-hop, so hierarchical aggregation can reduce BW
        E.g., the TAG work for sensornets [Madden, OSDI 2002]
      DHTs provide tree construction in a very natural way
    But what if I don't use DHTs?
      Hold that thought!

    Consider Aggregation in Chord

    Everybody sends their message to node 0
    Assume greedy jumps (increasing Gon-order)
    Intercept messages and aggregate along the way
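
The tree that greedy routing to node 0 induces can be computed directly. The sketch assumes a full ring of 2^m nodes with fingers at every power of 2, and takes "greedy" to mean the largest finger that does not overshoot node 0; the helper names `parent` and `subtree_sizes` are hypothetical.

```python
def parent(x: int, m: int):
    """Next hop from node x toward node 0 on a full ring of 2^m nodes."""
    if x == 0:
        return None  # node 0 is the root
    n = 1 << m
    gap = n - x                           # clockwise distance to node 0
    jump = 1 << (gap.bit_length() - 1)    # largest power of 2 <= gap
    return (x + jump) % n


def subtree_sizes(m: int) -> list[int]:
    """Number of nodes whose messages pass through each node (itself included)."""
    n = 1 << m
    size = [1] * n
    for x in range(1, n):
        p = parent(x, m)
        while p is not None:  # credit every ancestor of x
            size[p] += 1
            p = parent(p, m)
    return size


print(subtree_sizes(4))
```

For 16 nodes the sizes come out as 16 for node 0, 1 for nodes 1-8, 2 for nodes 9-12, 4 for nodes 13-14 and 8 for node 15, a binomial tree in which a few nodes near the root carry most of the aggregation work.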

    Aggregation in Koorde

    Recall the DeBruijn graph:
      Each node x points to 2x mod n and (2x + 1) mod n
      (But note: not node-symmetric)

    Aggregation in Pastry/Bamboo

    Depends on choice of neighbors
    But if you flip exactly one bit each hop:

    [Figure: the resulting aggregation tree]

    Metrics for Aggregation Trees

    What makes a good/bad agg tree?
      Number of edges? No!
        Always n-1. With distributive/algebraic aggs, msg size is fixed.
      Degree of fan-in
        Affects congestion
      Height
        Determines latency
      Predictability of subtree shape
        Determines ability to control timing tightly
      Stability in the face of churn
        Changing tree shape while accumulating can result in errors
      Subtree size distribution
        Affects jeopardy of lost messages

    So what if I don't have a DHT?

    Need another tree-construction mechanism
      There are many in the NW literature (e.g. for multicast)
      Require maintenance messages akin to DHTs
        Do you maintain for the life of your query engine?
        Or setup/teardown as needed?
      Can pick a tree shape of your own
        Not at the mercy of the DHT topologies
        E.g. could do high fan-in trees to minimize latency
    As we noted before, we will reuse tree-construction for multiple purposes
      It's handy that they're trivial in DHTs
      But could reuse another scheme for multiple purposes as well
    Or, can do aggregation via gossip [Kempe, et al. FOCS 03]

    Group By

    A piece of cake in a DHT
      Every node sends tuples toward the hash ID of the grouping columns
      An agg tree is naturally constructed per group
    Note nice dual-purpose use of DHT
      Hash-based partitioning for parallel group by
        Just like parallel DBMS (Gamma, the Exchange op in Volcano)
      Agg tree construction in multi-hop overlay network
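
The partitioning half of this can be sketched without any network at all: each node pre-aggregates locally, then sends one partial result per group to the node that the group's hash maps to. This is a simulation under assumptions, not PIER code; the in-network combining along the route is not modeled, and `home` and `group_by_sum` are hypothetical names.

```python
import hashlib
from collections import defaultdict


def home(group, n_nodes: int) -> int:
    """Node responsible for a group: hash of the grouping value, mod n."""
    digest = hashlib.sha1(repr(group).encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % n_nodes


def group_by_sum(local_tuples, n_nodes: int):
    """local_tuples[i] holds node i's (group, value) pairs.
    Returns, per destination node, the final SUM for the groups it owns."""
    inbox = [defaultdict(int) for _ in range(n_nodes)]
    for tuples in local_tuples:
        partial = defaultdict(int)
        for group, value in tuples:        # local pre-aggregation
            partial[group] += value
        for group, subtotal in partial.items():
            inbox[home(group, n_nodes)][group] += subtotal  # "send" toward hash ID
    return inbox


data = [[("a", 1), ("b", 2)], [("a", 3)], [("b", 4), ("c", 5)]]
for node, groups in enumerate(group_by_sum(data, 4)):
    print(node, dict(groups))
```

Every tuple of a given group ends up at the same node, so that node can emit the group's final answer on its own.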

    Hash Join

    We just did hash-based group by. Hash-based join is roughly the same deal, twice:
      Given R.a Join S.b
      Each node:
        sends each R tuple toward H(R.a)
        sends each S tuple toward H(S.b)
    Again, DHT gives
      Hash-based partitioning for parallel hash join
      Tree construction (no reduction along the way here, though)
    Note the resulting communication pattern
      A tree is constructed per hash destination!
      That's a lot of trees!
      No big deal for the DHT -- it already had that topology there.
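
The rehash step can be simulated in a few lines: tuples from both relations are sent to the node chosen by hashing the join attribute, and each node then joins what it received. This is a sketch under assumptions; the function name `rehash_join` and the (key, payload) tuple layout are made up for the example, and Python's built-in `hash` stands in for the DHT's hash function.

```python
from collections import defaultdict


def rehash_join(r_tuples, s_tuples, n_nodes, h=hash):
    """Equi-join R.a = S.b by rehashing both inputs on the join attribute.

    r_tuples and s_tuples are lists of (join_key, payload) pairs.
    """
    r_at = [defaultdict(list) for _ in range(n_nodes)]
    s_at = [defaultdict(list) for _ in range(n_nodes)]
    for a, payload in r_tuples:
        r_at[h(a) % n_nodes][a].append(payload)   # send R tuple toward H(R.a)
    for b, payload in s_tuples:
        s_at[h(b) % n_nodes][b].append(payload)   # send S tuple toward H(S.b)

    out = []
    for node in range(n_nodes):                   # each node joins locally
        for key, r_payloads in r_at[node].items():
            for rp in r_payloads:
                for sp in s_at[node].get(key, []):
                    out.append((key, rp, sp))
    return out


R = [(1, "r1"), (2, "r2"), (2, "r2b")]
S = [(2, "s2"), (3, "s3")]
print(sorted(rehash_join(R, S, n_nodes=4)))
```

Matching tuples always meet because equal keys hash to the same node; unlike group by, nothing can be combined on the way there, so every tuple travels the full route.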

    Symmetric Semi-Join and Bloom Join

    Query rewriting tricks from distributed DBs
    Semi-Joins a la SDD-1
      But do it to both sides of the join
      Rewrite R.a Join S.b as
        (semi-join) join R.a join S.b
      Latter 2 joins can be "Fetch Matches"
    Bloom Joins a la R*
      Requires a bit more finesse here
      Aggregate R.a Bloom filters to a fixed hash ID. Same for S.b.
      All the R.a Bloom filters are OR'ed, eventually multicasted to all nodes storing S tuples
      Symmetric for S.b Bloom filter
      Can in principle "stream" refining Bloom filters
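
OR-ing Bloom filters works because a filter is just a bit vector: the union of two filters is the filter of the union of their inputs. The sketch below keeps a filter in a Python integer; the sizes (256 bits, 3 hash functions) and the helper names are arbitrary choices for illustration.

```python
import hashlib

M_BITS = 256    # filter width (assumed for the example)
K_HASHES = 3    # hash functions per value (assumed for the example)


def positions(value):
    """Bit positions that `value` sets in the filter."""
    for i in range(K_HASHES):
        digest = hashlib.sha1(f"{i}:{value}".encode("utf-8")).digest()
        yield int.from_bytes(digest[:4], "big") % M_BITS


def bloom(values) -> int:
    """Build a Bloom filter over the values, stored as an integer bit vector."""
    bits = 0
    for v in values:
        for p in positions(v):
            bits |= 1 << p
    return bits


def might_contain(bits: int, value) -> bool:
    """False means definitely absent; True means possibly present."""
    return all((bits >> p) & 1 for p in positions(value))


# Two nodes each summarize their local R.a values; the filters are OR'ed.
combined = bloom([1, 2, 3]) | bloom([3, 4])
s_values = [2, 4, 99]
print([b for b in s_values if might_contain(combined, b)])
```

A node storing S tuples would ship only the tuples that pass `might_contain`; false positives cost some extra traffic but never lose a matching tuple.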

    Query Dissemination

    How do nodes find out about a query?
      Up to now we conveniently ignored this!
    Case 1: Broadcast
      As far as we know, all nodes need to participate
      Need to have a broadcast tree out of the query node
      This is the opposite of an aggregation tree!
        But how to instantiate it?
      Naive solution: Flood
        Each node sends query to all its neighbors
        Problem: nodes will receive query multiple times: wasted bandwidth

    Query Dissemination II

    Suppose you have a simple equality query
      Select * From R Where R.c = 5
      If R.c is already indexed in the DHT, can route query via DHT
    Query Dissemination is an "access method"
      Basically the same as an index
    Can take more complex queries and disseminate sub-parts
      Select * From R, S, T Where R.a = S.b And S.c = T.d And R.c = 5

    PIER

    Peer-to-Peer Information Exchange & Retrieval
      Puts together many of the techniques described above
      Aggressively uses DHTs
        But agnostic to choice
        Uses Bamboo, has worked on CAN and Chord
      [Huebsch, et al. VLDB 03]
    Deployed
      Running N queries on ~400 nodes around the world (PlanetLab)
      Simulated on up to 10K nodes
    Current Applications
      Improved Filesharing
      Internet Monitoring (N)
      Customizable Routing via Recursive Queries

    http://pier.cs.berkeley.edu

    Native Simulation

    Entire system is event-driven
    Enables discrete-event simulation to be "slid in"
      Replaces lowest-level networking & scheduler
      Runs all the rest of PIER natively
    Very helpful for debugging a massively distributed system!

    Initial Tidbits from PIER Efforts

    Multiresolution simulation critical
      Native simulator was hugely helpful
      Emulab allows control over link-level performance
      PlanetLab is a nice approximation of reality
    Debugging still very hard
      Need to have a traced execution mode.
        Radiological dye? Intensive logging?
    DB workloads on NW technology: mismatches
      E.g. Bamboo aggressively changes neighbors for single-message resilience/performance
        Can wreak havoc with stateful aggregation trees
      E.g. returning results: SELECT * from Firewalls
        1 MegaNode of machines want to send you a tuple!
    A relational query processor w/o storage
      Where's the metadata?

    Traditional FileSystems on p2p?

    Lots of projects
      OceanStore, FarSite, CFS, Ivy, PAST, etc.
    Lots of challenges
      Motivation & Viability
        Short & long term
      Resource mgmt
        Load balancing w/heterogeneity, etc.
        Economics come strongly into play
          Billing and capacity planning?
      Reliability & Availability
        Replication, server selection
        Wide-area replication (+ consistency of updates)
      Security
        Encryption & key mgmt, rather than access control

    Non-traditional Storage Models

    Very long term archival storage
      LOCKSS
    Ephemeral storage
      Palimpsest, OpenDHT

    LOCKSS [Maniatis, et al. SOSP 04]

    Digital Preservation of Academic Materials
    Academic publishing is moving from paper to digital leasing
    Librarians are scared with good reason
      Access depends on the fate of the publisher
      Time is unkind to bits after decades
      Plenty of enemies (ideologies, governments, corporations)
    Goal: Preserve access for local patrons, for a very long time

    [Maniatis, et al. SOSP 04]

    Protocol Threats

    Assume conventional platform/social attacks
    Mitigate further damage through protocol
    Top adversary goal: Stealth Modification
      Modify replicas to contain adversary's version
      Hard to reinstate original content after large proportion of replicas are modified
    Other goals
      Denial of service
      System slowdown
      Content theft

    The LOCKSS Solution

    Peer-to-peer auditing and repair system for replicated documents / no file sharing
    A peer periodically audits its own replica, by calling an opinion poll
    When a peer suspects an attack, it raises an alarm for a human operator
      Correlated failures
      IP address spoofing
      System slowdown
    2nd iteration of a deployed system

    Sampled Opinion Poll

    Each peer holds
      reference list of peers it has discovered
      friends list of peers it knows externally
    Periodically (faster than rate of bit rot)
      Take a sample of the reference list
      Invite them to send a hash of their replica
    Compare votes with local copy
      Overwhelming agreement (> 70%): Sleep blissfully
      Overwhelming disagreement (

    Limit the rate of operation

    Peers determine their rate of operation autonomously
      Adversary must wait for the next poll to attack through the protocol
    No operational path is faster than others
      Artificially inflate cost of cheap operations
      No attack can occur faster than normal ops

    Bimodal System Behavior

    When most replicas are the same, no alarms
    In between, many alarms
    To get from mostly correct to mostly wrong replicas, system must pass through "moat" of alarming states

    Churn Friends into Reference List

    Churn adjusts the bias in the reference list
    High churn favors friends
      Reduces the effects of Sybil attacks
      But offers easy targets for focused attack
    Low churn favors strangers
      It offers Sybil attacks free rein
        Bad peers nominate bad; good peers nominate some bad
      Makes focused attack harder, since adversary can predict less of the poll sample
    Goal: strike a balance

    Palimpsest [Roscoe & Hand, HotOS 03]

    Robust, available, secure ephemeral storage
    Small and very simple
    Soft-capacity for service providers
    Congestion-based pricing
    Automatic space reclamation
    Flexible client and server policies

    We'll ignore the economics

    How does it do this?

    To write a file:
      Erasure code it
      Route it through a network of simple block stores
      Pay to store it
    Each block store is a fixed-length FIFO
      Block stores may be owned by multiple providers
      Block stores don't care who the users are
      No one store needs to be trusted
      Blocks are eventually lost off the end of the queue
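
A fixed-length FIFO block store is easy to sketch with a bounded deque: new blocks push the oldest ones out, which is the automatic space reclamation. This is an illustration, not Palimpsest's implementation; the class name `FifoBlockStore`, its methods, and the linear-scan lookup are all assumptions made for brevity.

```python
from collections import deque


class FifoBlockStore:
    """Fixed-capacity store: writing a block evicts the oldest when full."""

    def __init__(self, capacity: int):
        self._queue = deque(maxlen=capacity)  # deque drops from the left when full

    def put(self, block_id, data):
        """Append a block; no identity check, the store does not care who writes."""
        self._queue.append((block_id, data))

    def get(self, block_id):
        """Return the newest block with this ID, or None if it has fallen off."""
        for bid, data in reversed(self._queue):
            if bid == block_id:
                return data
        return None


store = FifoBlockStore(capacity=2)
store.put("a", b"first")
store.put("b", b"second")
store.put("c", b"third")   # pushes block "a" off the end of the queue
print(store.get("a"), store.get("c"))
```

A client that wants a block to survive has to rewrite it before enough other writes arrive, so the time a block lives depends on how busy the store is.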

    Storing a file

    Each file has a name and a key.
    File Dispersal
      Use a rateless code to spread blocks into fragments
      Rabin's IDA over GF(2^16), 1024-byte blocks
    Fragment Encryption
      Security, authenticity, identification
      AES in Offset Codebook Mode
    Fragment Placement
      Encrypt: (SHA256(name) frag.id) 256-bit ID
      Send (fragment, ID) to a block store using DHT
      Any DHT will do

    Retrieving a file

    Generate enough fragment IDs
    Request fragments from block stores
    Wait until n come back to you
    Decrypt and verify
    Invert the IDA
    Voila!

    Unfortunately

    Security and Trust

    Free Riders

    Filesharing studies
      Lots of people download
      Few people serve files
    Is this bad?
      If there's no incentive to serve, why do people do so?
      What if there are strong disincentives to being a major server?

    Simple Solution: Thresholds

    Many programs allow a threshold to be set
      Don't upload a file to a peer unless it shares > k files
    Problems:
      What's k?
      How to ensure the shared files are interesting?

    Squelching Sybil

    Certificate authority
      Centralize one thing: the signing of ID certificates
      Central server is otherwise out of the loop
    Or have an inner ring of trusted nodes do this
      Using practical Byzantine agreement protocols [Castro/Liskov OSDI 01]
    Weak secure IDs
      ID = SHA-1(IP address)
      Assume attacker controls a modest number of nodes
      Before routing through a node, challenge it to produce the right IP address
      Requires iterative routing
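
The weak secure ID check reduces to recomputing a hash. The sketch below assumes the ID is the SHA-1 of the dotted-quad address string, which is one possible encoding rather than something the slide specifies; `node_id` and `id_matches_ip` are hypothetical names.

```python
import hashlib


def node_id(ip: str) -> str:
    """ID = SHA-1(IP address), here over the ASCII text of the address."""
    return hashlib.sha1(ip.encode("ascii")).hexdigest()


def id_matches_ip(claimed_id: str, observed_ip: str) -> bool:
    """Challenge check: does the address we actually reached hash to the claimed ID?"""
    return claimed_id == node_id(observed_ip)


claimed = node_id("192.0.2.1")
print(id_matches_ip(claimed, "192.0.2.1"), id_matches_ip(claimed, "192.0.2.99"))
```

The check needs the originator to contact each hop directly so that it sees the hop's real address, which is why the slide notes that iterative routing is required.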

    Example: Redundant Agg in Chord

    |support(0)| = 16
    |support(1-8)| = 1
    |support(9-12)| = 2
    |support(13-14)| = 4
    |support(15)| = 8

    |support(8)| = 16
    |support(9-0)| = 1
    |support(1-4)| = 2
    |support(5-6)| = 4
    |support(7)| = 8

    log(n) roles w/binomial size distribution (avg = 3)

    Closing Thoughts

    The DB Community Has Much to Offer

    Complex (multi-operator) queries & optimization
      NW folks have tended to build single-operator "systems"
        E.g. aggregation only, or multi-d range-search only
      Adaptivity required
        But may not look like adaptive QP in databases
    Declarative language semantics
      Deal with streaming, clock jitter and soft state!
    Data reduction techniques
      For visualization, approximate query processing
    Bulk-computation workloads
      Quite different from the ones the NW and systems folks envision
    Recursive query processing
      The network is a graph!

    Metareferences

    Your favorite search engine should find the inline refs
    Project IRIS has a lot of participants' papers online
      http://www.project-iris.org
    IEEE Distributed Systems Online
      http://dsonline.computer.org/os/related/p2p/
    O'Reilly OpenP2P
      http://www.openp2p.com
    Karl Aberer's ICDE 2002 tutorial
      http://lsirpeople.epfl.ch/aberer/Talks/ICDE2002-Tutorial.pdf
    Ross/Rubenstein InfoCom 2003 tutorial
      http://cis.poly.edu/~ross/tutorials/P2PtutorialInfocom.pdf
    PlanetLab
      http://www.planet-lab.org
    OpenDHT
      http://www.opendht.org