architectures and algorithms for internet-scale (p2p) data management by joe hellerstein
TRANSCRIPT
-
8/7/2019 Architectures and Algorithms for Internet-Scale (P2P) Data Management by Joe Hellerstein
1/173
Architectures and Algorithms for
Internet-Scale (P2P) Data Management Joe Hellerstein
Intel Research & UC Berkeley
-
8/7/2019 Architectures and Algorithms for Internet-Scale (P2P) Data Management by Joe Hellerstein
2/173
Powerpoint Compatibility Note
This file was generated using MS PowerPoint 2004 for Mac. It may not display correctly in other versions of PowerPoint. In particular,
animations are often a problem.
-
8/7/2019 Architectures and Algorithms for Internet-Scale (P2P) Data Management by Joe Hellerstein
3/173
Ov er v iew
Preliminaries What, Why The Platform
Uple v eling Network Data
Independence
Early P2P architectures Client-Ser v er
Floodsast Hierarchies A Little Gossip Commercial Off erings Lessons and Limitations
Ongo i ng Research Structured Ov erlays: DHTs Query Pr ocessi ng on Ov erlays St ora ge M odels & Systems
Se curity a nd Tr ust Joi ni ng the fun Tools a nd P latfo rm s Closi ng thoughts
-
8/7/2019 Architectures and Algorithms for Internet-Scale (P2P) Data Management by Joe Hellerstein
4/173
Acknowledgments
For speci fic content in t hese sl ides Fr ans K aa s hoek Petros M an iat is
Sylvia Ratn as amy Timot hy Roscoe Scott S henker
Add it ion al Coll abor ators Brent C hun, Tyson
Cond ie, Ry an Huebsc h,D avid K arge r, Ankur Jain,Jinyang Li, Boon T hau Loo, Robe rt Mo rris,S riram Ram ab had ran,Se an Rhe a, Ion Sto ic a,D avid Wet he rall
-
8/7/2019 Architectures and Algorithms for Internet-Scale (P2P) Data Management by Joe Hellerstein
5/173
Preliminaries
-
8/7/2019 Architectures and Algorithms for Internet-Scale (P2P) Data Management by Joe Hellerstein
6/173
-
8/7/2019 Architectures and Algorithms for Internet-Scale (P2P) Data Management by Joe Hellerstein
7/173
Scoping the Tutorial
Architecture s an d Algorith ms f or Data Manage ment The peril s o f overview s Can t cover everything. So much here!
So me intere sting thing s we ll skip Se mantic Me diation: data integration on steroi ds
E.g., Hyperion (Toronto), Piazza (UWa sh), etc. High-Throughput Co mputing
I.e. The Gri d
Co mplex data analy si s/ re duction /mining E.g. p2p di stribute d in f erence, wavelet s, regre ss ion, matrixco mputation s, etc.
-
8/7/2019 Architectures and Algorithms for Internet-Scale (P2P) Data Management by Joe Hellerstein
8/173
Mo v ing Past the P2P Moniker:The Platform
The P2P name has lots of connotations Sim ple filestealing systems Very en d-u ser -centri c
Our fo cus here is on: Many parti ci pating ma chines, symmetri c in f un ction
Very Large S cale (MegaNo des, not PetaBytes) Minimal (or non -existent) management Note: user mo del is flexible
Co ul d be embe dde d (e.g. in OS, HW, firewall, et c.) Large -s cale hoste d ser v i ces a la Akamai or Google
A key to a chie v ing a utonomi c com pu ting ?
-
8/7/2019 Architectures and Algorithms for Internet-Scale (P2P) Data Management by Joe Hellerstein
9/173
Ov erlay Networks
P2P a pp l icat io ns nee d to: Tra ck ide nt it ies & (IP) a ddresses o f peers
May be ma ny !Ma y ha v e s ignifica nt Chu r nBest not to ha v e n2 ID re f ere nces
Ro ute messa ges a mo ng peers If y o u do n t k ee p tra ck o f all peers, t his is mu lt i-ho p
This is a n o v erla y networ k Peers are do ing bot h na ming a nd ro ut ing IP be co mes just t he low -le v el tra ns port
All t he IP ro ut ing is o paque
Co ntrol o v er na ming a nd ro ut ing is power ful And as we ll see, br ings networ k s into t he database era
-
8/7/2019 Architectures and Algorithms for Internet-Scale (P2P) Data Management by Joe Hellerstein
10/173
Man y New Challenges
Rela tive to ot he r pa rallel /dis trib ut e d s y s t e ms Pa rtial f a il ure Ch urn Few g ua ran t ees on trans port , s torage, e tc. Huge optimiza tion s pa ce Ne t w ork b ott lene cks & ot he r res ourc e cons tra in t s N o a dmin is tra tive organ iza tions Trus t iss ues: se curity , priva cy , in cen tives Rela tive to IP ne t w orking M uch h ighe r fun ction, more f lexible M uch less con tro llable /pr e dict able
-
8/7/2019 Architectures and Algorithms for Internet-Scale (P2P) Data Management by Joe Hellerstein
11/173
-
8/7/2019 Architectures and Algorithms for Internet-Scale (P2P) Data Management by Joe Hellerstein
12/173
W h y Bother II: Positi v e Lessons from Filestealing
P 2P enables organic scaling Vs. the to p few killer ser v ices -- no VCs req uire d! Can affor d to place more bets , tr y wacky i deas
Centralize d ser v ices engen der scr utin y Tracking users is tri v ial Pro v i der is liable (for mis use, for downtime, for local laws,
etc.)
Centralize d means b usiness
Nee d to pa y off start up & maintenance ex penses Nee d to protect against liabilit y Business req uirements dri v e to partic ular short -term goals
Trage dy of the commons
-
8/7/2019 Architectures and Algorithms for Internet-Scale (P2P) Data Management by Joe Hellerstein
13/173
W h y Bother III ? Intellectual moti v ation
Hea dy mix o f theor y an d system s Great communit y o f re searcher s ha v e gathere d Al gorithm s, Networkin g, Di stribute d S ystem s,
Databa se s Health y set o f publication v enue sI PTPS worksho p a s a catal yst
Sur pri sin g de gree o f collaboration acro ss area sIn part su pporte d b y N SF Lar ge ITR ( project IRI S)
UC Berkele y , ICSI, MIT, NYU, an d Rice
-
8/7/2019 Architectures and Algorithms for Internet-Scale (P2P) Data Management by Joe Hellerstein
14/173
Infecting the Networ k, Peer-to-Peer
The Internet i s h ar d to ch ange.But Over lay Net s are e as y! P 2P i s a won derf ul ho st for infecting networ k de sign s The next Internet i s li k e ly to be very different
N am ing i s a k ey de sign i ssu e to da yQuerying an d da t a in de pen dence k ey to morrow?
Don t forget: The Internet w as origin ally an over lay on the te le phone
networ k There i s no money to be mad e in the bit- shi pping b us ine ss
Amo de st go al for DB re se arch: Don t query the Internet.
-
8/7/2019 Architectures and Algorithms for Internet-Scale (P2P) Data Management by Joe Hellerstein
15/173
Infecting the Networ k, Peer-to-Peer
Amo de st go al for DB re se arch: Don t query the Internet.
Be the Internet.
-
8/7/2019 Architectures and Algorithms for Internet-Scale (P2P) Data Management by Joe Hellerstein
16/173
Some Guiding Applications
N
Int e l Re se arch & UC Be rke ley
LOCK SS
Stanford , HP Lab s, Sun , Harvard , Int e l Re se archLibe ration Ware
-
8/7/2019 Architectures and Algorithms for Internet-Scale (P2P) Data Management by Joe Hellerstein
17/173
N : Pub lic He alth for the Int e rne t
Se curit y tools focus e d on me dicine Vaccine s for Viruse s Impro ving the w orld one pati e nt at a time We akne ss/opportunit y in the Pub lic He alth are na Public Health: population-focused, community-oriented Epide miolog y: incidence, distribution, and control in a
population
N A New A pproach Perform population-wide measurement Enable massive sharing of data and query results
The Internet Screensaver Engage end users: education and prevention Understand risky behaviors, at-risk populations.
Prototype running over PIER
-
8/7/2019 Architectures and Algorithms for Internet-Scale (P2P) Data Management by Joe Hellerstein
18/173
-
8/7/2019 Architectures and Algorithms for Internet-Scale (P2P) Data Management by Joe Hellerstein
19/173
-
8/7/2019 Architectures and Algorithms for Internet-Scale (P2P) Data Management by Joe Hellerstein
20/173
N V ision: Networ k O racle
Supp ose t here existe d a Networ k O racle Answerin g questions abo ut c urrent Internet state
Ro utin g tables, lin k loa ds, latencies, f irewall events, etc.
How wo ul d t his c han ge t hin gs Social c han ge ( Public Healt h, sa f e co mpu tin g)Me di um ter m c han ge in distrib ute d a pplication desi gn
Currently distrib ute d a pp s do so me o f t his on t heir own Lon g ter m c han ge in networ k protocols
App- s peci f ic c usto m ro utin g Fa ult dia gnosis Etc.
-
8/7/2019 Architectures and Algorithms for Internet-Scale (P2P) Data Management by Joe Hellerstein
21/173
L OCK SS: L ots O f C opies K eep Stuff Safe
Di gita l Prese rvatio n of Aca de mi c Mate ria ls L ib ra ria ns a re s ca re d wit h goo d reaso n Access depe nds o n t he fate of t he pub lis he r Ti me is u nki nd to bits afte r de ca des Ple nty of e ne mies (i deo lo gies, gove rnm e nts, co rpo ratio ns )Goa l: Arch iva l sto ra ge a nd a ccess
-
8/7/2019 Architectures and Algorithms for Internet-Scale (P2P) Data Management by Joe Hellerstein
22/173
L OCK SS Approach
C ha lle nge s: Very low -co st har dware, opera tio n a nd a dminist ra tio n No ce nt ra l co nt ro l Re spec t f or acce ss co nt ro ls
A lo ng-t er m hor izo nM ust a ntic ipa t e a nd de gra de grace fully wit h Unde t ec t e d b it ro t S ust a ine d a tt acks
Esp. S t ea lt h mo difica tio n
So lutio n: P2P a uditing a nd repa ir syst e m f or rep lica t e d doc s
-
8/7/2019 Architectures and Algorithms for Internet-Scale (P2P) Data Management by Joe Hellerstein
23/173
L iberation W are
Take yo ur f avorite Internet a ppli cation W eb ho stin g, sear ch, IM, f i le sharin g, Vo IP, e mai l, et c. Con si der us in g centra lize d ver sion s in a co untry wit h a
re pre ss ive govern ment
Tra ckabi lity an d liabi lity wi ll prevent t hi s bein g use d f or f ree spee ch No w con si der p2p
En han ce d wit h a pp ro priate se curity /p riva cy prote ction sCo uld be t he me di um o f t he next To m Paine s
Exa mple s: FreeNet, Pub li us , FreeHaven p2p stora ge to avoi d cen sor shi p & guarantee priva cy PK I-en crypte d stora ge Mix-net priva cy-pre servin g ro utin g
-
8/7/2019 Architectures and Algorithms for Internet-Scale (P2P) Data Management by Joe Hellerstein
24/173
U ple v eling : Networ k Data Independence
SIGMOD Record, Sep. 2003
-
8/7/2019 Architectures and Algorithms for Internet-Scale (P2P) Data Management by Joe Hellerstein
25/173
R ecall C odd s Data Independence
Deco uple app -level API fro m data o rgan izat ion C an make c han ges to data layo ut wit ho ut
mod if yin g appl icat ions
Simple ve rs ion: locat ion -independent na mes
Fanc ie r: decla rat ive q ue ries
A s clea r a pa rad igm s hif t as we can hope to find in co mp ute r sc ience- C . Papad imit rio u
-
8/7/2019 Architectures and Algorithms for Internet-Scale (P2P) Data Management by Joe Hellerstein
26/173
T he Pillars of Data Independence
Indexes Val ue -based look ups have
to co mpete with direct access
Must adapt to shiftin g data distrib utions
Must guarantee perfor mance
Query Opti mization Support declarative q ueries
beyond look up /search Must adapt to shiftin g data
distrib utions Must adapt to chan ges in
environ ment
DBMSB-T ree
Join Orderin g, A M Selection,etc.
-
8/7/2019 Architectures and Algorithms for Internet-Scale (P2P) Data Management by Joe Hellerstein
27/173
G eneralizing Data Independence
A cla ss ic level of indirecti on sc he me Indexe s are exactly t hat Complex q uerie s are a ric her indirecti on
The key for data independence: It s all ab out rate s of c hange
Heller stein s Data Independence Ineq uality: Data independence matter s when
d(envir on ment) /dt >> d(app) /dt
-
8/7/2019 Architectures and Algorithms for Internet-Scale (P2P) Data Management by Joe Hellerstein
28/173
Data Independence in Networ k s
d(en v ironment) /dt >> d(app) /dt
In databases, t he RHS is un us ua lly sma ll This dro v e t he re lationa l database re v o lution
In extreme networ k ed systems, LH S is un us ua lly hi gh A nd t he app lications increasin gly comp lex and data -dri v en Simp le indirections (e. g. loca l loo k aside tab les) ins uff icient
-
8/7/2019 Architectures and Algorithms for Internet-Scale (P2P) Data Management by Joe Hellerstein
29/173
T he Pillars of Data Independence
Indexes Val ue -based look ups have
to co mpete with direct access
Must adapt to shiftin g data distrib utions
Must guarantee perfor mance
Query Opti mization Support declarative q ueries
beyond look up /search Must adapt to shiftin g data
distrib utions Must adapt to chan ges in
environ ment
DBMS P 2P B-T ree Content -
A ddressable OverlayNet works (DHT s )
Join Orderin g, A M Selection,etc.
Multiquerydataflo w sharin g?
-
8/7/2019 Architectures and Algorithms for Internet-Scale (P2P) Data Management by Joe Hellerstein
30/173
E arl y P2P
-
8/7/2019 Architectures and Algorithms for Internet-Scale (P2P) Data Management by Joe Hellerstein
31/173
-
8/7/2019 Architectures and Algorithms for Internet-Scale (P2P) Data Management by Joe Hellerstein
32/173
E arl y P2P I : C lient-Ser v er
Na ps ter C -S sear ch
xyz.mp3
-
8/7/2019 Architectures and Algorithms for Internet-Scale (P2P) Data Management by Joe Hellerstein
33/173
E arl y P2P I : C lient-Ser v er
Na ps ter C -S sear ch
xyz.mp3 ?
xyz.mp3
-
8/7/2019 Architectures and Algorithms for Internet-Scale (P2P) Data Management by Joe Hellerstein
34/173
E arl y P2P I : C lient-Ser v er
Na ps ter C -S sear ch pt2 pt f ile xf er
xyz.mp3 ?
xyz.mp3
-
8/7/2019 Architectures and Algorithms for Internet-Scale (P2P) Data Management by Joe Hellerstein
35/173
E arl y P2P I : C lient-Ser v er
Na ps ter C -S sear ch pt2 pt f ile xf er
xyz.mp3 ?
xyz.mp3
-
8/7/2019 Architectures and Algorithms for Internet-Scale (P2P) Data Management by Joe Hellerstein
36/173
E arl y P2P I : C lient Ser v er
S E TI@Home Ser v er a ss i gn s work unit s
My ma chine in fo
-
8/7/2019 Architectures and Algorithms for Internet-Scale (P2P) Data Management by Joe Hellerstein
37/173
E arl y P2P I : C lient Ser v er
S E TI@Home Ser v er a ss i gn s work unit s
Ta sk: f(x)
-
8/7/2019 Architectures and Algorithms for Internet-Scale (P2P) Data Management by Joe Hellerstein
38/173
E arl y P2P I : C lient Ser v er
S E TI@Home Ser v er a ss i gn s work unit s
Re sult : f(x)
60 TeraFLOPS!
-
8/7/2019 Architectures and Algorithms for Internet-Scale (P2P) Data Management by Joe Hellerstein
39/173
E arl y P2P II : Flooding on Ov erla y s
xyz.mp3 ?
xyz.mp3
A n o v erla y networ k . U nstructured .
-
8/7/2019 Architectures and Algorithms for Internet-Scale (P2P) Data Management by Joe Hellerstein
40/173
E arl y P2P II : Flooding on Ov erla y s
xyz.mp3 ?
xyz.mp3
Flooding
-
8/7/2019 Architectures and Algorithms for Internet-Scale (P2P) Data Management by Joe Hellerstein
41/173
E arl y P2P II : Flooding on Ov erla y s
xyz.mp3 ?
xyz.mp3
Flooding
-
8/7/2019 Architectures and Algorithms for Internet-Scale (P2P) Data Management by Joe Hellerstein
42/173
E arl y P2P II : Flooding on Ov erla y s
xyz.mp3
-
8/7/2019 Architectures and Algorithms for Internet-Scale (P2P) Data Management by Joe Hellerstein
43/173
E arl y P2P II. v: U ltrapeers
U ltrapeers ca n be installe d (KaZa A ) or sel f-pr omo te d (Gnutella )
-
8/7/2019 Architectures and Algorithms for Internet-Scale (P2P) Data Management by Joe Hellerstein
44/173
H ierarchical Networ k s (& Queries)
IP H ierarchical na me s pace ( www.vl db.or g, 141.1 2.12.51) H ierarchical routi ng
A uto no mous Syste ms correlate with na me s pace (thou gh not per f ectly) A strolabe [Bir ma n, et al. TOC S 03]
OL A P-style a gg re gate queries dow n the IP hierarchyDN S H ierarchical na me s pace ( clie nts + hierarchy o f servers) H ierarchical routi ng w /a gg ressive cachi ng
13 ma na ge d root servers IrisNet [Desh pa nde, et al. SIGMOD 03]
X path queries over (selecte d ) DN S (sub) -trees.
Tra ditio nal pros /co ns o f H ierarchical data mgm t Wor k s well f or thi ngs ali gne d with the hierarchy
Es p. physical locality a la A strolabe Inf lexible
No data i nde pe nde nce!
-
8/7/2019 Architectures and Algorithms for Internet-Scale (P2P) Data Management by Joe Hellerstein
45/173
C ommercial O fferings
JXT A Java /X ML Frame work for p2p a pplica t ions Name resol ut ion an d ro ut ing is done wi th floo ds &
s uper peers C an al ways a dd yo ur o wn if yo u like
MS WinX P p2p ne tworking A n uns t r uc ture d overlay, floo de d publica t ion an d cac hing does no t ye t s upp or t dis t rib ut e d searc hesBo th have some sec uri t y s upp or t A uth en t ica t ion via signa tures (ass umes a t r us t e d a uth ori t y) Encrypt ion of t raffic
Groove Pla t form for p2p experience . IM an d async h collab t ools.
C lien t-serveris h name resol ut ion, back up services, e t c.
-
8/7/2019 Architectures and Algorithms for Internet-Scale (P2P) Data Management by Joe Hellerstein
46/173
L essons and L imitations
Client -Se rve r pe rf o rms we ll But not a lways f easib le
Idea l pe rf o rman ce is o f ten not t he key iss ue!
Thin gs t hat flood -based systems do we ll Organi c s ca lin g De cent ra lization o f visibi lity and liabi lity Findin g po pula r st uff Fan cy lo ca l que ries Thin gs t hat flood -based systems do poo rly
Findin g un po pula r st uff [L oo, et a l V
L DB 04] Fan cy dist rib uted q ue ries
Vulne rabi lities: data poisonin g, t ra ckin g, et c. Gua rantees abo ut anyt hin g (ans we r qua lity, priva cy, et c.)
-
8/7/2019 Architectures and Algorithms for Internet-Scale (P2P) Data Management by Joe Hellerstein
47/173
A L ittle G ossip
-
8/7/2019 Architectures and Algorithms for Internet-Scale (P2P) Data Management by Joe Hellerstein
48/173
G ossip Protocols ( E pidemic Algorithms)
Origi na lly t argeted at d at ab ase replic atio n [Demers, et al. PODC 87] E speci ally nice f or un str uct ured net works R umor-mo ngeri ng : prop ag ate ne wly-received upd ate to k r an dom
neighbors E xte nded to ro uti ng
Poi nt-to-poi nt ro uti ng [Vahd at /Becker TR
, 00] R umor-mo ngeri ng o f queries i nste ad o f f loodi ng [Haa s, et al Inf ocom 02]E xte nded to aggreg ate comp ut atio n [Kempe, et al, FOCS 03]Mostly theoretic al ana lyses Us ua lly o f t wo f orms :
Wh a t is the tippi ng poi nt where an epidemic i nf ects the whole pop ul a tio n? (Percol atio n theory) Wh a t is the expected # o f mess ages f or i nf ectio n?
A Cor nell speci alty Demers, Klei nberg, G ehrke, H alper n,
-
8/7/2019 Architectures and Algorithms for Internet-Scale (P2P) Data Management by Joe Hellerstein
49/173
Structured Ov erla y s : Distributed
H ash T ables (D HT s)
-
8/7/2019 Architectures and Algorithms for Internet-Scale (P2P) Data Management by Joe Hellerstein
50/173
D HT O utline
H i gh- level ove rvie wFun dam ent al s of st ru ctu re d net work t opo l og ie s A n d examp le s
O ne con crete D HT ChordSome syste ms i ssue s St orag e mod el s & sof t st ate Loca lity
Chu rn ma n ag e ment
-
8/7/2019 Architectures and Algorithms for Internet-Scale (P2P) Data Management by Joe Hellerstein
51/173
H igh- L e v el Idea : Indirection
Indirection in space L ogical (content-ba sed ) I Ds, ro uting to tho se I Ds
Content-addre ssable net work Tolerant o f ch urn
node s joining and lea v ing the net work to h
y
z
h=y
-
8/7/2019 Architectures and Algorithms for Internet-Scale (P2P) Data Management by Joe Hellerstein
52/173
H igh- L e v el Idea : Indirection
Indirection in space L ogical (content-ba sed ) I Ds, ro uting to tho se I Ds
Content-addre ssable net work Tolerant o f ch urn
node s joining and lea v ing the net work
Indirection in ti me Want so me sche me to te mporally deco uple send and recei v e Per si stence req uired. Ty pical Internet sol ution : so f t state
Co mbo o f per si stence v ia storage and v ia retry Publi sher req ue st s TTL on storage Re publi she s a s needed
Meta phor : Di strib uted H a sh Table
to h z
h=z
-
8/7/2019 Architectures and Algorithms for Internet-Scale (P2P) Data Management by Joe Hellerstein
53/173
W hat is a D HT ?
H ash T ab le data st ruc t ure that ma ps keys t o va lues esse ntia l b ui ldi ng b lock i n s of t wa re syste ms
Dist rib ute d H ash T ab le (D HT ) si mi la r, b ut s prea d a cross the In te rnet
Inte rf a ce i nse rt(key, va lue ) lookup( key)
-
8/7/2019 Architectures and Algorithms for Internet-Scale (P2P) Data Management by Joe Hellerstein
54/173
H ow ?
Eve ry DH T no de supp o rts a single o pe ratio n:
Give n k ey as input; ro ut e me ssag e s t ow ard no de ho lding k ey
-
8/7/2019 Architectures and Algorithms for Internet-Scale (P2P) Data Management by Joe Hellerstein
55/173
K V
K V
K V
K V
K V
K V
K V
K V
K V
K V
K V
D HT in action
-
8/7/2019 Architectures and Algorithms for Internet-Scale (P2P) Data Management by Joe Hellerstein
56/173
K V
K V
K V
K V
K V
K V
K V
K V
K V
K V
K V
D HT in action
-
8/7/2019 Architectures and Algorithms for Internet-Scale (P2P) Data Management by Joe Hellerstein
57/173
K V
K V
K V
K V
K V
K V
K V
K V
K V
K V
K V
D HT in action
Operation: take key as input; route messages to node holding key
-
8/7/2019 Architectures and Algorithms for Internet-Scale (P2P) Data Management by Joe Hellerstein
58/173
K V
K V
K V
K V
K V
K V
K V
K V
K V
K V
K V
D HT in action : put()
insert(K 1 ,V1 )
Operation: take key as input; route messages to node holding key
-
8/7/2019 Architectures and Algorithms for Internet-Scale (P2P) Data Management by Joe Hellerstein
59/173
K V
K V
K V
K V
K V
K V
K V
K V
K V
K V
K V
D HT in action : put()
Operation: take key as input; route messages to node holding key
insert(K 1 ,V1 )
-
8/7/2019 Architectures and Algorithms for Internet-Scale (P2P) Data Management by Joe Hellerstein
60/173
(K 1 ,V1 )
K V
K V
K V
K V
K V
K V
K V
K V
K V
K V
K V
D HT in action : put()
Operation: take key as input; route messages to node holding key
-
8/7/2019 Architectures and Algorithms for Internet-Scale (P2P) Data Management by Joe Hellerstein
61/173
-
8/7/2019 Architectures and Algorithms for Internet-Scale (P2P) Data Management by Joe Hellerstein
62/173
retrieve (K 1 )
K V
K V
K V
K V
K V
K V
K V
K V
K V
K V
K V
Iterati v e v s. R ecursi v e R outing
Operation: take key as input; route messages to node holding key
Previously show e d rec ursive. A noth e r option : it e ra tive
-
8/7/2019 Architectures and Algorithms for Internet-Scale (P2P) Data Management by Joe Hellerstein
63/173
D HT Design G oals
A n ove rlay ne two rk wi th : Flexible ma pping o f keys t o physi cal no des Small ne two rk dia me t e r Small deg ree (f ano ut)
Lo cal ro ut ing de cisions Rob us t ness t o chur n Ro ut ing f lexibili t y De cen t lo cali t y (lo w s tre tch )A s t o rage o r me mo ry me chanis m wi th No g ua ran t ees on pe rsis t en ce Main t enan ce via so ft s t a t e
-
8/7/2019 Architectures and Algorithms for Internet-Scale (P2P) Data Management by Joe Hellerstein
64/173
Peers v s Infrastructure
Peer: A pplicat ion users pr ovide n odes f or DHT Exa mples: f iles har in g, etc
Infrastructure: Set of mana ge d n odes pr ovide DHT ser vice
Per ha ps ser v e many a pplicat ions A p2p incubat or ?We ll discuss t his at t he en d of t he tut or ia l
-
8/7/2019 Architectures and Algorithms for Internet-Scale (P2P) Data Management by Joe Hellerstein
65/173
L ibrar y or Ser v ice
L ibrar y : DHT co de b undle d i nt o a pplica t io n Runs o n eac h no de r unn i ng a pplica t io n Eac h a pplica t io n req uire s o wn ro ut i ng
i nf ra st r uc tu re
Ser v ice: si ngle DHT share d b y a pplica t io ns Require s co mmo n i nf ra st r uc tu re
But e li mi na t e s dupl ica t e ro ut i ng syst e ms
-
8/7/2019 Architectures and Algorithms for Internet-Scale (P2P) Data Management by Joe Hellerstein
66/173
D HT O utline
H i gh- level ove rvie wFun dam ent al s of st ru ctu re d net work t opo l ogie s A n d examp le sO ne con crete D HT ChordSome syste ms i ssue s St orag e mod el s & sof t st ate Loca lity Chu rn ma n ag e ment
-
8/7/2019 Architectures and Algorithms for Internet-Scale (P2P) Data Management by Joe Hellerstein
67/173
-
8/7/2019 Architectures and Algorithms for Internet-Scale (P2P) Data Management by Joe Hellerstein
68/173
An E xample D HT : C hord
-
8/7/2019 Architectures and Algorithms for Internet-Scale (P2P) Data Management by Joe Hellerstein
69/173
An E xample D HT : C hord
-
8/7/2019 Architectures and Algorithms for Internet-Scale (P2P) Data Management by Joe Hellerstein
70/173
An E xample D HT : C hord
Overlayed 2k -Gon s
-
8/7/2019 Architectures and Algorithms for Internet-Scale (P2P) Data Management by Joe Hellerstein
71/173
R outing in C hord
A t mo st one o f e ach Gon E.g. 1-to -0
-
8/7/2019 Architectures and Algorithms for Internet-Scale (P2P) Data Management by Joe Hellerstein
72/173
R outing in C hord
A t mo st one o f e ach Gon E.g. 1-to -0
-
8/7/2019 Architectures and Algorithms for Internet-Scale (P2P) Data Management by Joe Hellerstein
73/173
-
8/7/2019 Architectures and Algorithms for Internet-Scale (P2P) Data Management by Joe Hellerstein
74/173
R outing in C hord
A t mo st one o f e ach Gon E.g. 1-to -0
-
8/7/2019 Architectures and Algorithms for Internet-Scale (P2P) Data Management by Joe Hellerstein
75/173
R outing in C hord
A t mo st one o f e ach Gon E.g. 1-to -0
-
8/7/2019 Architectures and Algorithms for Internet-Scale (P2P) Data Management by Joe Hellerstein
76/173
R outing in C hord
A t mo st one o f e ach Gon E.g. 1-to -0Wh at h app ened?
We con stru cted the bin ary nu mber 15! R outing f ro m x to y
i s like co mputing y - x mod n bysu mm ing po wer s o f 2
4
1
8
2
Di am eter: log n (1 ho p per gon ty pe )Degree: log n (one out link per gon ty pe )
-
8/7/2019 Architectures and Algorithms for Internet-Scale (P2P) Data Management by Joe Hellerstein
77/173
W hat is happening here ? Algebra!
Un derlying gr oup -the oreti c str uct ure Re call a gr oup is a set S an d an operat or s uch that:
S is cl ose d un der Ass ociativity: ( AB)C = A(BC)There is an i dentity ele ment I S s.t. IX = XI = X for all X S
The re is an inve rse X -1
S for e ach e le me nt X S s.t. XX -1 = X -1X = I
The ge ne ra t ors of a g roup Ele me nts {g 1, , g n} s.t. applica t ion of t he ope ra t or on t he
ge ne rat ors produc es all t he me mbe rs of t he g roup .
Canonical exampl e: (Z n, + ) I de nt ity is 0 A set of ge ne ra t ors: {1 } A diff e re nt set of ge ne rat ors: { 2, 3}
-
8/7/2019 Architectures and Algorithms for Internet-Scale (P2P) Data Management by Joe Hellerstein
78/173
C a y le y G raphs
The C a y le y G raph (S , E ) of a gr oup: Ver tices corresp onding to t he und erl ying se t S E dges corresp onding to t he a ctions of t he ge nera tors
( Comple t e) C h or d is a C a y le y graph for ( Z n,+)
S = Z mod n ( n = 2k ).
G e nera tors {1 , 2, 4, , 2k -1} Tha t s wha t t he gon s are all ab out !
Fa ct : Mos t ( comple t e) DHTs are C a y le y graphs A nd t he y didn t eve n know it ! Foll ows f r om parallel Int er Conne ct Ne twor k s ( ICNs)
Sh own to be gr oup- t he ore tic [ A k ers /Kr ish na mu r t h y 89]
Not e: t he ones t ha t are n t C a y le y G raphs are cose t graphs ,a rela t e d gr oup- t he ore tic s t r uctu re
-
8/7/2019 Architectures and Algorithms for Internet-Scale (P2P) Data Management by Joe Hellerstein
79/173
-
8/7/2019 Architectures and Algorithms for Internet-Scale (P2P) Data Management by Joe Hellerstein
80/173
-
8/7/2019 Architectures and Algorithms for Internet-Scale (P2P) Data Management by Joe Hellerstein
81/173
U pshot
Goo d DHT topo lo gies will be Cayley /Coset gra phs A rep lay o f ICN Des ign But DHTs can use funky wiring th at was inf e as ib le in ICNs A ll the gro up -theo ret ic anal ys is be co mes s ugg est ive
Cle an ma th des crib ing the topo lo gy he lps crisp lyanal yze e fficie ncy E.g. de gree /diam ete r t rad eo ff s E.g. sh apes o f t rees we ll see late r f o r aggr e ga t io n o r jo in
Re ally no ex cuse to be s loppy IS A M vs. B-t rees
-
8/7/2019 Architectures and Algorithms for Internet-Scale (P2P) Data Management by Joe Hellerstein
82/173
Pastr y/B amboo
B ase d o n P laxto n Mes h [P laxto n, et a l SP AA 97]Names are fixe d b it str ings To po lo gy : Pre fix Hyper cube For ea ch b it f rom le f t to
r ight, pick a ne ighbor ID wit h commo n flippe d b it a nd commo n pre fix
lo g n de gree & diameter P lus a r ing For re liab ilit y (wit h k
pre d/ s ucc)Suffix Ro ut ing f rom A to B Fix b its f rom le f t to r ight E.g. 1010 to 0001:
1010 0101 00100000 0001
1010
0101
1100 1000
1011
-
8/7/2019 Architectures and Algorithms for Internet-Scale (P2P) Data Management by Joe Hellerstein
83/173
C AN : C ontent Addressable Networ k
Explo it mult iple d imens ions Ea ch node is ass igned a zone Nodes are ident ified by zone bo undar ies Jo in : chose rando m po int, s pl it its zone
(0,0) (1,0)
(0,1)
(0,0.5, 0.5, 1)
(0.5,0.5, 1, 1)
(0,0, 0.5, 0.5)
(0.5,0.25, 0.75, 0.5)
(0.75,0, 1, 0.5)
-
8/7/2019 Architectures and Algorithms for Internet-Scale (P2P) Data Management by Joe Hellerstein
84/173
R outing in 2-dimensions
R outing is n avig ating a d-dimension al ID s pac e R oute to closest neig hbo r in di re ction o f destin ation R outing t ab le cont ains O (d ) neig hbo rs Numbe r o f ho ps is O (d N1 /d )
(0,0) ( 1,0)
(0, 1)
(0,0 .5, 0 .5 , 1)
(0 .5 ,0 .5 , 1, 1)
(0,0, 0 .5, 0 .5) (0 .75 ,0, 1, 0 .5)
(0 .5 ,0 .25, 0 .75 , 0 .5)
-
8/7/2019 Architectures and Algorithms for Internet-Scale (P2P) Data Management by Joe Hellerstein
85/173
K oorde
DeBr ui jn gr aphs Link f ro m node x t o
node s 2x an d 2x+1 De gree 2, d iame t er lo g n
Optimal !
K oorde is Chord -b as ed Basically Chord, b ut with
DeBr ui jn finger s
No t e: No t ver t ex-symme t r ic!No t a Cayley gr aph . But a co se t gr aph o f the b utt er fly t o po lo gy.
-
8/7/2019 Architectures and Algorithms for Internet-Scale (P2P) Data Management by Joe Hellerstein
86/173
T opologies of O ther O ft-cited D HT s
T apestry Very si mil ar to Pastry /Bam boo topology No ri ng
K ade mli a
A lso si mil ar to Pastry /Bam boo But the ri ng is ordered by the X O R metric Used by the O ver net /eDo nkey filesh ari ng syste m
Viceroy A n e mul ated B utterfly net work
Sympho ny A r an do mized s ma ll- world net work
-
8/7/2019 Architectures and Algorithms for Internet-Scale (P2P) Data Management by Joe Hellerstein
87/173
Incomplete G raphs : E mulation
For Chor d, we assume d 2m no des.What i f not? Nee d to emulate a complete graph
even when incomplete. Note : you ve seen this problem
be f ore!Lit win s Linear Hashin g emulates hashtables o f len gth 2m !
DHT-speci f ic schemes use d In Chor d, no de x is responsible f or the
ran ge [x, succ (x) )
The holes on the rin g shoul d be ran domly distribute d due to hashin g
Consistent Hashin g [Kar ger, et al. STOC97]
-
8/7/2019 Architectures and Algorithms for Internet-Scale (P2P) Data Management by Joe Hellerstein
88/173
-
8/7/2019 Architectures and Algorithms for Internet-Scale (P2P) Data Management by Joe Hellerstein
89/173
J oining the C hord R ing
Need IP o f so me node Pi ck a r ando m ID (e.g. SH A -1(IP))Send msg to current o wner o f th at ID Th at s yo ur prede ce ssor
-
8/7/2019 Architectures and Algorithms for Internet-Scale (P2P) Data Management by Joe Hellerstein
90/173
J oining the C hord R ing
Need IP o f so me node Pi ck a r ando m ID (e.g. SH A -1(IP))Send msg to current o wner o f th at ID Th at s yo ur prede ce ssor Upd ate pred /succ links On ce the ring i s in place, all
i s we ll!In f or m app to move d at a app ro pri ate ly
Se ar ch to in st all f inger s o f varying po wer s o f 2 Or j us t co py f ro m pred /succ
and che ck!Inbo und f inger s f ixed lazi ly
Theore m: If con si sten cy i s re ached be f ore net work do ub le s, lookups re ma in log n
-
8/7/2019 Architectures and Algorithms for Internet-Scale (P2P) Data Management by Joe Hellerstein
91/173
I C N E mulation
A t lea st 3 gene ri c emulation sch eme s have been pro po se d [Nao r/ Wie de r SP AA 03] [ A b ra ham, et al. I PDPS 03] [Manku PODC 03]
A s an exe rci se, f unky I C N + emulation scheme =ne w DHT IHOP: Inte rnet Ha shin g on Pan cake gra phs
[Rataj czak /Helle rstein 04]Pan cake gra ph I C N + A b ra ham, et al. emulation.
Ba se d on Bill Gate s onlypa pe r.Trivia que stion: who wa s hi s a dvi so r/c o -aut ho r?
-
8/7/2019 Architectures and Algorithms for Internet-Scale (P2P) Data Management by Joe Hellerstein
92/173
-
8/7/2019 Architectures and Algorithms for Internet-Scale (P2P) Data Management by Joe Hellerstein
93/173
f
-
8/7/2019 Architectures and Algorithms for Internet-Scale (P2P) Data Management by Joe Hellerstein
94/173
Storage Models for D HT s
Up to no w we fo cused o n ro ut ing D HT s as co nte nt -addressable net work
Imp l icit in t he na me D HT is so me k ind of storage Or per ha ps a better word is me mory Enables ind ire ct io n in t ime But also ca n be v ie wed as a pla ce to store t hings
Soft state is t he na me of t he ga me in Inter net syste ms
f
-
8/7/2019 Architectures and Algorithms for Internet-Scale (P2P) Data Management by Joe Hellerstein
95/173
A Note on Soft State
Ahyb rid pe rsis ten ce sche me Pe rsis ten ce via sto rage & ret ryJo int re spon sib ility of pub lishe r an d sto rage no de Ite m pub lishe d wit h a Time -To -Live (TTL) Sto rage no de atte mpt s to pre se rve it fo r t hat t ime
Be st effo rt Pub lishe r want s it to la st longe r?
Must re pub lish it (o r rene w it )
Must ba lan ce re liab ility an d re pub lishing ove rhea d Longe r TTL = longe r potent ia l o utage b ut le ss re pub lishing On fa ilure of a sto rage no de Pub lishe r event ua lly re pub lishe s e lse he re On fa ilure of a pub lishe r Sto rage no de event ua lly ga rbage co lle ct s
-
8/7/2019 Architectures and Algorithms for Internet-Scale (P2P) Data Management by Joe Hellerstein
96/173
L li C i N i hb S l i
-
8/7/2019 Architectures and Algorithms for Internet-Scale (P2P) Data Management by Joe Hellerstein
97/173
L ocalit y - C entric Neighbor Selection
Much recent work [Gumm a di, et al. S IGC OMM 03, A braha m, et al. SO D A 04,Dabek, et al. NS DI 04, Rhea, et al.USEN IX 04, etc.]
We sa w f lexibilit y in neighbor selection in Pa str y/Ba mboo
C an al so intro duce so me ran do mization into C hor d, C A N, etc.Ho w to pick
A nalogo us to a d-hoc net works
1. Ping ran do m no de s2. S wa p neighbor set s with neighbor s C o mbine with ran do m ping s to ex plore
3. Provabl y -goo d algorith m to f in d nearb y neighbor s ba se d on sa mpling [Karger an d Ruhl 02]
G d i ff
-
8/7/2019 Architectures and Algorithms for Internet-Scale (P2P) Data Management by Joe Hellerstein
98/173
G eometr y and its effects
Some to po lo gies a llo w more c hoices Choice of nei ghbors in t he nei ghbor tab les (e. g.
Pastr y) Choice of ro utes to send a packet (e. g. Chord ) Cast in terms of geometr y
But rea lly a gro up- t heoretic t ype of ana ly sis
Havin g a rin g is ver y he lpf ul for resi lience Es pecia lly wit h a decent -sized leaf set
(s uccessors /predecessors )Sa y ~ lo g n
[G ummadi, et a l. SIG COMM 03]
-
8/7/2019 Architectures and Algorithms for Internet-Scale (P2P) Data Management by Joe Hellerstein
99/173
Ti
-
8/7/2019 Architectures and Algorithms for Internet-Scale (P2P) Data Management by Joe Hellerstein
100/173
T imeouts
Re call Ite ra tive vs. Re cu rsive Routi ng Ite ra tive: O ri gi na to r requests IP addr ess o f e ach ho p
Mess ag e t ran s po rt is ac tu ally do ne vi a di re ct IP Re cu rsive: Mess ag e t ran s f e rre d ho p-by-ho pEff e ct o n timeout me chan ism Nee d to t rac k late ncy o f commu ni catio n chann e ls Ite ra tive resu lts i n di re ct nv n commu ni catio n
Can t kee p timeout st ats at t ha t s cale So lutio n: vi rtu al coo rdi na te s chemes [Dabek et al. NSDI 04]
Wit h re cu rsive can do T CP-like t rac ki ng o f late ncy
Expo ne nti ally wei gh te d me an and var i anc e Ups hot: Bot h wo rk OK u p to a poi nt T CP-style does some wha t bette r t han vi rtu al coo rds at
mo dest chu rn rates (23 mi n. o r mo re me an sessio n time ) Vi rtu al coo rds be gi ns to fai l at hi ghe r chu rn ra tes
-
8/7/2019 Architectures and Algorithms for Internet-Scale (P2P) Data Management by Joe Hellerstein
101/173
C omplex Quer y Processing
DHT G U E lit L k
-
8/7/2019 Architectures and Algorithms for Internet-Scale (P2P) Data Management by Joe Hellerstein
102/173
D HT s G a v e U s Eq ualit y L oo k ups
What else mi gh t we wa nt? Ra nge Sea rch A ggr e gatio n G roup B y Joi n
Intelli ge nt Que ry Disse mi natio n
T he me A ll ca n be built ele ga ntl y o n D HT s!
T his is t he app roa ch we ta k e i n PIE R
But i n so me i nsta nces ot he r s che mes a re also reaso nable I will t ry to be su re to call t his out T he f loo di ng/g ossip st ra wma n is al wa y s a v ailable
-
8/7/2019 Architectures and Algorithms for Internet-Scale (P2P) Data Management by Joe Hellerstein
103/173
-
8/7/2019 Architectures and Algorithms for Internet-Scale (P2P) Data Management by Joe Hellerstein
104/173
PHT DHT Cont e nt Logical T rie
-
8/7/2019 Architectures and Algorithms for Internet-Scale (P2P) Data Management by Joe Hellerstein
105/173
P HT g
PHT DHT Cont e ntsLogical T rie
-
8/7/2019 Architectures and Algorithms for Internet-Scale (P2P) Data Management by Joe Hellerstein
106/173
P HT g
Se arch for 011101?
PHT Search
-
8/7/2019 Architectures and Algorithms for Internet-Scale (P2P) Data Management by Joe Hellerstein
107/173
P HT Search
Ob serve: T he DHT a llows direct addre ss ing of P HT n ode s Can jump int o the P HT at any n ode
Interna l, lea f , or be low a lea f! S o, can f ind lea f by binary search
log log | D| search c os t ! If you kne w (r ough ly) the data di stributi on, even better
More over, c on sider a f ai led machine in the system Equa ls a f ai led n ode of the trie Can h op over f ai led n ode s direct ly!
A nd c on sider c oncurrency c ontr olA link-f ree data structure: simp le !
Reusable Lessons from P HTs
-
8/7/2019 Architectures and Algorithms for Internet-Scale (P2P) Data Management by Joe Hellerstein
108/173
R eusable L essons from P HT s
Dire ct-a dd ress in g a lovely way t o emula t e robus t ,eff icien t l inke d da t a s t ru ct ures in the ne twork
Dire ct-a dd ress in g requ ires re gular it y in the da t a
s pa ce par tition in g E .g. works for re gular s pa ce -par tition in g in dexes (t r ies, qua d t rees )
No t so s im ple for da t a -par tition in g (B-t rees, R -t rees ) or irre gular s pa ce par tition in g (kd-t rees )
Aggregation
-
8/7/2019 Architectures and Algorithms for Internet-Scale (P2P) Data Management by Joe Hellerstein
109/173
Aggregation
Two key ob servation s f or DHTs DHTs are multi -ho p, so hierar chi ca l aggregation
can re duc e BWE.g., t he TAG work f or sen sornet s [Ma dden, O SDI 2002]
DHTs provi de tree con str uction in a very nat ura l way
But what i f I don t use DHTs? Ho ld t hat t ho ug ht!
-
8/7/2019 Architectures and Algorithms for Internet-Scale (P2P) Data Management by Joe Hellerstein
110/173
Consider Aggregation in Chord
-
8/7/2019 Architectures and Algorithms for Internet-Scale (P2P) Data Management by Joe Hellerstein
111/173
C onsider Aggregation in C hord
Everybody sends their message to node 0Ass ume greedy j ump s (in creasing Gon -order )
Inter ce pt messages and aggregate a long the way
-
8/7/2019 Architectures and Algorithms for Internet-Scale (P2P) Data Management by Joe Hellerstein
112/173
-
8/7/2019 Architectures and Algorithms for Internet-Scale (P2P) Data Management by Joe Hellerstein
113/173
Aggregation in Koorde
-
8/7/2019 Architectures and Algorithms for Internet-Scale (P2P) Data Management by Joe Hellerstein
114/173
Aggregation in K oorde
Re ca ll t he DeBr uijn gra ph : Ea ch node x point s to 2x mod n and (2x + 1 ) mod n
(But note: not node -symmetri c)
-
8/7/2019 Architectures and Algorithms for Internet-Scale (P2P) Data Management by Joe Hellerstein
115/173
Aggregation in Pastr y/Bamboo
-
8/7/2019 Architectures and Algorithms for Internet-Scale (P2P) Data Management by Joe Hellerstein
116/173
Aggregation in Pastr y/B amboo
De pen ds on choi ce o f neig hbors But i f y o u fli p exa ct ly one bit ea ch ho p:
-
8/7/2019 Architectures and Algorithms for Internet-Scale (P2P) Data Management by Joe Hellerstein
117/173
Metrics for Aggregation Trees
-
8/7/2019 Architectures and Algorithms for Internet-Scale (P2P) Data Management by Joe Hellerstein
118/173
Metrics for Aggregation T rees
What makes a goo d/ ba d agg tree? Number of e dges? No!Always n -1. Wit h distrib utive /a lgebraic aggs, msg size is fixe d.
Degree of fan -in Affects congestion
Heig ht Deter mines latency Pre dictabi lity of s ubtree s ha pe
Deter mines abi lity to contro l ti ming tig ht ly Stabi lity in t he face of c hurn
Changing tree s ha pe whi le acc umulating can res ult in errors Subtree size distrib ution
Affects jeo par dy of lost messages
So what if I don t ha ve a D HT?
-
8/7/2019 Architectures and Algorithms for Internet-Scale (P2P) Data Management by Joe Hellerstein
119/173
So what if I don t ha v e a D HT ?
Need anothe r t ree -con st ruc tion me chani sm T he re a re many in the NW lite rat ure (e. g. fo r multi ca st ) Requi re maintenan ce me ssa ge s akin to D HT s
Do yo u maintain fo r the life of yo ur que ry en gine ? Or set up/ tea rdown a s needed ?
Can pi ck a t ree sha pe of yo ur own Not at the me rcy of the D HT to po lo gie s E.g. co uld do hi gh fan -in t ree s to mini mize laten cy
A s we noted befo re, we wi ll re use t ree -con st ruc tion
fo r multi ple purp o se s It s handy that they re t ri v ia l in D HT s But co uld re use anothe r sche me fo r multi ple purp o se s a s
we ll
Or, can do a ggre gation v ia go ss i p [Ke mpe, et a l FOCS 03]
Group By
-
8/7/2019 Architectures and Algorithms for Internet-Scale (P2P) Data Management by Joe Hellerstein
120/173
G roup By
A p ie ce o f cake in a DHT Ever y no de se nds t up le s t o war d the hash ID o f
the group ing co lu mns A n agg t ree is nat ur ally co nst ru ct e d per group
No t e nice du al-purpo se u se o f DHT Hash- b as e d p ar titio ning f or p ar alle l group b y
Ju st like p ar alle l DBMS (G amma , the Exchang e op in Vo lcano )
A gg t ree co nst ru ctio n in mu lti-hop over lay ne twork
Hash Join
-
8/7/2019 Architectures and Algorithms for Internet-Scale (P2P) Data Management by Joe Hellerstein
121/173
H ash J oin
We jus t di d hash -base d gro up by.H ash -base d join is ro ugh ly t he sa me dea l, twi ce: Given R.a J oin S.b Ea ch no de:
sen ds ea ch R tuple t o wa rd H (R.a )
sen ds ea ch S tuple t o wa rd H (S.b )A gain, DH T gives H ash -base d pa rt i t ionin g f o r pa ra lle l hash join Tree cons truct ion (no re duct ion a lon g t he way he re, t ho ugh )No t e t he res ult in g co mmu ni ca t ion pa tt e rn A tree is cons truct e d pe r hash des t ina t ion !
Tha t s a lo t o f trees ! No bi g dea l f o r t he DH T -- i t a lrea dy ha d t ha t t o po lo gy t he re.
-
8/7/2019 Architectures and Algorithms for Internet-Scale (P2P) Data Management by Joe Hellerstein
122/173
Symmetric Semi- Join and Bloom Join
-
8/7/2019 Architectures and Algorithms for Internet-Scale (P2P) Data Management by Joe Hellerstein
123/173
S y mmetric Semi J oin and B loom J oin
Quer y re writin g tricks f rom di strib uted DBsSemi- J oin s a la S DD-1 But do it to bot h side s o f t he join Re write R.a J oin S.b a s
( semi-join ) join R.a join S.b
Latter 2 join s can be Fetc h Matc he sB loom J oin s a la R* Require s a bit more f ine sse here A ggre gate R.a B loom f ilter s to a f ixed ha sh ID. Same f or
S.b.
A ll t he R.a B loom f ilter s are OR ed, event uall y m ultica sted to all node s storin g S t uple s S y mmetric f or S.b B loom f ilter Can in princi ple stream re f inin g B loom f ilter s
Quer y Dissemination
-
8/7/2019 Architectures and Algorithms for Internet-Scale (P2P) Data Management by Joe Hellerstein
124/173
Q uer y Dissemination
Ho w do no des f in d out about a quer y ? Up to no w we convenient ly i gnore d t his!
Case 1: Broa dcast A s f ar as we kno w, a ll no des nee d to parti ci pate Nee d to have a broa dcast tree out o f t he quer y no de This is t he o pposite o f an a ggre gation tree!
But ho w to instantiate it?
Nave so lution: F loo d Ea ch no des sen ds quer y to a ll its nei ghbors
Prob lem: no des wi ll re ceive quer y mu lti ple times waste d ban dwi dt h
-
8/7/2019 Architectures and Algorithms for Internet-Scale (P2P) Data Management by Joe Hellerstein
125/173
Q uer y Dissemination II
-
8/7/2019 Architectures and Algorithms for Internet-Scale (P2P) Data Management by Joe Hellerstein
126/173
Quer y Dissemination II
Su ppose y ou have a sim ple equa lit y quer y Se le ct * From R W here R. c = 5 I f R.c is a lrea dy in dexe d in t he DHT, can route quer y via
DHTQ uer y Dissemination is an a ccess met ho d Basi ca lly t he same as an in dex
Can take more com plex queries an d disseminate sub -parts Se le ct * From R, S, T
Where R.a = S.b A n d S.c = T. d A n d R.c = 5
PIER
-
8/7/2019 Architectures and Algorithms for Internet-Scale (P2P) Data Management by Joe Hellerstein
127/173
PI ER
Pee r-to- Pee r I nformation E xchang e & R e trieval P uts tog e the r man y of the t e chnique s de scribe d ab ove A ggr e ssive ly use s DHTs
But agnostic to choice Use s Bam b oo , has worke d on C A N and Chord
[Hueb sch , e t al. VLDB 03]
De ploye d R unning N que rie s on ~400 nod e s around the world
(P lane t Lab ) Simulat e d on up to 10K nod e sCurre nt A pplications I mpro ve d File sharing I nt e rne t Monitoring (N ) Customi zab le R outing via R e cursive Que rie s
http : //pi e r.cs.be rke ley.e du
-
8/7/2019 Architectures and Algorithms for Internet-Scale (P2P) Data Management by Joe Hellerstein
128/173
Nati v e Simulation
-
8/7/2019 Architectures and Algorithms for Internet-Scale (P2P) Data Management by Joe Hellerstein
129/173
Enti re system i s e v ent -dri v en Enable s di screte -e v ent simulation to be sli d in Re pla ce s lo we st -le v el
net wo rkin g & sche dule r Run s all t he re st o f PIER
nati v elyV e ry hel pf ul f o r
debu gg in g a ma ss i v elydi st ribute d system!
Initial T idbits from PI ER E fforts
-
8/7/2019 Architectures and Algorithms for Internet-Scale (P2P) Data Management by Joe Hellerstein
130/173
Multiresol ution sim ulation criti cal Native sim ulator was hug ely hel pf ul E m ulab allo ws control over link -level performan ce PlanetLab is a ni ce a pproximation of realityDeb ugg in g still very hard Need to have a tra ced exe cution mode.
R adiolo gi cal dye? Intensive lo gg in g?
DB workloads on NW te chnolo gy: mismat ches E .g. Bamboo a ggressively chan ges nei ghbors for sin gle -
messa ge resilien ce /performan ce
Can wreak havo c wit h statef ul a gg re gation trees E .g. ret urnin g res ults: SE LE CT * from Fire walls
1 Me ga Node of ma chines want to send yo u a t up le!A relational q uery pro cessor w/o stora ge Where s t he metadata?
-
8/7/2019 Architectures and Algorithms for Internet-Scale (P2P) Data Management by Joe Hellerstein
131/173
T raditional F ileS y stems on p2p ?
-
8/7/2019 Architectures and Algorithms for Internet-Scale (P2P) Data Management by Joe Hellerstein
132/173
y p p
Lots o f proje cts OceanStore, F arSite, C F S, Ivy , P A S T , et c.Lots o f challen ges Motivation & Viabilit y
S hort & lon g term
Reso ur ce m gmt Load balan cin g w/h etero geneit y , et c.Economi cs come stron gl y into pla y
Billin g and capa cit y plannin g? Reliabilit y & A vailabilit y
Repli cation, server sele ction Wide -area repli cation (+ consisten cy o f updates )
Se curit y En cr y ption & ke y m gmt, rat her t han a ccess control
Non-traditional Storage Models
-
8/7/2019 Architectures and Algorithms for Internet-Scale (P2P) Data Management by Joe Hellerstein
133/173
g
Very long ter m ar chival storage LOCKSS
Ephe meral storage Pali mpsest, O pen DHT
L OCK SS [Maniatis e t al SOSP 04]
-
8/7/2019 Architectures and Algorithms for Internet-Scale (P2P) Data Management by Joe Hellerstein
134/173
Digital Pre se rvation of A cad e mic Mat e rials A cad e mic pub lishing is moving from pap e r to
digital le asingL ib rarians are scar e d with good re ason A cce ss de pe nds on the fat e of the pub lishe r Time is unkind to b its aft e r de cad e s Ple nt y of e ne mie s (ide ologie s, gove rnm e nts ,
corporations)
Goal: Pre se rve acce ss for local patrons , for a ve ry long time
[Maniatis , e t al. S O S P 04]
Protocol T hreats
-
8/7/2019 Architectures and Algorithms for Internet-Scale (P2P) Data Management by Joe Hellerstein
135/173
A ss ume co nve nt io nal plat f or m/ soc ial attacks Mit igate further da ma ge thro ugh protocol T o p a dversary goal: Stealth Mo dificat io n Mo dif y re pl icas to co nta in a dversary s vers io n Har d to re instate or iginal co nte nt a f ter lar ge
pro port io n o f re pl icas are mo difie d
Other goals
De nial o f serv ice Syste m slo wdo wn Co nte nt the f t
T he L OCK SS Solution
-
8/7/2019 Architectures and Algorithms for Internet-Scale (P2P) Data Management by Joe Hellerstein
136/173
Pee r-to -pee r au ditin g an d re pa i r syste m f o r re pli cate d do cu ment s / no f ile sh ar in gA pee r pe rio di cally au dit s it s o wn re pli ca , bycallin g an o pinion poll When a pee r su spe ct s an att ack, it ra i se s an al arm f o r a hu ma n o pe ra to r C o rrel ate d failu re s IP addr e ss spoo f in g Syste m slo wdo wn
2n d ite ra tion o f a de ploye d syste m
Sampled O pinion Poll
-
8/7/2019 Architectures and Algorithms for Internet-Scale (P2P) Data Management by Joe Hellerstein
137/173
p p
Ea ch pee r hold s re f e ren ce li st o f pee rs i t ha s di scove red friend s li st o f pee rs i t kno ws ext e rnallyPe riodi cally (f a st e r than ra t e o f bi t ro t) Take a sample o f the re f e ren ce li st
Invi t e them t o send a ha sh o f thei r repli ca Compa re vo t e s wi th lo cal copy O ve rwhelmin g a greemen t (> 70% ) Sleep bli ssfu lly O ve rwhelmin g di sa greemen t (
-
8/7/2019 Architectures and Algorithms for Internet-Scale (P2P) Data Management by Joe Hellerstein
138/173
-
8/7/2019 Architectures and Algorithms for Internet-Scale (P2P) Data Management by Joe Hellerstein
139/173
L imit the rate of operation
-
8/7/2019 Architectures and Algorithms for Internet-Scale (P2P) Data Management by Joe Hellerstein
140/173
Peer s determine their rate of operation a utonomo usly A dver sary m us t wait for the next po ll to atta ck
thro ugh the proto co l
No operationa l path i s fa ster than other s A rtifi cia lly inf late co st of cheap operation s No atta ck can o ccur fa ster than norma l op s
B imodal S y stem B eha v ior
-
8/7/2019 Architectures and Algorithms for Internet-Scale (P2P) Data Management by Joe Hellerstein
141/173
Whe n most re pli cas are the same, no alarms In bet wee n, ma ny alarms To get f rom mostl y corre ct to mostl y wro ng re pli cas,
s y stem m ust pass thro ugh moat o f alarmi ng states
-
8/7/2019 Architectures and Algorithms for Internet-Scale (P2P) Data Management by Joe Hellerstein
142/173
B imodal S y stem B eha v ior
-
8/7/2019 Architectures and Algorithms for Internet-Scale (P2P) Data Management by Joe Hellerstein
143/173
Whe n most re pli cas are the same, no alarms In bet wee n, ma ny alarms To get f rom mostl y corre ct to mostl y wro ng re pli cas,
s y stem m ust pass thro ugh moat o f alarmi ng states
C hurn F riends into R eference L ist
-
8/7/2019 Architectures and Algorithms for Internet-Scale (P2P) Data Management by Joe Hellerstein
144/173
C hurn adjusts the bi as in the reference list Hi gh churn f avors friends R educes the effects of Sybi l att acks But offers e asy t ar gets for focused att ackL o w churn f avors str an gers It offers Sybi l att acks free rei gn
Bad peers no min ate b ad ; good peers no min ate so me b ad Makes focused att ack h arder, since advers ary c an predict
less of the po ll s ampl e Go al: strike a b alance
Palimpsest [R oscoe & H and , H ot O S 03]
-
8/7/2019 Architectures and Algorithms for Internet-Scale (P2P) Data Management by Joe Hellerstein
145/173
R ob ust , available , sec ure epheme ral sto ra ge Small and ve ry simple So f t -capacity f o r se rvice p rovide rs
Con gestion -based p ricin gA utomatic space reclamation Flexible client and se rve r policies
W e ll i gno re the economics
-
8/7/2019 Architectures and Algorithms for Internet-Scale (P2P) Data Management by Joe Hellerstein
146/173
H ow does it do this ?
-
8/7/2019 Architectures and Algorithms for Internet-Scale (P2P) Data Management by Joe Hellerstein
147/173
To w rite a f i le: Eras ure code it Ro ute it th ro ugh a netwo rk o f si mple b lo ck sto res Pay to sto re it
Each b lo ck sto re is a f ixed -le ng th F IFO Blo ck sto res ma y be ow ned by multi ple provide rs Blo ck sto res do n't care who the use rs are No o ne sto re needs to be t rusted Blo cks are eve nt ually lost o ff the e nd o f the q ue ue
Storing a file
-
8/7/2019 Architectures and Algorithms for Internet-Scale (P2P) Data Management by Joe Hellerstein
148/173
Ea ch file ha s a name an d a k e y .File Di sper sal
Use a ratele ss co de to sprea d blo cks into fragment sRabin 's ID A over GF(216), 10 24-b y te blo cks
Fragment En cr yption Se curit y , a ut henti cit y , i dentifi cation
A ES in Off set Co deboo k Mo de
Fragment Pla cement En cr ypt: (SH A 256(name ) frag.i d) 256-bit ID
Sen d (fragment, ID) to a blo ck store us ing DHTA n y DHT will do
-
8/7/2019 Architectures and Algorithms for Internet-Scale (P2P) Data Management by Joe Hellerstein
149/173
R etrie v ing a file
-
8/7/2019 Architectures and Algorithms for Internet-Scale (P2P) Data Management by Joe Hellerstein
150/173
Generate en oug h frag ment IDsR eq ue st frag ment s fr om bl ock st ore sWait until n come ba ck t o you
De crypt an d v erifyIn v ert t he ID A V oila !
Unf ort unately
-
8/7/2019 Architectures and Algorithms for Internet-Scale (P2P) Data Management by Joe Hellerstein
151/173
-
8/7/2019 Architectures and Algorithms for Internet-Scale (P2P) Data Management by Joe Hellerstein
152/173
-
8/7/2019 Architectures and Algorithms for Internet-Scale (P2P) Data Management by Joe Hellerstein
153/173
Securit y and T rust
-
8/7/2019 Architectures and Algorithms for Internet-Scale (P2P) Data Management by Joe Hellerstein
154/173
F ree R iders
-
8/7/2019 Architectures and Algorithms for Internet-Scale (P2P) Data Management by Joe Hellerstein
155/173
F i les ha ri ng s tudies Lot s of pe ople d ownloa d F e w pe ople serve f i les
Is th is b ad? If there s no i nce nt ive to serve, why d o pe ople d o
s o? What i f there are s t r ong disi nce nt ives to bei ng a
ma jor server?
Simple Solution : T hreshholds
-
8/7/2019 Architectures and Algorithms for Internet-Scale (P2P) Data Management by Joe Hellerstein
156/173
Many pro gr ams allo w a threshhold to be set Don t uplo ad a f ile to a peer unless it sh ares > k
f iles
Problems : Wh at s k ? Ho w to ensure the sh ared f iles are interestin g?
-
8/7/2019 Architectures and Algorithms for Internet-Scale (P2P) Data Management by Joe Hellerstein
157/173
-
8/7/2019 Architectures and Algorithms for Internet-Scale (P2P) Data Management by Joe Hellerstein
158/173
-
8/7/2019 Architectures and Algorithms for Internet-Scale (P2P) Data Management by Joe Hellerstein
159/173
-
8/7/2019 Architectures and Algorithms for Internet-Scale (P2P) Data Management by Joe Hellerstein
160/173
S q uelching S y bil
-
8/7/2019 Architectures and Algorithms for Internet-Scale (P2P) Data Management by Joe Hellerstein
161/173
Ce rt i f ic at e au t h ori ty Cen tra lize one t hing: t he signing of ID ce rt i f ic at e sCen tra l se rve r i s ot he rwi se ou t of t he l oop
Or h ave an inne r ring of tru st e d n ode s do t hi sUsing pra c t ic al By zan t ine ag ree men t proto c ol s [Castro/ Li skov OS DI
01]
We ak secu re IDs ID = SH A -1(IP addr e ss) A ssu me atta cke r c on tro l s a mod e st nu mbe r of n ode s Be fore rou t ing t h rough a n ode, ch allenge i t to prod uce t he
righ t IP addr e ssRe q ui re s i t e rat ive rou t ing
-
8/7/2019 Architectures and Algorithms for Internet-Scale (P2P) Data Management by Joe Hellerstein
162/173
E xample : R edundant Agg in C hord
-
8/7/2019 Architectures and Algorithms for Internet-Scale (P2P) Data Management by Joe Hellerstein
163/173
| support (0)| = 16
|s upport (1-8)| = 1|s upport (9-12)| = 2|s upport (13-14)| = 4|s upport (15)| = 8
|s upport (8)| = 16
|s upport (9-0)| = 1|s upport (1-4)| = 2|s upport (5-6)| = 4|s upport (7)| = 8
log (n ) role s w/binomial size di stribution (avg = 3 )
-
8/7/2019 Architectures and Algorithms for Internet-Scale (P2P) Data Management by Joe Hellerstein
164/173
-
8/7/2019 Architectures and Algorithms for Internet-Scale (P2P) Data Management by Joe Hellerstein
165/173
-
8/7/2019 Architectures and Algorithms for Internet-Scale (P2P) Data Management by Joe Hellerstein
166/173
-
8/7/2019 Architectures and Algorithms for Internet-Scale (P2P) Data Management by Joe Hellerstein
167/173
C losing T houghts
-
8/7/2019 Architectures and Algorithms for Internet-Scale (P2P) Data Management by Joe Hellerstein
168/173
-
8/7/2019 Architectures and Algorithms for Internet-Scale (P2P) Data Management by Joe Hellerstein
169/173
-
8/7/2019 Architectures and Algorithms for Internet-Scale (P2P) Data Management by Joe Hellerstein
170/173
T he D B C ommunit y H as Much to O ffer
-
8/7/2019 Architectures and Algorithms for Internet-Scale (P2P) Data Management by Joe Hellerstein
171/173
C om plex (mu lti -o perator ) queries & o ptimization NW fo lks have ten de d to bui ld sin gle -o perator s y stemsE.g. a gg re gation on ly , or mu lti -d ran ge -search on ly
A da ptivit y require dB ut ma y not look like a da ptive Q P in databases
Dec larative lan gua ge semantics Dea l with streamin g, c lock jitter an d soft state!Data re duction techniques For visua lization, a pproximate quer y processin gB u lk-com putation workloa ds Quite different from the ones the NW an d s y stems fo lks envision Recursive quer y processin g T he net work is a gra ph!
Metareferences
-
8/7/2019 Architectures and Algorithms for Internet-Scale (P2P) Data Management by Joe Hellerstein
172/173
Y ou r fav or ite searc h en gine s hould f in d t he in line refs Pr o ject IRIS has a lot of part ic ipants pa pers on line htt p: //www .pr o ject -ir is.or gIEEE Distr ib ute d Syste ms On line htt p: //d s on line.c ompu ter. or g/o s /re late d/p2p/O Re illy Open P2P htt p: //www .open p2p .c omKar l A berer s ICDE 2002 t ut or ia l htt p: //ls ir pe ople.e pf l.c h/ aberer /Ta lks /ICDE2002-Tut or ia l.pd f Ross /Rubenste in Inf oCom 2003 t ut or ia l htt p: // c is.poly.e du/ ~r oss /t ut or ia ls /P2P t ut or ia lInf oc om .pd f PlanetLab htt p: //www .planet -lab. or gOpen DHT htt p: //www .open dh t.or g
-
8/7/2019 Architectures and Algorithms for Internet-Scale (P2P) Data Management by Joe Hellerstein
173/173