![Page 1: OceanStore Toward Global-Scale, Self-Repairing, Secure and Persistent Storage](https://reader036.vdocuments.mx/reader036/viewer/2022062519/56814f6a550346895dbd2016/html5/thumbnails/1.jpg)
OceanStore: Toward Global-Scale, Self-Repairing, Secure and Persistent Storage
John Kubiatowicz, University of California at Berkeley
![Page 2: OceanStore Toward Global-Scale, Self-Repairing, Secure and Persistent Storage](https://reader036.vdocuments.mx/reader036/viewer/2022062519/56814f6a550346895dbd2016/html5/thumbnails/2.jpg)
OceanStore: University of Maryland Distinguished Lecture ©2002 John Kubiatowicz/UC Berkeley
OceanStore Context: Ubiquitous Computing
• Computing everywhere:
  – Desktop, Laptop, Palmtop
  – Cars, Cellphones
  – Shoes? Clothing? Walls?
• Connectivity everywhere:
  – Rapid growth of bandwidth in the interior of the net
  – Broadband to the home and office
  – Wireless technologies such as CDMA, satellite, laser
• Where is persistent data????
![Page 3: OceanStore Toward Global-Scale, Self-Repairing, Secure and Persistent Storage](https://reader036.vdocuments.mx/reader036/viewer/2022062519/56814f6a550346895dbd2016/html5/thumbnails/3.jpg)
Utility-based Infrastructure?
[Figure: federation of service providers labeled Pac Bell, Sprint, IBM, AT&T, and Canadian OceanStore]
• Data service provided by federation of companies
• Cross-administrative domain
• Pay for Service
![Page 4: OceanStore Toward Global-Scale, Self-Repairing, Secure and Persistent Storage](https://reader036.vdocuments.mx/reader036/viewer/2022062519/56814f6a550346895dbd2016/html5/thumbnails/4.jpg)
OceanStore: Everyone’s Data, One Big Utility
“The data is just out there”
• How many files in the OceanStore?
  – Assume 10^10 people in world
  – Say 10,000 files/person (very conservative?)
  – So 10^14 files in OceanStore!
  – If 1 gig files (ok, a stretch), get 1 mole of bytes!
Truly impressive number of elements… but small relative to physical constants
Aside: new results: 1.5 Exabytes/year (1.5×10^18)
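The estimate above can be checked with a few lines of arithmetic. This is a minimal sketch that uses only the slide's own assumptions (10^10 people, 10,000 files each, 1 GB per file, taken here as 10^9 bytes); the Avogadro constant is added for the "mole" comparison. The result is about 10^23 bytes, roughly a sixth of a mole, so the "1 mole" figure is an order-of-magnitude statement.

```python
# Back-of-envelope check of the slide's numbers (inputs are the slide's assumptions).
PEOPLE = 10**10            # assumed number of people in the world
FILES_PER_PERSON = 10**4   # "10,000 files/person"
BYTES_PER_FILE = 10**9     # "1 gig files", taken here as 10^9 bytes
AVOGADRO = 6.022e23        # approximate Avogadro constant, for the "mole" comparison

total_files = PEOPLE * FILES_PER_PERSON
total_bytes = total_files * BYTES_PER_FILE
moles_of_bytes = total_bytes / AVOGADRO

print(f"files: {total_files:.0e}, bytes: {total_bytes:.0e}, moles: {moles_of_bytes:.2f}")
```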
![Page 5: OceanStore Toward Global-Scale, Self-Repairing, Secure and Persistent Storage](https://reader036.vdocuments.mx/reader036/viewer/2022062519/56814f6a550346895dbd2016/html5/thumbnails/5.jpg)
Key Observation: Want Automatic Maintenance
• Can’t possibly manage billions of servers by hand!
• System should automatically:
  – Adapt to failure
  – Exclude malicious elements
  – Repair itself
  – Incorporate new elements
• System should be secure and private
  – Encryption, authentication
• System should preserve data over the long term (accessible for 1000 years):
  – Geographic distribution of information
  – New servers added from time to time
  – Old servers removed from time to time
  – Everything just works
![Page 6: OceanStore Toward Global-Scale, Self-Repairing, Secure and Persistent Storage](https://reader036.vdocuments.mx/reader036/viewer/2022062519/56814f6a550346895dbd2016/html5/thumbnails/6.jpg)
OceanStore Prototype exists!
• Runs on Planet-Lab infrastructure
  – 150,000 lines of Java code
  – Experiments have run on 100+ servers at 42 sites in US and Europe
• Working applications:
  – NFS File service
  – Anonymous storage
  – IMAP/SMTP through OceanStore
  – Web Caching through OceanStore
• Still pieces missing, of course
  – Some of the security, advanced adaptation, etc.
• Also, not running continuously
  – (I am not using it for data that I care about – Yet!)
  – Not holding a mole of data
![Page 7: OceanStore Toward Global-Scale, Self-Repairing, Secure and Persistent Storage](https://reader036.vdocuments.mx/reader036/viewer/2022062519/56814f6a550346895dbd2016/html5/thumbnails/7.jpg)
Today we will explore the Thesis:
OceanStore is an instance of a new type of system: a Thermodynamic Introspective system (ThermoSpective?)
![Page 8: OceanStore Toward Global-Scale, Self-Repairing, Secure and Persistent Storage](https://reader036.vdocuments.mx/reader036/viewer/2022062519/56814f6a550346895dbd2016/html5/thumbnails/8.jpg)
On the consequences of Scale
• Humans building large, richly connected systems:
  – Chips: 10^8 transistors, 8 layers of metal
  – Internet: 10^9 hosts, terabytes of bisection bandwidth
  – Societies: 10^8 to 10^9 people, 6 degrees of separation
• Complexity is a liability:
  – More components ⇒ higher failure rate
  – Chip verification > 50% of design team
  – BGP instability in the Internet
  – Large societies unstable (especially when centralized)
  – Never know whether things will work as designed
• Complexity is a good thing!
  – Redundancy and interaction can yield stable behavior
  – Engineers are not at all used to thinking this way
  – Might design systems to correct themselves
![Page 9: OceanStore Toward Global-Scale, Self-Repairing, Secure and Persistent Storage](https://reader036.vdocuments.mx/reader036/viewer/2022062519/56814f6a550346895dbd2016/html5/thumbnails/9.jpg)
Question: Can we exploit Complexity to our Advantage?
Moore’s Law gains ⇒ Potential for Stability
![Page 10: OceanStore Toward Global-Scale, Self-Repairing, Secure and Persistent Storage](https://reader036.vdocuments.mx/reader036/viewer/2022062519/56814f6a550346895dbd2016/html5/thumbnails/10.jpg)
The Biological Inspiration
• Biological Systems are built from (extremely) faulty components, yet:
  – They operate with a variety of component failures ⇒ Redundancy of function and representation
  – They have stable behavior ⇒ Negative feedback
  – They are self-tuning ⇒ Optimization of common case
• Introspective (Autonomic) Computing:
  – Components for performing
  – Components for monitoring and model building
  – Components for continuous adaptation
[Figure: cycle labeled Adapt, Dance, Monitor]
![Page 11: OceanStore Toward Global-Scale, Self-Repairing, Secure and Persistent Storage](https://reader036.vdocuments.mx/reader036/viewer/2022062519/56814f6a550346895dbd2016/html5/thumbnails/11.jpg)
The Thermodynamic Analogy
• Large Systems have a variety of latent order
  – Connections between elements
  – Mathematical structure (erasure coding, etc.)
  – Distributions peaked about some desired behavior
• Permits “Stability through Statistics”
  – Exploit the behavior of aggregates (redundancy)
• Subject to Entropy
  – Servers fail, attacks happen, system changes
• Requires continuous repair
  – Apply energy (i.e., through servers) to reduce entropy
  – Introspection restores distributions
![Page 12: OceanStore Toward Global-Scale, Self-Repairing, Secure and Persistent Storage](https://reader036.vdocuments.mx/reader036/viewer/2022062519/56814f6a550346895dbd2016/html5/thumbnails/12.jpg)
Application-Level Stability
• End-to-end and everywhere else:
  – To provide guarantees about QoS, Latency, Availability, Durability, must distribute responsibility
  – One view: make the infrastructure understand the vocabulary or semantics of the application
• Must exploit the infrastructure:
  – Locality of communication
  – Redundancy of State and Communication Paths
  – Quality of Service enforcement
  – Denial of Service restriction
![Page 13: OceanStore Toward Global-Scale, Self-Repairing, Secure and Persistent Storage](https://reader036.vdocuments.mx/reader036/viewer/2022062519/56814f6a550346895dbd2016/html5/thumbnails/13.jpg)
Today: Four Technologies
• Decentralized Object Location and Routing
  – Highly connected, self-repairing communication
• Object-Based, Self-Verifying Data
  – Let the Infrastructure Know What is important
• Self-Organized Replication
  – Increased Availability and Latency Reduction
• Deep Archival Storage
  – Long Term Durability
![Page 14: OceanStore Toward Global-Scale, Self-Repairing, Secure and Persistent Storage](https://reader036.vdocuments.mx/reader036/viewer/2022062519/56814f6a550346895dbd2016/html5/thumbnails/14.jpg)
Decentralized Object Location and Routing
![Page 15: OceanStore Toward Global-Scale, Self-Repairing, Secure and Persistent Storage](https://reader036.vdocuments.mx/reader036/viewer/2022062519/56814f6a550346895dbd2016/html5/thumbnails/15.jpg)
Locality, Locality, Locality
One of the defining principles
• “The ability to exploit local resources over remote ones whenever possible”
• “-Centric” approach
  – Client-centric, server-centric, data source-centric
• Requirements:
  – Find data quickly, wherever it might reside
    • Locate nearby object without global communication
    • Permit rapid object migration
  – Verifiable: can’t be sidetracked
    • Data name cryptographically related to data
![Page 16: OceanStore Toward Global-Scale, Self-Repairing, Secure and Persistent Storage](https://reader036.vdocuments.mx/reader036/viewer/2022062519/56814f6a550346895dbd2016/html5/thumbnails/16.jpg)
Enabling Technology: DOLR (Decentralized Object Location and Routing)
[Figure: DOLR network with objects labeled GUID1 (shown twice) and GUID2]
![Page 17: OceanStore Toward Global-Scale, Self-Repairing, Secure and Persistent Storage](https://reader036.vdocuments.mx/reader036/viewer/2022062519/56814f6a550346895dbd2016/html5/thumbnails/17.jpg)
Basic Tapestry Mesh: Incremental Prefix-based Routing
[Figure: Tapestry mesh of nodes with NodeIDs 0xEF34, 0xEF31, 0xEFBA, 0x0921, 0xE932, 0xEF37, 0xE324, 0xEF97, 0xEF32, 0xFF37, 0xE555, 0xE530, 0xEF44, 0x0999, 0x099F, 0xE399, and 0xEF40; links are labeled with routing levels 1 to 4]
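Incremental prefix-based routing can be illustrated with a small sketch: each hop moves to a neighbor whose NodeID shares a longer leading-digit prefix with the destination. The functions below and the link table in the usage note are hypothetical and only borrow NodeIDs from the figure; they are not the Tapestry implementation, which also handles locality, missing entries, and repair.

```python
from typing import Dict, List, Optional


def shared_prefix_len(a: str, b: str) -> int:
    """Number of leading hex digits that two IDs have in common."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def next_hop(current: str, target: str, neighbors: List[str]) -> Optional[str]:
    """Pick a neighbor that matches more leading digits of the target than we do.

    Returns None when no neighbor improves on the current node.
    """
    have = shared_prefix_len(current, target)
    better = [n for n in neighbors if shared_prefix_len(n, target) > have]
    if not better:
        return None
    # Prefer the smallest improvement, which mirrors resolving one digit per hop.
    return min(better, key=lambda n: shared_prefix_len(n, target))


def route(source: str, target: str, links: Dict[str, List[str]]) -> List[str]:
    """Follow next_hop until the target is reached or no progress is possible."""
    path = [source]
    while path[-1] != target:
        hop = next_hop(path[-1], target, links.get(path[-1], []))
        if hop is None:
            break
        path.append(hop)
    return path
```

With an assumed link table such as `{"0921": ["E932"], "E932": ["EF97"], "EF97": ["EF32"], "EF32": ["EF34"]}`, calling `route("0921", "EF34", links)` resolves one more digit of the destination at every hop. The loop always terminates because the shared prefix length strictly increases.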
![Page 18: OceanStore Toward Global-Scale, Self-Repairing, Secure and Persistent Storage](https://reader036.vdocuments.mx/reader036/viewer/2022062519/56814f6a550346895dbd2016/html5/thumbnails/18.jpg)
Use of Tapestry Mesh: Randomization and Locality
![Page 19: OceanStore Toward Global-Scale, Self-Repairing, Secure and Persistent Storage](https://reader036.vdocuments.mx/reader036/viewer/2022062519/56814f6a550346895dbd2016/html5/thumbnails/19.jpg)
Stability under Faults
• Instability is the common case…!
  – Small half-life for P2P apps (1 hour????)
  – Congestion, flash crowds, misconfiguration, faults
• BGP convergence 3-30 mins!
• Must Use DOLR under instability!
  – Insensitive to faults and denial of service attacks
    • Route around bad servers and ignore bad data
  – Repairable infrastructure
    • Easy to reconstruct routing and location information
• Tapestry is natural framework to exploit redundant elements and connections
• Thermodynamic analogies:
  – Heat Capacity of DOLR network
  – Entropy of Links (decay of underlying order)
![Page 20: OceanStore Toward Global-Scale, Self-Repairing, Secure and Persistent Storage](https://reader036.vdocuments.mx/reader036/viewer/2022062519/56814f6a550346895dbd2016/html5/thumbnails/20.jpg)
It’s Alive!
• Tapestry currently running on Planet-Lab
  – (100+ soon to be 1000+ servers spread around world)
  – Dynamic Integration Algorithms (SPAA 2002)
  – Continuous system repair
• Preliminary Numbers for a working system:
[Figure: Latency of Single Node Integration; x-axis: Size of Network (# of nodes), 0 to 450; y-axis: Latency (ms), 0 to 2000]
[Figure: Routing Performance; x-axis: IP Round Trip Time (ms), 0 to 350; y-axis: RDP, 0 to 7]
![Page 21: OceanStore Toward Global-Scale, Self-Repairing, Secure and Persistent Storage](https://reader036.vdocuments.mx/reader036/viewer/2022062519/56814f6a550346895dbd2016/html5/thumbnails/21.jpg)
Object-Based Storage
![Page 22: OceanStore Toward Global-Scale, Self-Repairing, Secure and Persistent Storage](https://reader036.vdocuments.mx/reader036/viewer/2022062519/56814f6a550346895dbd2016/html5/thumbnails/22.jpg)
OceanStore Data Model
• Versioned Objects
  – Every update generates a new version
  – Can always go back in time (Time Travel)
• Each Version is Read-Only
  – Can have permanent name (SHA-1 Hash)
  – Much easier to repair
• An Object is a signed mapping between permanent name and latest version
  – Write access control/integrity involves managing these mappings
[Figure: Comet Analogy, with labels “updates” and “versions”]
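The versioned data model can be sketched in a few lines: every update creates a new read-only version named by its SHA-1 hash, and the object is just a mapping to the latest version with the history retained for time travel. The class name and methods below are illustrative assumptions, and the sketch leaves out the signature on the mapping that the slide describes.

```python
import hashlib


class VersionedObject:
    """Toy versioned object: updates never overwrite, they append (hypothetical API)."""

    def __init__(self):
        self._versions = {}   # version name (SHA-1 hex digest) -> immutable bytes
        self._history = []    # version names, oldest first

    def update(self, data: bytes) -> str:
        """Store a new read-only version and return its permanent name."""
        name = hashlib.sha1(data).hexdigest()
        self._versions[name] = bytes(data)
        self._history.append(name)
        return name

    def latest(self) -> bytes:
        """Contents of the most recent version."""
        return self._versions[self._history[-1]]

    def at(self, index: int) -> bytes:
        """Time travel: contents of the version created by the index-th update."""
        return self._versions[self._history[index]]
```

Because a version's name is derived from its content, any holder of the bytes can check that they match the name, which is one reason read-only versions are easier to repair.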
![Page 23: OceanStore Toward Global-Scale, Self-Repairing, Secure and Persistent Storage](https://reader036.vdocuments.mx/reader036/viewer/2022062519/56814f6a550346895dbd2016/html5/thumbnails/23.jpg)
Secure Hashing
• Read-only data: GUID is hash over actual data
  – Uniqueness and Unforgeability: the data is what it is!
  – Verification: check hash over data
• Changeable data: GUID is combined hash over a human-readable name + public key
  – Uniqueness: GUID space selected by public key
  – Unforgeability: public key is indelibly bound to GUID
• Thermodynamic insight: Hashing makes “data particles” unique, simplifying interactions
[Figure: DATA → SHA-1 → 160-bit GUID]
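Both kinds of GUID can be sketched with the standard `hashlib` module. The function names and the byte encoding of name plus key (a length-prefixed name followed by the raw key bytes) are assumptions made for illustration; the slides do not specify the exact encoding.

```python
import hashlib


def block_guid(data: bytes) -> str:
    """GUID of read-only data: SHA-1 over the data itself (160 bits, 40 hex digits)."""
    return hashlib.sha1(data).hexdigest()


def verify_block(guid: str, data: bytes) -> bool:
    """Self-verification: recompute the hash and compare it with the name."""
    return block_guid(data) == guid


def active_guid(name: str, public_key: bytes) -> str:
    """GUID of changeable data: hash over human-readable name plus public key.

    The length prefix is an assumed encoding that keeps distinct (name, key)
    pairs from producing the same byte string.
    """
    encoded = name.encode("utf-8")
    h = hashlib.sha1()
    h.update(len(encoded).to_bytes(4, "big"))
    h.update(encoded)
    h.update(public_key)
    return h.hexdigest()
```

A server that returns tampered bytes for a block GUID is detected by `verify_block`, and two owners using the same human-readable name get different GUIDs because their public keys differ.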
![Page 24: OceanStore Toward Global-Scale, Self-Repairing, Secure and Persistent Storage](https://reader036.vdocuments.mx/reader036/viewer/2022062519/56814f6a550346895dbd2016/html5/thumbnails/24.jpg)
Self-Verifying Objects
[Figure: versions VGUIDi and VGUIDi+1, each with a root block M over indirect blocks (data B-tree) and data blocks d1 to d9; VGUIDi+1 has a backpointer to VGUIDi and copy-on-write blocks d'8 and d'9]
• AGUID = hash{name+keys}
• Updates: Heartbeats + Read-Only Data
• Heartbeat: {AGUID, VGUID, Timestamp}signed
![Page 25: OceanStore Toward Global-Scale, Self-Repairing, Secure and Persistent Storage](https://reader036.vdocuments.mx/reader036/viewer/2022062519/56814f6a550346895dbd2016/html5/thumbnails/25.jpg)
The Path of an OceanStore Update
[Figure labels: Clients, Inner-Ring Servers, Multicast trees, Second-Tier Caches]
![Page 26: OceanStore Toward Global-Scale, Self-Repairing, Secure and Persistent Storage](https://reader036.vdocuments.mx/reader036/viewer/2022062519/56814f6a550346895dbd2016/html5/thumbnails/26.jpg)
Self-Organized Replication
![Page 27: OceanStore Toward Global-Scale, Self-Repairing, Secure and Persistent Storage](https://reader036.vdocuments.mx/reader036/viewer/2022062519/56814f6a550346895dbd2016/html5/thumbnails/27.jpg)
Self-Organizing Soft-State Replication
• Simple algorithms for placing replicas on nodes in the interior
  – Intuition: locality properties of Tapestry help select positions for replicas
  – Tapestry helps associate parents and children to build multicast tree
• Preliminary results show that this is effective
![Page 28: OceanStore Toward Global-Scale, Self-Repairing, Secure and Persistent Storage](https://reader036.vdocuments.mx/reader036/viewer/2022062519/56814f6a550346895dbd2016/html5/thumbnails/28.jpg)
Effectiveness of second tier
![Page 29: OceanStore Toward Global-Scale, Self-Repairing, Secure and Persistent Storage](https://reader036.vdocuments.mx/reader036/viewer/2022062519/56814f6a550346895dbd2016/html5/thumbnails/29.jpg)
Second Tier Adaptation: Flash Crowd
• Actual Web Cache running on OceanStore
  – Replica 1 far away
  – Replica 2 close to most requestors (created t ~ 20)
  – Replica 3 close to rest of requestors (created t ~ 40)
![Page 30: OceanStore Toward Global-Scale, Self-Repairing, Secure and Persistent Storage](https://reader036.vdocuments.mx/reader036/viewer/2022062519/56814f6a550346895dbd2016/html5/thumbnails/30.jpg)
Introspective Optimization
• Secondary tier self-organized into overlay multicast tree:
  – Presence of DOLR with locality to suggest placement of replicas in the infrastructure
  – Automatic choice between update vs invalidate
• Continuous monitoring of access patterns:
  – Clustering algorithms to discover object relationships
    • Clustered prefetching: demand-fetching related objects
    • Proactive-prefetching: get data there before needed
  – Time series-analysis of user and data motion
• Placement of Replicas to Increase Availability
![Page 31: OceanStore Toward Global-Scale, Self-Repairing, Secure and Persistent Storage](https://reader036.vdocuments.mx/reader036/viewer/2022062519/56814f6a550346895dbd2016/html5/thumbnails/31.jpg)
Deep Archival Storage
![Page 32: OceanStore Toward Global-Scale, Self-Repairing, Secure and Persistent Storage](https://reader036.vdocuments.mx/reader036/viewer/2022062519/56814f6a550346895dbd2016/html5/thumbnails/32.jpg)
Two Types of OceanStore Data
• Active Data: “Floating Replicas”
  – Per object virtual server
  – Interaction with other replicas for consistency
  – May appear and disappear like bubbles
• Archival Data: OceanStore’s Stable Store
  – m-of-n coding: Like hologram
    • Data coded into n fragments, any m of which are sufficient to reconstruct (e.g., m=16, n=64)
    • Coding overhead is proportional to n/m (e.g., 4)
    • Other parameter, rate, is 1/overhead
  – Fragments are cryptographically self-verifying
• Most data in the OceanStore is archival!
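The m-of-n property (any m of the n fragments reconstruct the data, at a storage overhead of n/m) can be demonstrated with a toy Reed-Solomon-style code. This sketch uses polynomial interpolation over the small prime field GF(257) and is only an illustration under that assumption; it is not the interleaved Reed-Solomon code of the prototype. The m data symbols are treated as the values of a polynomial at x = 1..m, and the fragments are its values at x = 1..n.

```python
P = 257  # prime modulus; data symbols are byte values 0..255, n must be below P


def interpolate(points, x):
    """Evaluate at x the unique polynomial of degree < m through m points (mod P)."""
    total = 0
    for i, (xi, yi) in enumerate(points):
        num, den = 1, 1
        for j, (xj, _) in enumerate(points):
            if i != j:
                num = num * (x - xj) % P
                den = den * (xi - xj) % P
        # pow(den, -1, P) is the modular inverse (Python 3.8+).
        total = (total + yi * num * pow(den, -1, P)) % P
    return total


def encode(data, n):
    """Turn m data symbols into n fragments (x, y); the first m carry the data itself."""
    m = len(data)
    base = [(x, data[x - 1]) for x in range(1, m + 1)]
    return [(x, interpolate(base, x)) for x in range(1, n + 1)]


def decode(fragments, m):
    """Rebuild the m data symbols from any m distinct fragments."""
    points = fragments[:m]
    return [interpolate(points, x) for x in range(1, m + 1)]
```

For example, `encode([79, 99, 101, 97], 16)` gives a 4-of-16 code with overhead 16/4 = 4 and rate 1/4, and `decode` recovers the four symbols from any four of the sixteen fragments.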
![Page 33: OceanStore Toward Global-Scale, Self-Repairing, Secure and Persistent Storage](https://reader036.vdocuments.mx/reader036/viewer/2022062519/56814f6a550346895dbd2016/html5/thumbnails/33.jpg)
Archival Dissemination of Fragments
![Page 34: OceanStore Toward Global-Scale, Self-Repairing, Secure and Persistent Storage](https://reader036.vdocuments.mx/reader036/viewer/2022062519/56814f6a550346895dbd2016/html5/thumbnails/34.jpg)
Fraction of Blocks Lost per Year (FBLPY)
• Exploit law of large numbers for durability!
• 6 month repair, FBLPY:
  – Replication: 0.03
  – Fragmentation: 10^-35
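The gap between replication and fragmentation follows from a simple binomial argument. The sketch below assumes that each server fails independently with probability `p` within one repair period; that model and the value of `p` used in the usage note are illustrative assumptions, so the results are not expected to reproduce the slide's FBLPY figures, which depend on the original study's failure model.

```python
from math import comb


def loss_replication(p, replicas):
    """A replicated block is lost only if every replica fails."""
    return p ** replicas


def loss_fragmentation(p, m, n):
    """An m-of-n coded block is lost when fewer than m fragments survive,
    that is, when more than n - m of the n fragments fail."""
    return sum(
        comb(n, k) * p**k * (1 - p) ** (n - k)
        for k in range(n - m + 1, n + 1)
    )
```

At the same 4x storage overhead, comparing `loss_replication(0.1, 4)` with `loss_fragmentation(0.1, 16, 64)` shows the coded block to be far less likely to be lost, because losing it requires 49 or more of 64 independent failures rather than 4 of 4.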
![Page 35: OceanStore Toward Global-Scale, Self-Repairing, Secure and Persistent Storage](https://reader036.vdocuments.mx/reader036/viewer/2022062519/56814f6a550346895dbd2016/html5/thumbnails/35.jpg)
The Dissemination Process: Achieving Failure Independence
[Figure: dissemination pipeline with components Network Monitoring, Human Input, Introspection, Model Builder, Set Creator, and Inner Ring; arrows labeled model, set, probe, type, and fragments]
![Page 36: OceanStore Toward Global-Scale, Self-Repairing, Secure and Persistent Storage](https://reader036.vdocuments.mx/reader036/viewer/2022062519/56814f6a550346895dbd2016/html5/thumbnails/36.jpg)
Independence Analysis
• Information gathering:
  – State of fragment servers (up/down/etc.)
• Correlation analysis:
  – Use metric such as mutual information
  – Cluster via that metric
  – Result partitions servers into uncorrelated clusters
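Mutual information between two servers can be estimated directly from their observed up/down histories. The following is a minimal sketch assuming equal-length traces sampled at the same times; the function name and trace format are hypothetical, and the clustering step that would consume these scores is not shown.

```python
from collections import Counter
from math import log2


def mutual_information(a, b):
    """Empirical mutual information (in bits) between two equal-length traces."""
    n = len(a)
    count_a, count_b = Counter(a), Counter(b)
    count_ab = Counter(zip(a, b))
    mi = 0.0
    for (x, y), c in count_ab.items():
        # p(x, y) * log2(p(x, y) / (p(x) * p(y))), written with raw counts.
        mi += (c / n) * log2(c * n / (count_a[x] * count_b[y]))
    return mi
```

Two servers that always fail together score high (1 bit for a balanced up/down trace), while servers whose failures are unrelated score near zero, so placing fragments across low-scoring pairs is what the independence analysis aims for.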
![Page 37: OceanStore Toward Global-Scale, Self-Repairing, Secure and Persistent Storage](https://reader036.vdocuments.mx/reader036/viewer/2022062519/56814f6a550346895dbd2016/html5/thumbnails/37.jpg)
Active Data Maintenance
• Tapestry enables “data-driven multicast”
  – Mechanism for local servers to watch each other
  – Efficient use of bandwidth (locality)
[Figure: Tapestry nodes 3274, 4577, 5544, AE87, 3213, 9098, 1167, 6003, and 0128 connected by links labeled L1, L2, and L3, with a ring of L1 heartbeats]
![Page 38: OceanStore Toward Global-Scale, Self-Repairing, Secure and Persistent Storage](https://reader036.vdocuments.mx/reader036/viewer/2022062519/56814f6a550346895dbd2016/html5/thumbnails/38.jpg)
1000-Year Durability?
• Exploiting Infrastructure for Repair
  – DOLR permits efficient heartbeat mechanism to notice:
    • Servers going away for a while
    • Or, going away forever!
  – Continuous sweep through data also possible
  – Erasure Code provides Flexibility in Timing
• Data continuously transferred from physical medium to physical medium
  – No “tapes decaying in basement”
  – Information becomes fully Virtualized
• Thermodynamic Analogy: Use of Energy (supplied by servers) to Suppress Entropy
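The distinction between a server that is away for a while and one that is gone forever can be sketched as a heartbeat monitor with two silence thresholds. The class, the threshold names, and the status labels are hypothetical; the slides do not give concrete timeouts. The erasure code is what makes a long second threshold tolerable, since repair can wait while enough fragments remain.

```python
class HeartbeatMonitor:
    """Classify servers by how long they have been silent (illustrative only)."""

    def __init__(self, suspect_after, dead_after):
        self.suspect_after = suspect_after  # silence before a server is "suspect"
        self.dead_after = dead_after        # silence before it is treated as gone
        self.last_seen = {}                 # server -> time of last heartbeat

    def heartbeat(self, server, now):
        """Record a heartbeat received from a server at time `now`."""
        self.last_seen[server] = now

    def status(self, server, now):
        silence = now - self.last_seen[server]
        if silence >= self.dead_after:
            return "dead"      # gone forever: its fragments should be rebuilt
        if silence >= self.suspect_after:
            return "suspect"   # away for a while: wait before spending bandwidth
        return "alive"

    def needs_repair(self, now):
        """Servers whose data should be regenerated elsewhere."""
        return sorted(s for s in self.last_seen if self.status(s, now) == "dead")
```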
![Page 39: OceanStore Toward Global-Scale, Self-Repairing, Secure and Persistent Storage](https://reader036.vdocuments.mx/reader036/viewer/2022062519/56814f6a550346895dbd2016/html5/thumbnails/39.jpg)
PondStore Prototype
![Page 40: OceanStore Toward Global-Scale, Self-Repairing, Secure and Persistent Storage](https://reader036.vdocuments.mx/reader036/viewer/2022062519/56814f6a550346895dbd2016/html5/thumbnails/40.jpg)
First Implementation [Java]:
• Event-driven state-machine model
  – 150,000 lines of Java code and growing
• Included Components
  – DOLR Network (Tapestry)
    • Object location with Locality
    • Self-Configuring, Self-Repairing
  – Full Write path
    • Conflict resolution and Byzantine agreement
  – Self-Organizing Second Tier
    • Replica Placement and Multicast Tree Construction
  – Introspective gathering of tacit info and adaptation
    • Clustering, prefetching, adaptation of network routing
  – Archival facilities
    • Interleaved Reed-Solomon codes for fragmentation
    • Independence Monitoring
    • Data-Driven Repair
• Downloads available from www.oceanstore.org
![Page 41: OceanStore Toward Global-Scale, Self-Repairing, Secure and Persistent Storage](https://reader036.vdocuments.mx/reader036/viewer/2022062519/56814f6a550346895dbd2016/html5/thumbnails/41.jpg)
Event-Driven Architecture of an OceanStore Node
• Data-flow style
  – Arrows Indicate flow of messages
• Potential to exploit small multiprocessors at each physical node
[Figure: node architecture diagram; label: World]
![Page 42: OceanStore Toward Global-Scale, Self-Repairing, Secure and Persistent Storage](https://reader036.vdocuments.mx/reader036/viewer/2022062519/56814f6a550346895dbd2016/html5/thumbnails/42.jpg)
First Prototype Works!
• Latest: it is up to 8MB/sec (local area network)
  – Biggest constraint: Threshold Signatures
• Still a ways to go, but working
![Page 43: OceanStore Toward Global-Scale, Self-Repairing, Secure and Persistent Storage](https://reader036.vdocuments.mx/reader036/viewer/2022062519/56814f6a550346895dbd2016/html5/thumbnails/43.jpg)
Update Latency
• Cryptography in critical path (not surprising!)
![Page 44: OceanStore Toward Global-Scale, Self-Repairing, Secure and Persistent Storage](https://reader036.vdocuments.mx/reader036/viewer/2022062519/56814f6a550346895dbd2016/html5/thumbnails/44.jpg)
Reality: Web Caching through OceanStore
![Page 45: OceanStore Toward Global-Scale, Self-Repairing, Secure and Persistent Storage](https://reader036.vdocuments.mx/reader036/viewer/2022062519/56814f6a550346895dbd2016/html5/thumbnails/45.jpg)
Other Apps
• Better file system support
  – NFS (working – reimplementation in progress)
  – Windows Installable file system (soon)
• Working Email through OceanStore
  – IMAP and POP proxies
  – Let normal mail clients access mailboxes in OceanStore
• Anonymous file storage:
  – Nemosyne uses Tapestry by itself
• Palm-pilot synchronization
  – Palm data base as an OceanStore DB
![Page 46: OceanStore Toward Global-Scale, Self-Repairing, Secure and Persistent Storage](https://reader036.vdocuments.mx/reader036/viewer/2022062519/56814f6a550346895dbd2016/html5/thumbnails/46.jpg)
Conclusions
• Exploitation of Complexity
  – Large amounts of redundancy and connectivity
  – Thermodynamics of systems: “Stability through Statistics”
  – Continuous Introspection
• Help the Infrastructure to Help you
  – Decentralized Object Location and Routing (DOLR)
  – Object-based Storage
  – Self-Organizing redundancy
  – Continuous Repair
• OceanStore properties:
  – Provides security, privacy, and integrity
  – Provides extreme durability
  – Lower maintenance cost through redundancy, continuous adaptation, self-diagnosis and repair
![Page 47: OceanStore Toward Global-Scale, Self-Repairing, Secure and Persistent Storage](https://reader036.vdocuments.mx/reader036/viewer/2022062519/56814f6a550346895dbd2016/html5/thumbnails/47.jpg)
For more info: http://oceanstore.org
• OceanStore vision paper for ASPLOS 2000: “OceanStore: An Architecture for Global-Scale Persistent Storage”
• Tapestry algorithms paper (SPAA 2002): “Distributed Object Location in a Dynamic Network”
• Bloom Filters for Probabilistic Routing (INFOCOM 2002): “Probabilistic Location and Routing”
• Upcoming CACM paper (not until February): “Extracting Guarantees from Chaos”
![Page 48: OceanStore Toward Global-Scale, Self-Repairing, Secure and Persistent Storage](https://reader036.vdocuments.mx/reader036/viewer/2022062519/56814f6a550346895dbd2016/html5/thumbnails/48.jpg)
Backup Slides
![Page 49: OceanStore Toward Global-Scale, Self-Repairing, Secure and Persistent Storage](https://reader036.vdocuments.mx/reader036/viewer/2022062519/56814f6a550346895dbd2016/html5/thumbnails/49.jpg)
Secure Naming
• Naming hierarchy:
  – Users map from names to GUIDs via hierarchy of OceanStore objects (à la SDSI)
  – Requires set of “root keys” to be acquired by user
[Figure: naming hierarchy with labels Foo, Bar, Baz, and Myfile, entered through an out-of-band “root link”]
![Page 50: OceanStore Toward Global-Scale, Self-Repairing, Secure and Persistent Storage](https://reader036.vdocuments.mx/reader036/viewer/2022062519/56814f6a550346895dbd2016/html5/thumbnails/50.jpg)
Parallel Insertion Algorithms (SPAA ’02)
• Massive parallel insert is important
  – We now have algorithms that handle “arbitrary simultaneous inserts”
  – Construction of nearest-neighbor mesh links
    • log^2 n message complexity ⇒ fully operational routing mesh
  – Objects kept available during this process
    • Incremental movement of pointers
• Interesting Issue: Introduction service
  – How does a new node find a gateway into the Tapestry?
![Page 51: OceanStore Toward Global-Scale, Self-Repairing, Secure and Persistent Storage](https://reader036.vdocuments.mx/reader036/viewer/2022062519/56814f6a550346895dbd2016/html5/thumbnails/51.jpg)
Can You Delete (Eradicate) Data?
• Eradication is antithetical to durability!
  – If you can eradicate something, then so can someone else! (denial of service)
  – Must have “eradication certificate” or similar
• Some answers:
  – Bays: limit the scope of data flows
  – Ninja Monkeys: hunt and destroy with certificate
• Related: Revocation of keys
  – Need hunt and re-encrypt operation
• Related: Version pruning
  – Temporary files: don’t keep versions for long
  – Streaming, real-time broadcasts: Keep? Maybe
  – Locks: Keep? No, Yes, Maybe (auditing!)
  – Every key stroke made: Keep? For a short while?