![Page 1: Security and Replication … and Course Wrap-up Zachary G. Ives University of Pennsylvania CIS 455 / 555 – Internet and Web Systems September 13, 2015 PNUTS](https://reader036.vdocuments.mx/reader036/viewer/2022062314/56649e415503460f94b33e3b/html5/thumbnails/1.jpg)
Security and Replication… and Course Wrap-up
Zachary G. Ives
University of Pennsylvania
CIS 455 / 555 – Internet and Web Systems
April 19, 2023
PNUTS slide content courtesy of Brian Cooper
Secure Transactions
Authentication using public/private key pairs is essential today
Consider every Web transaction – we want to know whom we’re conversing with!
… versus ending up with a phishing attack!
Secure Sockets Layer (SSL)
Relies on a trusted third party
  Certificate authority (CA) issues certificates to certify a server and its public key
  Verisign is perhaps the best known of these
A server S generates a public-private keypair
  Sends the public key, other info (plus $$$) to Verisign (etc.)
  Gets back a certificate with:
    CA name
    S’s name, URL, public key
    Timestamp and expiration info
Example Certificate
Owner: CN=GTE CyberTrust Root, O=GTE Corporation, C=US
Issuer: CN=GTE CyberTrust Root, O=GTE Corporation, C=US
Serial number: 1a3
Valid from: Fri Feb 23 23:01:00 GMT 1996 until: Thu Feb 23 23:59:00 GMT 2006
Certificate fingerprints:
  MD5: C4:D7:F0:B2:A3:C5:7D:61:67:F0:04:CD:43:D3:BA:58
  SHA1: 90:DE:DE:9E:4C:4E:9F:6F:D8:86:17:57:9D:D3:91:BC:65:A6:89:64
The SSL Protocol
Client C connects to server S from enterprise E
S sends E’s certificate (cleartext)
C validates the certificate using the public key of the CA (e.g., Verisign)
C generates a session key and sends it to S, encrypted with E’s public key
Java has built-in support for SSL (Java Secure Socket Extension, integrated in 1.4) and a tool for managing certificates (keytool)
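The client side of the handshake above can be sketched in a few lines. This is a minimal sketch, not the course's code: it uses Python's standard `ssl` module rather than Java's JSSE, and the function names `make_client_context` and `fetch_peer_certificate` are made up for illustration. The library negotiates the session keys itself, so the key exchange is not visible in the code.

```python
import socket
import ssl


def make_client_context() -> ssl.SSLContext:
    """Build a client-side context that validates the server's certificate."""
    # create_default_context() loads the system's trusted CA certificates and
    # enables both certificate verification and hostname checking.
    return ssl.create_default_context()


def fetch_peer_certificate(host: str, port: int = 443) -> dict:
    """Connect to host, complete the handshake, and return its certificate."""
    context = make_client_context()
    with socket.create_connection((host, port), timeout=10) as raw_sock:
        # wrap_socket runs the handshake: the server presents its certificate,
        # the client checks it against the trusted CAs, and keys are agreed.
        with context.wrap_socket(raw_sock, server_hostname=host) as tls_sock:
            return tls_sock.getpeercert()
```

If validation fails (unknown CA, expired certificate, wrong host name), `wrap_socket` raises an `ssl.SSLError` subclass instead of returning a connection.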
So…
The client and server know each other given SSL
How do we go ahead and make a purchase?
Most commonly: you enter your credit card number
  Sometimes this is stored in the retailer’s system for future purposes!
Best case:
  The CC info is stored in a special, firewalled server, not part of the web site
  Web server has other account info about you
  When a transaction goes through, web site sends order to this special server, which combines it with CC info and sends it onward
Replication… Core of the Cloud
The vision of the “cloud”: a “computing utility” that is geographically distributed
At its core: geographical replication as well as partitioning
  What to replicate (including granularity)
  Where to replicate
  How to maintain consistency (and how fresh data needs to be)
What to Replicate
There is a cost to maintaining consistency if data is changing
  Larger objects, slower networks, frequent updates, and freshness requirements make replication more expensive
  May be able to send a “diff” instead of the whole object
Thus, difference between LAN and WAN replication:
  Local-area / cluster:
    Single-writer, multiple-reader data is often replicated
    e.g., CNN
  Wide-area:
    Need to limit replication to seldom-updated data, or relax the freshness or consistency constraints
    e.g., Akamai (images, video), Google index
Where to Place Replicas in the Internet
Want to place them at points where they can handle many requests and reduce traffic in bottlenecks
Commonly, at least one replica in Europe, Asia, US West Coast, US East Coast
[Figure: Server 1 and Server 2 connected by a congested or failure-prone link, with clients C1 through C9 attached around the network]
Schemes for Maintaining Consistency
Goal is to trade off performance vs. consistency guarantees
  Lock-based protocols
  Invalidation
  Lease
  Time-to-live
Lock-Based Protocols
Guarantee strong consistency
  Similar to distributed version of what’s done in a database
  Client request for an item requires a read lock at its handling server
  Update to an item requires a write lock
  Multiple read locks can be held concurrently; write lock must be exclusive
What are the potential pitfalls of this approach?
  Is it resilient to network partition?
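The read/write lock rule can be made concrete with a small, single-process sketch. `LockTable` and its methods are hypothetical names; a real distributed protocol would also need messaging, timeouts, and failure handling, which is where the pitfalls asked about above come from.

```python
class LockTable:
    """Tracks shared (read) and exclusive (write) locks per item."""

    def __init__(self):
        self._readers = {}  # item -> set of holders with a read lock
        self._writer = {}   # item -> the one holder with the write lock

    def acquire_read(self, item, holder) -> bool:
        # A read lock is refused only while some holder has the write lock.
        if item in self._writer:
            return False
        self._readers.setdefault(item, set()).add(holder)
        return True

    def acquire_write(self, item, holder) -> bool:
        # A write lock is exclusive: no readers and no other writer allowed.
        if item in self._writer or self._readers.get(item):
            return False
        self._writer[item] = holder
        return True

    def release(self, item, holder) -> None:
        # Drop whatever lock this holder has on the item.
        self._readers.get(item, set()).discard(holder)
        if self._writer.get(item) == holder:
            del self._writer[item]
```

In this sketch a refused request simply returns `False`; a holder that never releases (for example, because it is cut off by a network partition) blocks every later writer.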
Invalidation Protocols
If a server is to update an item, it can multicast this to all replicas
Requires servers to know who all of the other parties are
May be somewhat weaker than lock-based models – why?
Common variation: lease-based protocol
  A replicated item is “leased” for a particular period
  If the item is updated during its lease, it is invalidated/refreshed
  After it expires, it is dropped
What are the pros and cons of these protocols?
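A lease-based replica can be sketched as follows. `LeaseCache` is an illustrative name, and the injected `clock` callable is an assumption made so the behavior can be exercised without waiting for real time to pass.

```python
import time


class LeaseCache:
    """Holds replicated items, each valid only for the length of its lease."""

    def __init__(self, lease_seconds, clock=time.monotonic):
        self._lease_seconds = lease_seconds
        self._clock = clock
        self._items = {}  # key -> (value, lease expiry time)

    def install(self, key, value):
        # The origin grants a lease on the item starting now.
        self._items[key] = (value, self._clock() + self._lease_seconds)

    def invalidate(self, key, new_value=None):
        # Called by the origin when the item changes during its lease:
        # either drop the copy or refresh it, keeping the same expiry.
        if key not in self._items:
            return
        if new_value is None:
            del self._items[key]
        else:
            _, expires_at = self._items[key]
            self._items[key] = (new_value, expires_at)

    def get(self, key):
        entry = self._items.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            del self._items[key]  # lease expired: the copy is dropped
            return None
        return value
```

The origin only has to remember replicas for the length of a lease, which bounds how long it must track (and notify) any one of them.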
Time-to-Live-Based Replication
Generally used when freshness constraints aren’t severe
Replicas are provided with an expectation of how long they are likely to be current
After the “time-to-live” expires, they need to be revalidated
How does this compare to the previous approaches?
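For comparison, a time-to-live replica needs no messages from the origin at all. The sketch below is illustrative: `TtlCache` and the `fetch` callback are hypothetical, and "revalidation" is simplified to re-fetching the item from the origin.

```python
import time


class TtlCache:
    """Serves cached copies until their time-to-live runs out."""

    def __init__(self, ttl_seconds, fetch, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._fetch = fetch    # callable that reads the item from the origin
        self._clock = clock
        self._items = {}       # key -> (value, expiry time)

    def get(self, key):
        now = self._clock()
        entry = self._items.get(key)
        if entry is not None and now < entry[1]:
            # Within the TTL: answer locally, even if the origin has changed.
            return entry[0]
        # Missing or expired: go back to the origin and restart the TTL.
        value = self._fetch(key)
        self._items[key] = (value, now + self._ttl)
        return value
```

Unlike the invalidation and lease schemes, an update at the origin is not seen until the TTL expires, so staleness is bounded only by the TTL.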
Replication in “Cloud” Services
Yahoo’s PNUTS and Google’s BigTable are based on the notion that there is locality of data access
  Consider consistency within each record but ignore cross-record consistency
e.g., in a social network, we should coordinate accesses to the same user (but don’t care about consistency with unrelated friends)… but even here, we might be able to tolerate relaxed consistency among the users
Yahoo’s PNUTS Platform
Parallel database
Geographic replication
Indexes and views
Structured, flexible schema
[Figure: a table of records A through F, shown partitioned and replicated]
A 42342 E
B 42521 W
C 66354 W
D 12352 E
E 75656 C
F 15677 E
Query model
Per-record operations
  Get
  Set
  Delete
Multi-record operations
  Multiget
  Scan
  Getrange
Web service (RESTful) API
System Architecture
[Figure: system architecture. Components shown: clients, REST API, routers, tablet controller, storage units, and YMB, spanning a local region and remote regions]
Tablet splitting and balancing
Each storage unit has many tablets (horizontal partitions of the table)
Tablets may grow over time
Overfull tablets split
Storage unit may become a hotspot
Shed load by moving tablets to other servers
[Figure: storage units, each holding several tablets]
Range queries
[Figure: a router splits a range query across three storage units]
Query Grapefruit…Pear? is split into Grapefruit…Lime? and Lime…Pear?
Router’s tablet map, with the keys stored in each tablet:
  MIN-Canteloupe: SU1 (Apple, Avocado, Banana, Blueberry)
  Canteloupe-Lime: SU3 (Canteloupe, Grape, Kiwi, Lemon)
  Lime-Strawberry: SU2 (Lime, Mango, Orange)
  Strawberry-MAX: SU1 (Strawberry, Tomato, Watermelon)
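The router's job on this slide can be sketched with a sorted list of tablet boundaries and a binary search. This is only an illustration of the idea: the boundary keys and storage-unit names come from the slide, while the function names, the sentinel values for MIN and MAX, and the half-open interval convention are assumptions.

```python
import bisect

# Sentinels standing in for MIN and MAX (assumed, for string keys only).
MIN_KEY, MAX_KEY = "", "\uffff"

# Tablet i covers keys in [BOUNDARIES[i], BOUNDARIES[i + 1]) and lives on
# OWNERS[i]. Key spellings follow the slide.
BOUNDARIES = [MIN_KEY, "Canteloupe", "Lime", "Strawberry", MAX_KEY]
OWNERS = ["SU1", "SU3", "SU2", "SU1"]


def route_key(key):
    """Return the storage unit holding the tablet that contains key."""
    idx = bisect.bisect_right(BOUNDARIES, key) - 1
    return OWNERS[min(idx, len(OWNERS) - 1)]


def split_range(low, high):
    """Split the query [low, high) into (start, end, storage unit) pieces."""
    pieces = []
    idx = bisect.bisect_right(BOUNDARIES, low) - 1
    start = low
    while start < high and idx < len(OWNERS):
        # Each piece stops at the tablet's upper boundary or the query's end.
        end = min(high, BOUNDARIES[idx + 1])
        pieces.append((start, end, OWNERS[idx]))
        start = end
        idx += 1
    return pieces
```

Calling `split_range("Grapefruit", "Pear")` yields one piece for SU3 (Grapefruit up to Lime) and one for SU2 (Lime up to Pear), matching the two sub-queries in the figure.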
Updates
[Figure: path of an update through routers, storage units (SU), and message brokers, with numbered steps]
1. Write key k
2. Write key k
3. Write key k
4. (no label)
5. SUCCESS
6. Write key k
7. Sequence # for key k
8. Sequence # for key k
Asynchronous replication and consistency
Asynchronous Replication
Consistency Model
Goal: make it easier for applications to reason about updates and cope with asynchrony
Consider a single record for Brian Cooper’s Facebook entry:
[Figure: timeline of the record. It is inserted as v. 1, successive updates produce v. 2 through v. 8, and a delete ends Generation 1]
Consistency Model
Read (local)
[Figure: timeline of versions v. 1 through v. 8 in Generation 1, marking the stale versions and the current version]
Consistency Model
Read up-to-date
[Figure: timeline of versions v. 1 through v. 8 in Generation 1, marking the stale versions and the current version]
Consistency Model
Read ≥ v.6
[Figure: timeline of versions v. 1 through v. 8 in Generation 1, marking the stale versions and the current version]
Consistency Model
Write
[Figure: timeline of versions v. 1 through v. 8 in Generation 1, marking the stale versions and the current version]
Consistency Model
Write if = v.7
ERROR
[Figure: timeline of versions v. 1 through v. 8 in Generation 1, marking the stale versions and the current version]
Consistency Model
Write if = v.7
ERROR
Mechanism: per record mastership
[Figure: timeline of versions v. 1 through v. 8 in Generation 1, marking the stale versions and the current version]
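The operations on the preceding slides can be summarized in one toy model. This is a single-process sketch under stated assumptions, not PNUTS code: `RecordMaster`, `Replica`, and `VersionMismatch` are hypothetical names, the method names loosely follow the calls described for PNUTS (read-any, read-critical, read-latest, write, test-and-set-write), and asynchronous propagation is reduced to an explicit `sync()`.

```python
class VersionMismatch(Exception):
    """Raised when a conditional write names a version that is not current."""


class RecordMaster:
    """The single master copy of one record; all writes are ordered here."""

    def __init__(self, value):
        self.version = 1
        self.value = value

    def write(self, value):
        self.version += 1
        self.value = value
        return self.version

    def test_and_set_write(self, required_version, value):
        # "Write if = v.N": succeed only if N is still the current version.
        if self.version != required_version:
            raise VersionMismatch(
                f"current version is {self.version}, not {required_version}")
        return self.write(value)


class Replica:
    """A copy in another region that may lag behind the master."""

    def __init__(self, master):
        self.master = master
        self.sync()

    def sync(self):
        # Stands in for asynchronous update propagation.
        self.version, self.value = self.master.version, self.master.value

    def read_any(self):
        # Local read: fast, but may return a stale version.
        return self.version, self.value

    def read_critical(self, min_version):
        # "Read >= v.N": catch up first if the local copy is too old.
        if self.version < min_version:
            self.sync()
        return self.version, self.value

    def read_latest(self):
        # Up-to-date read: always consult the master.
        return self.master.version, self.master.value
```

With the master at v. 8, `test_and_set_write(7, ...)` raises `VersionMismatch`, which corresponds to the ERROR shown on the slide.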
PNUTS Recap
An interesting compromise between consistency and performance/availability
Used underneath many of Yahoo’s properties
… And an exemplar of the new generation of cloud services
Experiments – Show It’s So!
The general goal: to help demonstrate and show why a real-world artifact provides a benefit
  Versus some benchmark or naïve strategy
  We also want to understand why there’s a benefit
Some common kinds of experiments:
  Usability: some sort of user tests, versus a benchmark
  Performance: as we increase the workload, what happens?
  Scalability: as we increase the data, devices, nodes, what happens?
  Complexity: especially for things like code, what happens as we make the task harder or bigger?
Experimentation
In general, experiments should follow the scientific method:
  Hypothesis (e.g., our method will do better than XYZ on workloads like QWV, which are representative of domain ABC)
  Experiment (examine this – may need many trials, random workloads, etc.)
  Conclusion (show, with statistically significant measurements, that the hypothesis is true)
Often, the hypothesis almost goes unsaid in computer science – it’s implicit in the choice of the problem – but it is there!
Note that many attributes, e.g., elegance, style, are not very amenable to experiments
Others, like expressiveness, generally need to be proven rather than run
Experimental Workloads
There are generally three kinds of systems experiments:
  Synthetic microbenchmark: experimental runs are done over inputs that are generated to stress a specific factor, but are not particularly realistic
    Examples: a hard disk random access test; a web server’s maximum throughput
    Really shows the factor of interest; can be tweaked, scaled, etc.
  Synthetic based on real behavior: experimental runs are done over inputs that are modeled after real data, but perhaps generated randomly
    Examples: SPEC benchmarks; TPC-W web transaction benchmark
    Enables us to generate more inputs, testing scalability, etc.
  Real-world: traces are collected of real system behavior over real data
    Disadvantage: hard to quantify or control the different factors
Experimental Methodology
Consider the important factors that you wish to examine (and demonstrate)
  Scalability – can typically be in terms of running time, size of the problem, space consumed, etc.
  Here: performance is what matters
Break it down into individual parameters
  Crawl & index time; time to answer a query; etc.
Consider a workload that helps measure the parameter
  Crawl 1000 documents; run 50 queries 10 times apiece; etc.
Vary one parameter at a time, study effects
  Number of machines; number of threads per machine; etc.
Run experiment multiple times; average and show 95% confidence intervals in line (continuous) or bar (discrete) chart
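The last step (average the runs and report a 95% confidence interval) might be computed as below. This is a sketch under a stated assumption: it uses the normal approximation for the interval, which is reasonable for many runs; with only a handful of runs, a Student's t critical value would give a wider and more appropriate interval. The function name `mean_with_ci` is made up for illustration.

```python
import math
import statistics


def mean_with_ci(samples, confidence=0.95):
    """Return (mean, half_width) so the interval is mean +/- half_width."""
    if len(samples) < 2:
        raise ValueError("need at least two runs to estimate variability")
    mean = statistics.fmean(samples)
    # Standard error of the mean, using the sample standard deviation.
    sem = statistics.stdev(samples) / math.sqrt(len(samples))
    # Two-sided critical value from the standard normal (about 1.96 for 95%).
    z = statistics.NormalDist().inv_cdf(0.5 + confidence / 2)
    return mean, z * sem
```

For example, timings from repeated runs of the same query workload would be passed in as `samples`, and the returned half-width drawn as the error bar at that point in the chart.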
Course Recap (Until Next Week’s Midterm 2!)
Distributed, Web-scale systems are here to stay!
They create many issues that are not totally resolved, and for which there is no one answer:
  Heterogeneity
  Timing
  Partitioning and replication
  Consistency and integrity
  Etc.
This course tried to give you a sense of the issues and state-of-the-art – as well as the skills to go out and work in this domain
  I hope the amount of work we all sank into the material (and the homeworks) will pay off for you!
And stay tuned – there’s lots more to come!
  Sensor networks, semantic Web, mobile systems, location-based services, …