TRANSCRIPT
FastReplica1
FastReplica: Efficient Large File Distribution within Content Delivery Networks
Lucy Cherkasova (HP Labs, Palo Alto) and Jangwon Lee (UT Austin)
What Is the Problem?
Content Delivery Networks (CDNs): a large-scale distributed network of servers located closer to the edges of the Internet.
The main goal of a CDN's architecture is to minimize the network impact in the content delivery path and to overcome the server overload problem for popular sites.
Content distribution within CDNs, i.e., to the edge servers:
• pull model: the performance penalty is insignificant for small/medium documents;
• push model: active replication of the original content is desirable for large documents such as software download packages, media files, etc.
Replicating a large file to a large set of edge servers is a challenging and resource-intensive task!
Content Distribution in the Internet Environment
Satellite distribution: the content distribution server (or original site) has a transmitting antenna, the replica servers (edge servers) have satellite receiving dishes, and the content distribution server broadcasts a file via a satellite channel;
• requires special hardware, expensive.
Multicast distribution:
• requires multicast support in routers;
• not widely available across the Internet infrastructure.
Application-level multicast distribution: nodes act as intermediate routers to distribute content along a predefined mesh or tree;
• performance is limited by the bottleneck link in the path;
• e.g., informed content delivery across adaptive overlay networks (SIGCOMM 2002).
What Do We Propose? FastReplica
Presentation Outline:
• FastReplica in the small (algorithm core, applicable to 10-30 nodes)
• Preliminary performance analysis of FastReplica in the small
• FastReplica in the large (scaling the algorithm core to thousands of nodes)
• Reliable FastReplica algorithm
• Performance evaluation of the FastReplica prototype in a wide-area testbed
FastReplica in the Small
Problem Statement: Let N0 be a node which has an original file F, and let Size(F) denote the size of file F in bytes. Let R = {N1, …, Nn} be a replication set of nodes.
The problem consists of replicating file F across nodes N1, …, Nn while minimizing the overall replication time. Let the set N1, …, Nn be in the range of 10-30 nodes.
File F is divided into n equal subsequent subfiles:
F1, …, Fn
where Size(Fi) = Size(F) / n bytes for each i = 1, …, n.
FastReplica in the small consists of two steps: a Distribution Step and a Collection Step.
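A minimal sketch of this partitioning step (my own helper, not the paper's code; it assumes the last subfile absorbs the remainder when Size(F) is not evenly divisible by n):

```python
def split_file(data: bytes, n: int) -> list:
    """Split `data` into n contiguous subfiles F1, ..., Fn of (near-)equal size."""
    chunk = len(data) // n
    parts = [data[i * chunk:(i + 1) * chunk] for i in range(n - 1)]
    parts.append(data[(n - 1) * chunk:])  # the remainder goes into the last subfile
    return parts

f = bytes(range(256)) * 100        # a stand-in for file F
subfiles = split_file(f, 10)
assert b"".join(subfiles) == f     # concatenating F1..Fn recovers F
```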
FastReplica in the Small: Distribution Step
Origin node N0 opens n concurrent connections to nodes N1, …, Nn and sends to each node Ni the following items:
• a distribution list of nodes R = {N1, …, Nn} to which subfile Fi has to be sent in the next step;
• subfile Fi.
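A minimal sketch of the per-node payload in the distribution step (illustrative names of my own, ignoring the actual network transport):

```python
def distribution_messages(replicas, subfiles):
    """Pair each recipient Ni with (distribution list R, its subfile Fi)."""
    R = list(replicas)
    return {node: (R, subfiles[i]) for i, node in enumerate(R)}

msgs = distribution_messages(["N1", "N2", "N3"], [b"F1", b"F2", b"F3"])
# Every node receives the full list R plus exactly one distinct subfile.
assert msgs["N2"][1] == b"F2"
```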
FastReplica in the Small: Collection Step (View “from a Node”)
After receiving Fi, node Ni opens (n-1) concurrent network connections to the remaining nodes in the group and sends subfile Fi to them.
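The two steps together can be sketched as an in-memory simulation (function and variable names are mine, not the paper's): distribution gives node Ni subfile Fi, and collection has each Ni forward Fi to the other n-1 nodes.

```python
def fast_replica_small(file_parts, nodes):
    inbox = {node: {} for node in nodes}
    # Distribution step: origin N0 sends subfile Fi to node Ni.
    for i, node in enumerate(nodes):
        inbox[node][i] = file_parts[i]
    # Collection step: each Ni forwards its subfile Fi to the remaining nodes.
    for i, sender in enumerate(nodes):
        for receiver in nodes:
            if receiver != sender:
                inbox[receiver][i] = file_parts[i]
    # Every node reassembles the full file from subfiles F1..Fn.
    return {node: b"".join(inbox[node][i] for i in sorted(inbox[node]))
            for node in nodes}

parts = [b"F1", b"F2", b"F3", b"F4"]
files = fast_replica_small(parts, ["N1", "N2", "N3", "N4"])
assert all(f == b"F1F2F3F4" for f in files.values())
```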
FastReplica in the Small: Collection Step (View “to a Node”)
Thus each node Ni has:
• (n - 1) outgoing connections for sending subfile Fi;
• (n - 1) incoming connections from the remaining nodes in the group, delivering the complementary subfiles F1, …, Fi-1, Fi+1, …, Fn.
What Is the Main Idea of FastReplica?
Instead of the typical replication of the entire file F to n nodes using n Internet paths, FastReplica exploits (n × n) different Internet paths within the replication group, where each path is used for transferring 1/n-th of file F.
Benefits:
• the impact of congestion along any of the involved paths is limited to a transfer of 1/n-th of the file;
• FastReplica takes advantage of the upload and download bandwidth of recipient nodes.
Preliminary performance analysis of FastReplica in the small
Two performance metrics: average and maximum replication time.
Idealistic setting: all the nodes and links are homogeneous, and each node can support n network connections to other nodes at B bytes/sec each.
Time_distr = Size(F) / (n × B)
Time_collect = Size(F) / (n × B)
FastReplica: Time_FR = Time_distr + Time_collect = 2 × Size(F) / (n × B)
Multiple Unicast: Time_MU = Size(F) / B
Replication_Time_Speedup = Time_MU / Time_FR = n / 2
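These formulas are straightforward to verify in code (`replication_times` is my own helper; toy numbers are chosen so the arithmetic is exact):

```python
def replication_times(size, n, B):
    t_distr = size / (n * B)      # distribution step: each connection carries Size(F)/n
    t_collect = size / (n * B)    # collection step: same amount per connection
    t_fr = t_distr + t_collect    # FastReplica total
    t_mu = size / B               # Multiple Unicast: whole file per connection
    return t_fr, t_mu

size, n, B = 20, 10, 1
t_fr, t_mu = replication_times(size, n, B)
assert t_mu / t_fr == n / 2       # speedup = n/2, as derived above
```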
Uniform-Random Model
Let BW denote the bandwidth matrix, where BW[i][j] reflects the available bandwidth of the path from Ni to Nj.
Let BW[i][j] = B × random(1, Var), where Var is the bandwidth variance.
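The model is easy to spot-check with a small Monte Carlo sketch of my own (index 0 stands for the origin N0, indices 1..n for the recipients; the maximum replication time is taken over worst-case paths):

```python
import random

def max_times(size, n, B, Var, rng):
    BW = [[B * rng.uniform(1, Var) for _ in range(n + 1)] for _ in range(n + 1)]
    # Multiple Unicast: worst direct path N0 -> Ni carrying the whole file.
    t_mu = max(size / BW[0][i] for i in range(1, n + 1))
    # FastReplica: worst two-segment path N0 -> Ni -> Nj carrying Size(F)/n.
    t_fr = max((size / n) / BW[0][i] + (size / n) / BW[i][j]
               for i in range(1, n + 1) for j in range(1, n + 1) if j != i)
    return t_mu, t_fr

rng = random.Random(0)
speedups = []
for _ in range(200):
    t_mu, t_fr = max_times(9e6, 10, 1e6, Var=5, rng=rng)
    speedups.append(t_mu / t_fr)
# The maximum-latency speedup stays in the vicinity of n/2 = 5 under this model.
```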
Maximum Latency Speedup under Uniform-Random Model
Comparing the worst path transferring the entire file F against the worst two-segment path transferring 1/n-th of file F leads to an n/2 improvement in maximum latency.
Example with Skewed Path Bandwidth
[Diagram: bandwidth of paths for a 10-node example; the links have a mix of bandwidths B and 0.1B.]
At first glance, the cross-node connections have significantly worse available bandwidth. Question: what is FastReplica's performance in this configuration?
FastReplica Performance for “Skewed” Example
While the average replication time is almost the same under FastReplica and Multiple Unicast, the maximum replication time under FastReplica provides a 5x performance benefit!
Modified Example
[Diagram: bandwidth of paths; all links from origin N0 have bandwidth B, all cross-node links 0.1B.]
Let all the connections from the origin node to the recipient nodes have bandwidth B, while all the cross-node connections have available bandwidth 0.1B. Question: what is the performance of FastReplica in this configuration?
FastReplica Performance for Modified “Skewed” Example
In this configuration, FastReplica does not provide any performance benefits compared to Multiple Unicast.
The number n of nodes in FastReplica in the small plays an important role here: a larger value of n provides a higher “safety” level for FastReplica's performance.
A larger value of n helps to offset a larger difference between:
• the available bandwidth from the origin node to the nodes in the replication group, and
• the available bandwidth within the replication group.
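This safety effect can be quantified with back-of-the-envelope arithmetic for the modified example (my own calculation, modeling FastReplica's time as one origin segment plus one cross-node segment):

```python
def fr_vs_mu(n, size=1.0, B=1.0):
    """Replication times in the modified example: origin links B, cross links 0.1B."""
    t_distr = (size / n) / B            # origin -> Ni carries Size(F)/n at B
    t_collect = (size / n) / (0.1 * B)  # Ni -> Nj carries Size(F)/n at 0.1B
    t_fr = t_distr + t_collect          # = 11 * size / (n * B)
    t_mu = size / B                     # Multiple Unicast: whole file at B
    return t_fr, t_mu

# With n = 10 (the slide's example), FastReplica is slightly worse...
t_fr, t_mu = fr_vs_mu(10)
assert t_fr > t_mu          # 1.1 vs 1.0
# ...but a larger group, n = 20, flips the comparison.
t_fr, t_mu = fr_vs_mu(20)
assert t_fr < t_mu          # 0.55 vs 1.0
```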
FastReplica in the Large
Scaling Process:
• All the nodes are partitioned into groups of k nodes, where k is the number of network connections chosen for concurrent transfers between a single node and multiple receiving nodes.
• Once a group of nodes receives the entire file F, they act as origin nodes and replicate file F to the next set of nodes.
Example. Let k = 10. In 3 iterations (each taking 2 steps: distribution and collection), the original file can be replicated to 1000 nodes (10 × 10 × 10).
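The slide's scaling arithmetic can be sketched as follows (`scale` is a hypothetical helper of my own; each iteration multiplies the replica count by k and costs two steps):

```python
def scale(k: int, target_nodes: int):
    """Return (iterations, steps) needed to reach at least target_nodes replicas."""
    iterations, replicas = 0, 1          # initially only the origin holds F
    while replicas < target_nodes:
        replicas *= k                    # each holder serves a fresh group of k nodes
        iterations += 1
    return iterations, 2 * iterations    # distribution + collection per iteration

assert scale(10, 1000) == (3, 6)         # 10 x 10 x 10 = 1000 nodes in 3 iterations
```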
Reliable FastReplica
The basic algorithm is sensitive to node failures:
• if node N1 fails during either the distribution or the collection step, this event may impact all the nodes N2, …, Nn in the group, because each node depends on node N1 to receive subfile F1;
• if node N1 fails when it acts as an origin node, this failure impacts all of the replication groups in the dependent replication subtree.
Goal: to design an algorithm which efficiently deals with node failures by making local repair decisions within the particular group of nodes.
Reliable FastReplica
Heartbeat group: the origin and recipient nodes; the recipient nodes send heartbeat messages to the origin node: “I'm alive. I perform a distribution (or collection) step to nodes {Ni1, …, Nij} in group G′.”
Different failure modes of a node:
• the node acts as an origin node;
• the node acts as a recipient node performing a distribution/collection step.
If node N′0 fails while acting as an origin node for replication group G′, then G′ should be “reattached” to a higher-level origin node N̂0, and N̂0 acts as a replacement node for N′0.
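A minimal sketch of the heartbeat bookkeeping this implies at an origin node (the class and names are my own, not from the paper; a recipient is suspected failed when its last heartbeat is older than a timeout):

```python
class HeartbeatMonitor:
    def __init__(self, recipients, timeout):
        self.timeout = timeout
        self.last_seen = {node: 0.0 for node in recipients}

    def heartbeat(self, node, now):
        """Record a recipient's "I'm alive" message."""
        self.last_seen[node] = now

    def suspected_failed(self, now):
        """Recipients whose heartbeats have gone silent; candidates for repair."""
        return [n for n, t in self.last_seen.items() if now - t > self.timeout]

mon = HeartbeatMonitor(["N1", "N2", "N3"], timeout=5.0)
mon.heartbeat("N1", now=3.0)
mon.heartbeat("N2", now=3.0)
# N3 never reports; by time 7 it is suspected and a local repair step can start.
assert mon.suspected_failed(now=7.0) == ["N3"]
```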
Reliable FastReplica (cont.)
• If N′i fails while acting as a recipient node during either the collection or the distribution step, then N′0 performs the following repair step: it delivers subfile Fi to the remaining nodes in the group on behalf of the failed node.
Reliable FastReplica (cont.)
Proposed algorithm handles a single node failure within a group with minimal performance penalty.
The number of heartbeat messages in such a group is very small (because only the recipient nodes send heartbeat messages to their origin node). This structure significantly simplifies the protocol.
Performance Evaluation of FastReplica Prototype in a Wide-Area Testbed
Thanks to our summer interns, we built a wide-area testbed of 9 nodes and used it for the performance evaluation of the FastReplica prototype.
Experimental Wide-Area Testbed
Geographic location of hosts: [map of the 9 testbed nodes N0-N8 omitted]
Goals of Performance Study
We compare the following distribution schemes:
• FastReplica in the small;
• Sequential Unicast -- approximates distribution via IP multicast; measures the transfer time of the entire file from the source to each recipient independently;
• Multiple Unicast -- simultaneously transfers the entire file to all the recipient nodes by using concurrent connections.
We evaluate two metrics: average replication time and maximum replication time.
We experimented with 9 files of different sizes: 80 KB, 750 KB, 1.5 MB, 3 MB, 4.5 MB, 6 MB, 7.5 MB, 9 MB, 36 MB.
Each point in the results averages 10 different runs, performed over a 10-day period.
Average Replication Time
n paths transferring the entire file vs. (n × n) paths transferring only 1/n-th of the file.
Congestion on any of the n paths from the origin node to the recipient nodes impacts both Multiple Unicast and Sequential Unicast; FastReplica uses any of those paths for transferring only 1/n-th of the file.
FastReplica significantly outperforms Multiple Unicast and, in most cases, outperforms Sequential Unicast.
Maximum Replication Time
FastReplica significantly outperforms both Multiple Unicast and Sequential Unicast.
The maximum replication time under Multiple Unicast and Sequential Unicast is much higher than the corresponding average replication time.
FastReplica: Average and Maximum Replication Times
The maximum and average replication times under FastReplica are very close.
These results demonstrate the robustness and predictability of performance under the new strategy.
FastReplica Performance (cont.)
The figure shows the average replication time measured by different, individual recipient nodes for a 9 MB file and 8 nodes in the replication set.
Replication time is highly variable under Multiple Unicast and Sequential Unicast.
File replication times under FastReplica across the different nodes in the replication set are much more stable and predictable.
Average and Maximum Time Speedup under FastReplica
FastReplica significantly outperforms Multiple Unicast.
For a configuration of 8 nodes, the performance benefits are:
• 4x (average) to 13x (maximum) for a 1.5 MB file;
• 3.5x (average) to 5x (maximum) for a 9 MB file;
• 4x (average) to 6.5x (maximum) for a 36 MB file.
File Size Sensitivity Analysis
The files of 80 KB and 750 KB are the smallest ones used in our experiments.
For an 80 KB file, FastReplica is not efficient, while for a 750 KB file it becomes efficient. (These results depend on the number of nodes in the replication set!)
Experiments with Different Configuration
Additional analysis revealed that the available bandwidth of the paths between the origin node N0 (hp.com) and nodes N1, N2, …, N7 (university machines) is significantly lower than the cross bandwidth between nodes N1, N2, …, N7.
Node N8 also had very limited incoming bandwidth from N0, N1, …, N7, while the outgoing bandwidth from N8 to N0, N1, …, N7 was significantly higher.
Different configuration: let N1 (utexas.edu) be the origin node.
What is FastReplica's performance in the new configuration?
FastReplica Speedup in a New Configuration
In the new configuration, the average replication times under FastReplica and Multiple Unicast are similar, but the maximum speedup under FastReplica is significantly better than under Multiple Unicast.
Conclusion and Future Directions
In this work, we introduced FastReplica for efficient and reliable replication of large files in the Internet environment.
FastReplica is simple and inexpensive. It does not require any changes or modifications to the existing Internet infrastructure, and it significantly reduces the file replication time.
Interesting future directions:
• how to better cluster nodes into replication groups?
• how to build an efficient overlay tree on top of those groups?
• designing ALM-FastReplica via a combination of FastReplica's ideas with ALM (Application-Level Multicast).
Acknowledgements
We would like to thank:
HP Labs summer interns who helped us to build the wide-area testbed: Yun Fu, Weidong Cui, Taehyun Kim, Kevin Fu, Zhiheng Wang, Shiva Chetan, Xiaoping Wei, and Jehan Wickramasuriya;
John Apostolopoulos for motivating discussions;
John Sontag for his active support of this work;
our shepherd Srinivasan Seshan and the anonymous referees for
useful remarks and insightful questions.
Their help is highly appreciated!