TRANSCRIPT
FastReplica1
FastReplica: Efficient Large File Distribution within Content Delivery Networks
Lucy Cherkasova (HP Labs, Palo Alto) and Jangwon Lee (UT Austin)
What Is the Problem?
Content Delivery Networks (CDNs): a large-scale distributed network of servers located closer to the edges of the Internet.
The main goal of a CDN's architecture is to minimize the network impact in the content delivery path and to overcome the server overload problem for popular sites.
Content distribution within CDNs, i.e., to the edge servers:
• pull model: the performance penalty is insignificant for small/medium documents;
• push model: active replication of the original content is desirable for large documents such as software download packages, media files, etc.
Replicating a large file to a large set of edge servers is a challenging and resource-intensive task!
Content Distribution in the Internet Environment
Satellite distribution: the content distribution server (or original site) has a transmitting antenna, the replica servers (edge servers) have satellite receiving dishes, and the content distribution server broadcasts a file via a satellite channel;
• requires special hardware, expensive.
Multicast distribution:
• requires multicast support in routers;
• not widely available across the Internet infrastructure.
Application-level multicast distribution: nodes act as intermediate routers to distribute content along a predefined mesh or tree;
• performance is limited by the bottleneck link in the path;
• e.g., informed content delivery across adaptive overlay networks (SIGCOMM 2002).
What Do We Propose? FastReplica
Presentation Outline:
• FastReplica in the small (algorithm core, applicable to 10-30 nodes)
• Preliminary performance analysis of FastReplica in the small
• FastReplica in the large (scaling the algorithm core to thousands of nodes)
• Reliable FastReplica algorithm
• Performance evaluation of the FastReplica prototype in a wide-area testbed
FastReplica in the Small
Problem Statement: Let N0 be a node which has an original file F, and let Size(F) denote the size of file F in bytes. Let R = {N1, …, Nn} be a replication set of nodes.
The problem consists of replicating file F across nodes N1, …, Nn while minimizing the overall replication time. Let the set N1, …, Nn be in the range of 10-30 nodes.
File F is divided into n equal subsequent subfiles:
F1, …, Fn
where Size(Fi) = Size(F) / n bytes for each i = 1, …, n.
FastReplica in the small consists of two steps: a Distribution Step and a Collection Step.
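A minimal sketch of this partitioning step (my own helper, not the paper's code; it assumes the last subfile absorbs the remainder when Size(F) is not evenly divisible by n):

```python
def split_file(data: bytes, n: int) -> list:
    """Split `data` into n contiguous subfiles F1, ..., Fn of (near-)equal size."""
    chunk = len(data) // n
    parts = [data[i * chunk:(i + 1) * chunk] for i in range(n - 1)]
    parts.append(data[(n - 1) * chunk:])  # the remainder goes into the last subfile
    return parts

f = bytes(range(256)) * 100        # a stand-in for file F
subfiles = split_file(f, 10)
assert b"".join(subfiles) == f     # concatenating F1..Fn recovers F
```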
FastReplica in the Small: Distribution Step
Origin node N0 opens n concurrent connections to nodes N1, …, Nn and sends to each node Ni the following items:
• a distribution list of nodes R = {N1, …, Nn} to which subfile Fi has to be sent in the next step;
• subfile Fi.
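A minimal sketch of the per-node payload in the distribution step (illustrative names of my own, ignoring the actual network transport):

```python
def distribution_messages(replicas, subfiles):
    """Pair each recipient Ni with (distribution list R, its subfile Fi)."""
    R = list(replicas)
    return {node: (R, subfiles[i]) for i, node in enumerate(R)}

msgs = distribution_messages(["N1", "N2", "N3"], [b"F1", b"F2", b"F3"])
# Every node receives the full list R plus exactly one distinct subfile.
assert msgs["N2"][1] == b"F2"
```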
FastReplica in the Small: Collection Step (View “from a Node”)
After receiving Fi, node Ni opens (n-1) concurrent network connections to the remaining nodes in the group and sends subfile Fi to them.
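The two steps together can be sketched as an in-memory simulation (function and variable names are mine, not the paper's): distribution gives node Ni subfile Fi, and collection has each Ni forward Fi to the other n-1 nodes.

```python
def fast_replica_small(file_parts, nodes):
    inbox = {node: {} for node in nodes}
    # Distribution step: origin N0 sends subfile Fi to node Ni.
    for i, node in enumerate(nodes):
        inbox[node][i] = file_parts[i]
    # Collection step: each Ni forwards its subfile Fi to the remaining nodes.
    for i, sender in enumerate(nodes):
        for receiver in nodes:
            if receiver != sender:
                inbox[receiver][i] = file_parts[i]
    # Every node reassembles the full file from subfiles F1..Fn.
    return {node: b"".join(inbox[node][i] for i in sorted(inbox[node]))
            for node in nodes}

parts = [b"F1", b"F2", b"F3", b"F4"]
files = fast_replica_small(parts, ["N1", "N2", "N3", "N4"])
assert all(f == b"F1F2F3F4" for f in files.values())
```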
FastReplica in the Small: Collection Step (View “to a Node”)
Thus each node Ni has:
• (n - 1) outgoing connections for sending subfile Fi;
• (n - 1) incoming connections from the remaining nodes in the group, delivering the complementary subfiles F1, …, Fi-1, Fi+1, …, Fn.
What Is the Main Idea of FastReplica?
Instead of the typical replication of the entire file F to n nodes using n Internet paths, FastReplica exploits (n × n) different Internet paths within the replication group, where each path is used for transferring 1/n-th of file F.
Benefits:
• the impact of congestion along any of the involved paths is limited to a transfer of 1/n-th of the file;
• FastReplica takes advantage of the upload and download bandwidth of recipient nodes.
Preliminary performance analysis of FastReplica in the small
Two performance metrics: average and maximum replication time.
Idealistic setting: all the nodes and links are homogeneous, and each node can support n network connections to other nodes at B bytes/sec each.
Time_distr = Size(F) / (n × B)
Time_collect = Size(F) / (n × B)
FastReplica: Time_FR = Time_distr + Time_collect = 2 × Size(F) / (n × B)
Multiple Unicast: Time_MU = Size(F) / B
Replication_Time_Speedup = Time_MU / Time_FR = n / 2
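These formulas are straightforward to verify in code (`replication_times` is my own helper; toy numbers are chosen so the arithmetic is exact):

```python
def replication_times(size, n, B):
    t_distr = size / (n * B)      # distribution step: each connection carries Size(F)/n
    t_collect = size / (n * B)    # collection step: same amount per connection
    t_fr = t_distr + t_collect    # FastReplica total
    t_mu = size / B               # Multiple Unicast: whole file per connection
    return t_fr, t_mu

size, n, B = 20, 10, 1
t_fr, t_mu = replication_times(size, n, B)
assert t_mu / t_fr == n / 2       # speedup = n/2, as derived above
```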
Uniform-Random Model
Let BW denote the bandwidth matrix, where BW[i][j] reflects the available bandwidth of the path from Ni to Nj.
Let BW[i][j] = B × random(1, Var), where Var is the bandwidth variance.
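The model is easy to spot-check with a small Monte Carlo sketch of my own (index 0 stands for the origin N0, indices 1..n for the recipients; the maximum replication time is taken over worst-case paths):

```python
import random

def max_times(size, n, B, Var, rng):
    BW = [[B * rng.uniform(1, Var) for _ in range(n + 1)] for _ in range(n + 1)]
    # Multiple Unicast: worst direct path N0 -> Ni carrying the whole file.
    t_mu = max(size / BW[0][i] for i in range(1, n + 1))
    # FastReplica: worst two-segment path N0 -> Ni -> Nj carrying Size(F)/n.
    t_fr = max((size / n) / BW[0][i] + (size / n) / BW[i][j]
               for i in range(1, n + 1) for j in range(1, n + 1) if j != i)
    return t_mu, t_fr

rng = random.Random(0)
speedups = []
for _ in range(200):
    t_mu, t_fr = max_times(9e6, 10, 1e6, Var=5, rng=rng)
    speedups.append(t_mu / t_fr)
# The maximum-latency speedup stays in the vicinity of n/2 = 5 under this model.
```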
Maximum Latency Speedup under Uniform-Random Model
Comparing the worst path transferring the entire file F against the worst two-segment path transferring 1/n-th of file F leads to an n/2 improvement in maximum latency.
Example with Skewed Path Bandwidth
[Diagram: bandwidth of paths for a 10-node example; the links have a mix of bandwidths B and 0.1B.]
At first glance, the cross-node connections have significantly worse available bandwidth. Question: what is FastReplica's performance in this configuration?
FastReplica Performance for “Skewed” Example
While the average replication time is almost the same under FastReplica and Multiple Unicast, the maximum replication time under FastReplica provides a 5x performance benefit!
Modified Example
[Diagram: bandwidth of paths; all links from origin N0 have bandwidth B, all cross-node links 0.1B.]
Let all the connections from the origin node to the recipient nodes have bandwidth B, while all the cross-node connections have available bandwidth 0.1B. Question: what is the performance of FastReplica in this configuration?
FastReplica Performance for Modified “Skewed” Example
In this configuration, FastReplica does not provide any performance benefits compared to Multiple Unicast.
The number n of nodes in FastReplica in the small plays an important role here: a larger value of n provides a higher “safety” level for FastReplica's performance.
A larger value of n helps to offset a larger difference between:
• the available bandwidth from the origin node to the nodes in the replication group, and
• the available bandwidth within the replication group.
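This safety effect can be quantified with back-of-the-envelope arithmetic for the modified example (my own calculation, modeling FastReplica's time as one origin segment plus one cross-node segment):

```python
def fr_vs_mu(n, size=1.0, B=1.0):
    """Replication times in the modified example: origin links B, cross links 0.1B."""
    t_distr = (size / n) / B            # origin -> Ni carries Size(F)/n at B
    t_collect = (size / n) / (0.1 * B)  # Ni -> Nj carries Size(F)/n at 0.1B
    t_fr = t_distr + t_collect          # = 11 * size / (n * B)
    t_mu = size / B                     # Multiple Unicast: whole file at B
    return t_fr, t_mu

# With n = 10 (the slide's example), FastReplica is slightly worse...
t_fr, t_mu = fr_vs_mu(10)
assert t_fr > t_mu          # 1.1 vs 1.0
# ...but a larger group, n = 20, flips the comparison.
t_fr, t_mu = fr_vs_mu(20)
assert t_fr < t_mu          # 0.55 vs 1.0
```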
FastReplica in the Large
Scaling Process:
• All the nodes are partitioned into groups of k nodes, where k is the number of network connections chosen for concurrent transfers between a single node and multiple receiving nodes.
• Once a group of nodes receives the entire file F, they act as origin nodes and replicate file F to the next set of nodes.
Example. Let k = 10. In 3 iterations (each taking 2 steps: distribution and collection), the original file can be replicated to 1000 nodes (10 × 10 × 10).
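The slide's scaling arithmetic can be sketched as follows (`scale` is a hypothetical helper of my own; each iteration multiplies the replica count by k and costs two steps):

```python
def scale(k: int, target_nodes: int):
    """Return (iterations, steps) needed to reach at least target_nodes replicas."""
    iterations, replicas = 0, 1          # initially only the origin holds F
    while replicas < target_nodes:
        replicas *= k                    # each holder serves a fresh group of k nodes
        iterations += 1
    return iterations, 2 * iterations    # distribution + collection per iteration

assert scale(10, 1000) == (3, 6)         # 10 x 10 x 10 = 1000 nodes in 3 iterations
```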
Reliable FastReplica
The basic algorithm is sensitive to node failures:
• if node N1 fails during either the distribution or the collection step, this event may impact all the nodes N2, …, Nn in the group, because each node depends on node N1 to receive subfile F1;
• if node N1 fails when it acts as an origin node, this failure impacts all of the replication groups in the dependent replication subtree.
Goal: to design an algorithm which efficiently deals with node failures by making local repair decisions within the particular group of nodes.
Reliable FastReplica
Heartbeat group: the origin and recipient nodes; the recipient nodes send heartbeat messages to the origin node: “I'm alive. I perform a distribution (or collection) step to nodes {Ni1, …, Nij} in group G′.”
Different failure modes of a node:
• the node acts as an origin node;
• the node acts as a recipient node performing a distribution/collection step.
If node N′0 fails while acting as an origin node for replication group G′, then G′ should be “reattached” to a higher-level origin node N̂0, and N̂0 acts as a replacement node for N′0.
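A minimal sketch of the heartbeat bookkeeping this implies at an origin node (the class and names are my own, not from the paper; a recipient is suspected failed when its last heartbeat is older than a timeout):

```python
class HeartbeatMonitor:
    def __init__(self, recipients, timeout):
        self.timeout = timeout
        self.last_seen = {node: 0.0 for node in recipients}

    def heartbeat(self, node, now):
        """Record a recipient's "I'm alive" message."""
        self.last_seen[node] = now

    def suspected_failed(self, now):
        """Recipients whose heartbeats have gone silent; candidates for repair."""
        return [n for n, t in self.last_seen.items() if now - t > self.timeout]

mon = HeartbeatMonitor(["N1", "N2", "N3"], timeout=5.0)
mon.heartbeat("N1", now=3.0)
mon.heartbeat("N2", now=3.0)
# N3 never reports; by time 7 it is suspected and a local repair step can start.
assert mon.suspected_failed(now=7.0) == ["N3"]
```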
Reliable FastReplica (cont.)
• If N′i fails while acting as a recipient node during either the collection or the distribution step, then N′0 performs the following repair step: it delivers subfile Fi to the remaining nodes in the group on behalf of the failed node.
Reliable FastReplica (cont.)
Proposed algorithm handles a single node failure within a group with minimal performance penalty.
The number of heartbeat messages in such a group is very small (because only the recipient nodes send heartbeat messages to their origin node). This structure significantly simplifies the protocol.
Performance Evaluation of FastReplica Prototype in a Wide-Area Testbed
Thanks to our summer interns, we built a wide-area testbed of 9 nodes and used it for the performance evaluation of the FastReplica prototype.
Experimental Wide-Area Testbed
Geographic location of hosts: [map of the 9 testbed nodes N0-N8 omitted]
Goals of Performance Study
We compare the following distribution schemes:
• FastReplica in the small;
• Sequential Unicast -- approximates distribution via IP multicast; measures the transfer time of the entire file from the source to each recipient independently;
• Multiple Unicast -- simultaneously transfers the entire file to all the recipient nodes by using concurrent connections.
We evaluate two metrics: average replication time and maximum replication time.
We experimented with 9 files of different sizes: 80 KB, 750 KB, 1.5 MB, 3 MB, 4.5 MB, 6 MB, 7.5 MB, 9 MB, 36 MB.
Each point in the results averages 10 different runs, performed over a 10-day period.
Average Replication Time
n paths transferring the entire file vs. (n × n) paths transferring only 1/n-th of the file.
Congestion on any of the n paths from the origin node to the recipient nodes impacts both Multiple Unicast and Sequential Unicast; FastReplica uses any of those paths for transferring only 1/n-th of the file.
FastReplica significantly outperforms Multiple Unicast and, in most cases, outperforms Sequential Unicast.
Maximum Replication Time
FastReplica significantly outperforms both Multiple Unicast and Sequential Unicast.
The maximum replication time under Multiple Unicast and Sequential Unicast is much higher than the corresponding average replication time.
FastReplica: Average and Maximum Replication Times
The maximum and average replication times under FastReplica are very close.
These results demonstrate the robustness and predictability of performance under the new strategy.
FastReplica Performance (cont.)
The figure shows the average replication time measured by different, individual recipient nodes for a 9 MB file and 8 nodes in the replication set.
Replication time is highly variable under Multiple Unicast and Sequential Unicast.
File replication times under FastReplica across the different nodes in the replication set are much more stable and predictable.
Average and Maximum Time Speedup under FastReplica
FastReplica significantly outperforms Multiple Unicast.
For a configuration of 8 nodes, the performance benefits are:
• 4x (average) to 13x (maximum) for a 1.5 MB file;
• 3.5x (average) to 5x (maximum) for a 9 MB file;
• 4x (average) to 6.5x (maximum) for a 36 MB file.
File Size Sensitivity Analysis
The files of 80 KB and 750 KB are the smallest ones used in our experiments.
For an 80 KB file, FastReplica is not efficient, while for a 750 KB file it becomes efficient. (These results depend on the number of nodes in the replication set!)
Experiments with Different Configuration
Additional analysis revealed that the available bandwidth of the paths between the origin node N0 (hp.com) and nodes N1, N2, …, N7 (university machines) is significantly lower than the cross bandwidth between nodes N1, N2, …, N7.
Node N8 also had very limited incoming bandwidth from N0, N1, …, N7, while the outgoing bandwidth from N8 to N0, N1, …, N7 was significantly higher.
Different configuration: let N1 (utexas.edu) be the origin node.
What is FastReplica's performance in the new configuration?
FastReplica Speedup in a New Configuration
In the new configuration, the average replication times under FastReplica and Multiple Unicast are similar, but the maximum speedup under FastReplica is significantly better than under Multiple Unicast.
Conclusion and Future Directions
In this work, we introduced FastReplica for efficient and reliable replication of large files in the Internet environment.
FastReplica is simple and inexpensive. It does not require any changes or modifications to the existing Internet infrastructure, and it significantly reduces the file replication time.
Interesting future directions:
• how to better cluster nodes into replication groups?
• how to build an efficient overlay tree on top of those groups?
• designing ALM-FastReplica via a combination of FastReplica's ideas with ALM (Application-Level Multicast).
Acknowledgements
We would like to thank:
HP Labs summer interns who helped us to build the wide-area testbed: Yun Fu, Weidong Cui, Taehyun Kim, Kevin Fu, Zhiheng Wang, Shiva Chetan, Xiaoping Wei, and Jehan Wickramasuriya;
John Apostolopoulos for motivating discussions;
John Sontag for his active support of this work;
our shepherd Srinivasan Seshan and the anonymous referees for
useful remarks and insightful questions.
Their help is highly appreciated!