Beyond Music Sharing: An Evaluation of Peer-to-Peer Data Dissemination Techniques in Large Scientific Collaborations. Thesis defense: Samer Al-Kiswany

Post on 20-Jan-2016

Page 1: Beyond Music Sharing:   An Evaluation of Peer-to-Peer  Data Dissemination Techniques in

Beyond Music Sharing: An Evaluation of Peer-to-Peer

Data Dissemination Techniques in Large Scientific Collaborations

Thesis defense:

Samer Al-Kiswany

Page 2: Beyond Music Sharing:   An Evaluation of Peer-to-Peer  Data Dissemination Techniques in

Introduction

Data-intensive science: large-scale simulations and new scientific instruments generate huge volumes of data (petabytes).

User communities: large and geographically dispersed.

Requirement: efficient data dissemination tools.

Samer Al-Kiswany /26

Page 3: Beyond Music Sharing:   An Evaluation of Peer-to-Peer  Data Dissemination Techniques in

Introduction - Example

Page 4: Beyond Music Sharing:   An Evaluation of Peer-to-Peer  Data Dissemination Techniques in

Question

What data dissemination strategies perform best in today's Grid deployments?

Data dissemination solutions: IP-Multicast, Bullet, BitTorrent, SPIDER, OMNI, ALMI, Logistical Multicast, Narada, Scribe, Grido, FastReplica, and many others.

Page 5: Beyond Music Sharing:   An Evaluation of Peer-to-Peer  Data Dissemination Techniques in

Roadmap

Question: What data dissemination strategies perform best in today's Grid deployments?

• Workload characteristics
• Deployment platform characteristics
• Proposed data dissemination solutions
• Evaluation
• Recommendations

Page 6: Beyond Music Sharing:   An Evaluation of Peer-to-Peer  Data Dissemination Techniques in

Workload and Deployment Platform

Data-intensive scientific collaboration characteristics:

• Scale of data: massive data collections (terabytes).
• Data usage: uniform popularity distributions and co-usage.
• Near-real-time processing.

Deployment platform characteristics:

• Resource availability: low churn rate, high node availability, well-provisioned networks.
• Collaborative environments: no freeriding, so less effort is needed to enforce fair resource sharing.

Page 8: Beyond Music Sharing:   An Evaluation of Peer-to-Peer  Data Dissemination Techniques in

Classification of Approaches

Technique | Protocol
Tree-based techniques | ALM and SPIDER
Swarming | Bullet and BitTorrent
Techniques employing intermediate storage capabilities | Logistical Multicasting

Base cases:

• IP-Multicast.
• Parallel transfers: separate data channels from the source to each destination.

Page 9: Beyond Music Sharing:   An Evaluation of Peer-to-Peer  Data Dissemination Techniques in

Separate Transfer from the Source to every Destination

Drawbacks:

• Overwhelms the source – does not scale

• Generates high duplicate traffic at the links around the source

• Does not exploit all available transport capacity.
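The scaling problem in the first drawback can be illustrated with a back-of-the-envelope sketch. It assumes that the source's access link is the only bottleneck and that its capacity is split equally among concurrent transfers; both are simplifying assumptions for illustration, not the model used in the thesis, and the function name and units are hypothetical.

```python
def separate_transfer_time(file_mb: float, source_uplink_mbps: float,
                           n_destinations: int) -> float:
    """Seconds until every destination has the file.

    Assumption: the source uplink (in Mbit/s) is the only bottleneck and is
    shared equally by the n_destinations concurrent transfers.
    """
    if n_destinations < 1:
        raise ValueError("need at least one destination")
    per_destination_mbps = source_uplink_mbps / n_destinations
    return file_mb * 8 / per_destination_mbps  # MB -> Mbit, then divide by rate


# Under these assumptions the time grows linearly with the number of destinations.
one = separate_transfer_time(100, 100, 1)
ten = separate_transfer_time(100, 100, 10)
```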

Page 10: Beyond Music Sharing:   An Evaluation of Peer-to-Peer  Data Dissemination Techniques in

IP Multicasting

[Figure: example network topology with link capacities labeled 10 and 5.]

Page 11: Beyond Music Sharing:   An Evaluation of Peer-to-Peer  Data Dissemination Techniques in

IP Multicast

Drawbacks:

• Limited deployment

• Vulnerability to node failures

• Does not exploit all available transport capacity.

• Throughput limited by bottleneck link
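The last drawback can be sketched in a few lines, assuming single-rate multicast (every receiver gets the stream at the same rate). The tree, the node names, and the capacities below are hypothetical; the values 10 and 5 only echo the labels on the slide's figure.

```python
def path_bottleneck(parent, capacity, node):
    """Smallest link capacity on the tree path from the root down to `node`."""
    rate = float("inf")
    while parent[node] is not None:
        rate = min(rate, capacity[(parent[node], node)])
        node = parent[node]
    return rate


def single_rate_multicast(parent, capacity):
    """Rate of a single-rate multicast: the slowest receiver path sets it."""
    receivers = [n for n in parent if parent[n] is not None]
    return min(path_bottleneck(parent, capacity, n) for n in receivers)


# Hypothetical tree: S feeds A and B, A feeds C over a slower link.
parent = {"S": None, "A": "S", "B": "S", "C": "A"}
capacity = {("S", "A"): 10, ("S", "B"): 10, ("A", "C"): 5}

rate_b_alone = path_bottleneck(parent, capacity, "B")  # B's own path allows 10
group_rate = single_rate_multicast(parent, capacity)   # the whole group gets 5
```

In this sketch B could receive at 10 on its own path, but the single slow link to C holds the whole group at 5.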

[Figure: example network topology with link capacities labeled 10 and 5.]

Page 12: Beyond Music Sharing:   An Evaluation of Peer-to-Peer  Data Dissemination Techniques in

Tree Based Techniques: Application Level Multicast (ALM)

[Figure: a source and nodes 1 to 6 in the physical network, next to the ALM tree built over the same nodes.]

Page 13: Beyond Music Sharing:   An Evaluation of Peer-to-Peer  Data Dissemination Techniques in

Tree Based Techniques: Application Level Multicast (ALM)

[Figure: the source, nodes 1 to 6, and the ALM tree, as on the previous slide.]

Drawbacks:

• Vulnerability to node failures

• Does not exploit all possible routes in the network.
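The first drawback follows directly from the tree structure: when an interior node fails, every node below it stops receiving data until the tree is repaired. A minimal sketch, using a hypothetical tree (not necessarily the one in the slide's figure) stored as a child-to-parent map:

```python
def cut_off_by_failure(parent, failed):
    """Nodes that lose their data feed when `failed` drops out (no tree repair)."""
    lost = set()
    for node in parent:
        ancestor = parent[node]
        while ancestor is not None:
            if ancestor == failed:
                lost.add(node)
                break
            ancestor = parent[ancestor]
    return lost


# Hypothetical ALM tree: the source feeds 1 and 5; 1 feeds 6 and 3; 5 feeds 2 and 4.
alm_parent = {"source": None, 1: "source", 5: "source", 6: 1, 3: 1, 2: 5, 4: 5}

after_interior_failure = cut_off_by_failure(alm_parent, 1)  # nodes 6 and 3 are cut off
after_leaf_failure = cut_off_by_failure(alm_parent, 4)      # nobody else is affected
```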

Page 14: Beyond Music Sharing:   An Evaluation of Peer-to-Peer  Data Dissemination Techniques in

Swarming Techniques: BitTorrent and Bullet

[Figure: a complete file split into blocks 1 to 4, with individual blocks held by different peers.]

Page 15: Beyond Music Sharing:   An Evaluation of Peer-to-Peer  Data Dissemination Techniques in

Swarming Techniques: BitTorrent and Bullet

[Figure: continuation of the block-exchange animation; blocks 1 to 4 of the complete file at different peers.]

Page 16: Beyond Music Sharing:   An Evaluation of Peer-to-Peer  Data Dissemination Techniques in

Swarming Techniques: BitTorrent and Bullet

[Figure: continuation of the block-exchange animation; blocks 1 to 4 of the complete file at different peers.]

Drawbacks:

• Generates high duplicate traffic.
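The block exchange behind swarming can be sketched with a toy round-based model. It assumes that every node uploads at most one block per round and every peer receives at most one block per round, with a fixed sender order and lowest-numbered-block-first selection. These rules are illustrative assumptions only; they are not the peer or piece selection policies of BitTorrent or Bullet.

```python
def swarm_rounds(n_blocks: int, n_peers: int) -> int:
    """Rounds until every peer holds every block (node 0 is the source)."""
    have = [set(range(n_blocks))] + [set() for _ in range(n_peers)]
    rounds = 0
    while any(len(h) < n_blocks for h in have[1:]):
        # A node can only forward blocks it held at the start of the round.
        snapshot = [set(h) for h in have]
        receiving = set()
        for sender in range(len(have)):
            for receiver in range(1, len(have)):
                if receiver == sender or receiver in receiving:
                    continue
                missing = snapshot[sender] - have[receiver]
                if missing:
                    have[receiver].add(min(missing))
                    receiving.add(receiver)
                    break  # one upload per sender per round
        rounds += 1
    return rounds


with_swarming = swarm_rounds(4, 3)  # peers forward blocks to each other
source_only = 4 * 3                 # the source alone uploads all 12 block copies
```

In this toy model, 4 blocks reach 3 peers in 6 rounds, against 12 rounds if the source uploaded every block copy itself, because peers start forwarding blocks as soon as they hold them.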

Page 17: Beyond Music Sharing:   An Evaluation of Peer-to-Peer  Data Dissemination Techniques in

17

Logistical Multicasting

Page 18: Beyond Music Sharing:   An Evaluation of Peer-to-Peer  Data Dissemination Techniques in

Roadmap

Question: What data dissemination strategies perform best in today's Grid deployments?

• Workload characteristics
• Deployment platform characteristics
• Proposed data dissemination solutions
• Evaluation
• Recommendations

Evaluation approaches:

• Analytical modeling
• Deployment-based evaluation
• Simulation

Page 19: Beyond Music Sharing:   An Evaluation of Peer-to-Peer  Data Dissemination Techniques in

Methodology

Simulator design:

• Block-level simulation.
• Simulates link contention at the physical layer.

Inputs:

• Real topologies of three deployed Grid testbeds: LCG, GridPP, EGEE.
• Generated topologies: 100 (using BRITE).
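As an illustration of what simulating link contention can mean, the sketch below splits each physical link's capacity equally among the flows that cross it and gives every flow the smallest share along its path. The equal-split policy and all names are assumptions made for this example; the slides do not state how the simulator shares a link.

```python
from collections import Counter


def flow_rates(flows, capacity):
    """Rate of each flow under equal sharing of every link it crosses.

    flows:    flow id -> non-empty list of physical links on its path
    capacity: link -> capacity (any consistent unit)
    """
    load = Counter(link for path in flows.values() for link in path)
    return {
        flow: min(capacity[link] / load[link] for link in path)
        for flow, path in flows.items()
    }


# Hypothetical example: two links, three flows competing on them.
capacity = {"a": 10, "b": 5}
flows = {"f1": ["a"], "f2": ["a", "b"], "f3": ["b"]}
rates = flow_rates(flows, capacity)
```

Here f2 crosses both links and is held to its share of the slower one, which is the kind of effect a model without physical-layer contention would miss.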

Page 20: Beyond Music Sharing:   An Evaluation of Peer-to-Peer  Data Dissemination Techniques in

Methodology

Success criteria | Metrics
Dissemination time | Transfer time
Overhead | MB × hop
Load balancing | Volume of in/out data
Fairness | Link stress
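Two of these metrics can be computed from a log of block transfers, as sketched below. The definitions are assumptions, since the slide only names the metrics: overhead is taken as the sum of transferred megabytes times the number of physical hops crossed, and link stress as the largest number of times the same block crosses a given link (the usual overlay-multicast definition). The log format and names are hypothetical.

```python
from collections import Counter


def overhead_mb_hops(transfers):
    """Sum of size (MB) times hop count over all transfers."""
    return sum(size_mb * len(path) for _, size_mb, path in transfers)


def link_stress(transfers):
    """Per link: the maximum number of copies of any one block that cross it."""
    copies = Counter(
        (link, block) for block, _, path in transfers for link in path
    )
    stress = {}
    for (link, _block), count in copies.items():
        stress[link] = max(stress.get(link, 0), count)
    return stress


# Hypothetical log: (block id, size in MB, physical links crossed).
# The same block is sent twice over the link S-R, once for A and once for B.
log = [
    ("b1", 1.0, ["S-R", "R-A"]),
    ("b1", 1.0, ["S-R", "R-B"]),
]
total_overhead = overhead_mb_hops(log)
stress = link_stress(log)
```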

Page 21: Beyond Music Sharing:   An Evaluation of Peer-to-Peer  Data Dissemination Techniques in

Transfer Time

Number of destinations that have completed the file transfer for the original EGEE topology.

[Figure: # of completed transfers (0 to 20) versus Time (10s) (0 to 19), shown in four builds that add curves step by step. Curves: Logistical MT, Bullet, ALM, BitTorrent, IP-Multicast, Separate transfers.]
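Each curve in this plot is a cumulative count: for every sampled time, the number of destinations whose transfer has already finished. A minimal sketch of that computation, with hypothetical completion times (not data from the thesis):

```python
from bisect import bisect_right


def completion_curve(completion_times, sample_times):
    """Number of destinations finished at or before each sample time."""
    done = sorted(completion_times)
    return [bisect_right(done, t) for t in sample_times]


# Hypothetical completion times in seconds for four destinations.
curve = completion_curve([35, 12, 50, 12], [0, 10, 20, 40, 60])
```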

Page 22: Beyond Music Sharing:   An Evaluation of Peer-to-Peer  Data Dissemination Techniques in

Transfer Time – With reduced core-link bandwidth

Number of destinations that have completed the file transfer: EGEE topology with core bandwidth reduced to 1/8 of the original.

[Figure: # of completed transfers (0 to 20) versus Time (10s) (0 to 30), shown in four builds that add curves step by step. Curves: Logistical MT, Bullet, ALM, BitTorrent, IP-Multicast, Separate transfers.]

Conclusions:

• On well-provisioned topologies even naïve algorithms perform well.
• On constrained topologies application-level techniques perform uniformly well: they are among the first to finish the transfer and show good intermediate progress.

Page 23: Beyond Music Sharing:   An Evaluation of Peer-to-Peer  Data Dissemination Techniques in

Summary

Motivating question: What data dissemination strategies perform best in today's Grid deployments?

In this project, we simulated representative solutions, taking into account the characteristics of the workload and of the deployed platforms.

Our results provide guidelines for selecting a data dissemination technique, depending on the:

• Target environment.
• Overall system workload characteristics.
• Success criteria.

Page 24: Beyond Music Sharing:   An Evaluation of Peer-to-Peer  Data Dissemination Techniques in

Research Publications

This work resulted in two refereed publications and one journal submission:

• Beyond Music Sharing: An Evaluation of Peer-to-Peer Data Dissemination Techniques in Large Scientific Collaborations, S. Al-Kiswany, M. Ripeanu, A. Iamnitchi, and S. Vazhkudai, submitted to the Journal of Grid Computing.

• Are P2P Data-Dissemination Techniques Viable in Today's Data-Intensive Scientific Collaborations?, S. Al-Kiswany, M. Ripeanu, A. Iamnitchi, and S. Vazhkudai, Euro-Par 2007, France (acceptance rate = 26%).

• A Simulation Study of Data Distribution Strategies for Large-scale Scientific Data Collaborations, S. Al-Kiswany and M. Ripeanu, IEEE CCECE 2007.

Page 25: Beyond Music Sharing:   An Evaluation of Peer-to-Peer  Data Dissemination Techniques in

Other Research Work

I am involved in two other research projects:

Scavenged Storage System

• stdchk: A Checkpoint Storage System for Desktop Grid Computing
• A High-Performance GridFTP Server at Desktop Cost

StoreGPU

• Exploiting the GPU for computationally intensive storage system operations.

Page 26: Beyond Music Sharing:   An Evaluation of Peer-to-Peer  Data Dissemination Techniques in

Thank you

www.ece.ubc.ca/~samera