1 a distributed task scheduler optimizing data transfer time...

31
1 A distributed Task Scheduler Optimizing Data Transfer Time ( デデデデデデデデデデデデデデデデデデデデデデデデ ) Taura lab. Kei Takahashi (56428)

Upload: joella-robbins

Post on 05-Jan-2016

222 views

Category:

Documents


3 download

TRANSCRIPT

Page 1: 1 A distributed Task Scheduler Optimizing Data Transfer Time (データ転送時間を最適化する分散タスクスケジュラー) Taura lab. Kei Takahashi (56428) Taura lab

1

A distributed Task SchedulerOptimizing Data Transfer Time(データ転送時間を最適化する分散タスクスケジュラー )

Taura lab. Kei Takahashi (56428)

Page 2: 1 A distributed Task Scheduler Optimizing Data Transfer Time (データ転送時間を最適化する分散タスクスケジュラー) Taura lab. Kei Takahashi (56428) Taura lab

2

Task SchedulersTask Schedulers A system which distributes many serial

tasks onto available nodes►Task assignments►Data transfers

Users can easily execute many serial tasks in parallel on distributed environment

A system which distributes many serial tasks onto available nodes►Task assignments►Data transfers

Users can easily execute many serial tasks in parallel on distributed environment

AA

SchedulerScheduler

File File BB CC DD

File File File File File File File File

TaskTask TaskTask TaskTask TaskTask TaskTask

Page 3: 1 A distributed Task Scheduler Optimizing Data Transfer Time (データ転送時間を最適化する分散タスクスケジュラー) Taura lab. Kei Takahashi (56428) Taura lab

3

Data Intensive ApplicationsData Intensive Applications A computation using a large amount of data

►Natural language processing, data mining, etc. Data transfer time takes considerable part of

the total processing time►Example: parsing a set of data collected by cra

wlers located in some hosts• Typical network bandwidth: 1Gbps~50Mbps• Throughput of Chasen parser: 1.7MB/s = 14Mbps• In the worst case, the transfer takes 20% of the tot

al processing time

A computation using a large amount of data►Natural language processing, data mining, etc.

Data transfer time takes considerable part of the total processing time►Example: parsing a set of data collected by cra

wlers located in some hosts• Typical network bandwidth: 1Gbps~50Mbps• Throughput of Chasen parser: 1.7MB/s = 14Mbps• In the worst case, the transfer takes 20% of the tot

al processing timecomputationcomputation

computationcomputation

Transfer Time

Page 4: 1 A distributed Task Scheduler Optimizing Data Transfer Time (データ転送時間を最適化する分散タスクスケジュラー) Taura lab. Kei Takahashi (56428) Taura lab

4

Considering Transfer CostConsidering Transfer Cost GrADS[1] considers transfer time when

scheduling tasks►Measure bandwidth between any two hosts ►Estimate transfer time by using the bandwidth►Assign task based on the time

Since they only use static bandwidth values, their prediction can be far from the real network behavior when multiple data transfer share a link

GrADS[1] considers transfer time when scheduling tasks►Measure bandwidth between any two hosts ►Estimate transfer time by using the bandwidth►Assign task based on the time

Since they only use static bandwidth values, their prediction can be far from the real network behavior when multiple data transfer share a link

[1] Mandal. et al. "Scheduling Strategies for Mapping Application Workflows onto the Grid“ in IEEEInternational Symposium on High Performance Distributed Computing (HPDC 2005)

Page 5: 1 A distributed Task Scheduler Optimizing Data Transfer Time (データ転送時間を最適化する分散タスクスケジュラー) Taura lab. Kei Takahashi (56428) Taura lab

5

Using ReplicasUsing Replicas Machida et. al[1] have developed a scheduler whic

h utilizes multiple replicas►When data are copied from the source to another nod

e, the copied node is registered as a replica►When another node requires the same data, the data i

s copied from the nearest replicas

In this work, a task schedule itself is not optimized. Also, it does not care about the effect caused by of link sharing of multiple transfers

Machida et. al[1] have developed a scheduler which utilizes multiple replicas►When data are copied from the source to another nod

e, the copied node is registered as a replica►When another node requires the same data, the data i

s copied from the nearest replicas

In this work, a task schedule itself is not optimized. Also, it does not care about the effect caused by of link sharing of multiple transfers

[1] 町田悠哉 滝澤真一朗 中田秀基 松岡聡 `` レプリカ管理システムを利用したデータインテンシブアプリケーション向けスケジューリングシステム '‘ (HPCS2006)

File File

File File

FileFile File

File File

File

Page 6: 1 A distributed Task Scheduler Optimizing Data Transfer Time (データ転送時間を最適化する分散タスクスケジュラー) Taura lab. Kei Takahashi (56428) Taura lab

6

Effects of Link SharingEffects of Link Sharing If several transfers share a link, the sum of their

throughputs cannot exceed the link bandwidth► throughput(File1) + throughput (File2)

<= (Link bandwidth)

Bandwidth between two hosts varies when multiple data are transferred►The value is reduced in 25% if 4 transfers share a link

If several transfers share a link, the sum of their throughputs cannot exceed the link bandwidth► throughput(File1) + throughput (File2)

<= (Link bandwidth)

Bandwidth between two hosts varies when multiple data are transferred►The value is reduced in 25% if 4 transfers share a link

File1 File1

File 2File 2

File1 File1

50Mbps

100Mbps 50Mbps

Page 7: 1 A distributed Task Scheduler Optimizing Data Transfer Time (データ転送時間を最適化する分散タスクスケジュラー) Taura lab. Kei Takahashi (56428) Taura lab

7

Considering TopologyConsidering Topology The throughput is limited by the narrowest link in t

he transfer The throughput may become larger by altering

source of transfers or changing task assignment

The throughput is limited by the narrowest link in the transfer

The throughput may become larger by altering source of transfers or changing task assignment

Throughputs of the two transfers are limited on this link

File 2File 2 File1File1File1 File1 Task1Task1 Task2Task2

File1 can be transferred from the other source

Page 8: 1 A distributed Task Scheduler Optimizing Data Transfer Time (データ転送時間を最適化する分散タスクスケジュラー) Taura lab. Kei Takahashi (56428) Taura lab

8

Research PurposeResearch Purpose Design and implement an efficient

distributed task scheduler for data intensive applications►Minimize data transfer time by using network

topology and bandwidth information• Create a schedule which needs less transfers• Plan multicasts for data transfers• Maximize throughput by using linear programming

Design and implement an efficient distributed task scheduler for data intensive applications►Minimize data transfer time by using network

topology and bandwidth information• Create a schedule which needs less transfers• Plan multicasts for data transfers• Maximize throughput by using linear programming

Page 9: 1 A distributed Task Scheduler Optimizing Data Transfer Time (データ転送時間を最適化する分散タスクスケジュラー) Taura lab. Kei Takahashi (56428) Taura lab

9

AgendaAgenda Background Purpose Our Approach

►Task Scheduling Algorithm►Transfer Planning Algorithm

Conclusion

Background Purpose Our Approach

►Task Scheduling Algorithm►Transfer Planning Algorithm

Conclusion

Page 10: 1 A distributed Task Scheduler Optimizing Data Transfer Time (データ転送時間を最適化する分散タスクスケジュラー) Taura lab. Kei Takahashi (56428) Taura lab

10

Input and OutputInput and Output Input:

►Data locations►Network topology and bandwidth►Task information (required data)

Output:►Task Scheduling:

• Assignment of nodes to tasks

►Transfer Scheduling:• From/to which host data are transferred• Limit bandwidth during the transfer if needed

Final goal: Minimizing the total completion time

Input:►Data locations►Network topology and bandwidth►Task information (required data)

Output:►Task Scheduling:

• Assignment of nodes to tasks

►Transfer Scheduling:• From/to which host data are transferred• Limit bandwidth during the transfer if needed

Final goal: Minimizing the total completion time

Page 11: 1 A distributed Task Scheduler Optimizing Data Transfer Time (データ転送時間を最適化する分散タスクスケジュラー) Taura lab. Kei Takahashi (56428) Taura lab

11

Immediate GoalImmediate Goal Final goal: Minimizing the total completion time Our idea is to minimize data transfer time Immediate goal: Maximizing the total of data arriv

al throughput on each node

Final goal: Minimizing the total completion time Our idea is to minimize data transfer time Immediate goal: Maximizing the total of data arriv

al throughput on each node))((

__ __,

nodeeveryfori fileeveryforjjifilebandwidth

(filei, j : the j th file required by taski )

Maximize the total of these throughputsMaximize the total of these throughputs

Page 12: 1 A distributed Task Scheduler Optimizing Data Transfer Time (データ転送時間を最適化する分散タスクスケジュラー) Taura lab. Kei Takahashi (56428) Taura lab

12

The Whole AlgorithmThe Whole Algorithm

Create Initial Task Schedule

Plan Efficient Transfers

Improve the Schedule

Plan Multicast

Optimize Throughput

Re-plan Multicast

Assign tasks to nodes

Some nodes are unscheduledSome nodes are unscheduled

Transfer FilesTransfer Files

Execute TasksExecute Tasks

(A) Task Scheduling Algorithm

(B)TransferPlanning

Algorithm

Page 13: 1 A distributed Task Scheduler Optimizing Data Transfer Time (データ転送時間を最適化する分散タスクスケジュラー) Taura lab. Kei Takahashi (56428) Taura lab

13

Task Scheduling AlgorithmTask Scheduling Algorithm

Create Initial Task Schedule

Plan Efficient Transfers

Improve the Schedule

Plan Multicast

Optimize Throughput

Re-plan Multicast

Assign tasks to nodes

Some nodes are unscheduledSome nodes are unscheduled

Transfer FilesTransfer Files

Execute TasksExecute Tasks

(A) Task Scheduling Algorithm

Page 14: 1 A distributed Task Scheduler Optimizing Data Transfer Time (データ転送時間を最適化する分散タスクスケジュラー) Taura lab. Kei Takahashi (56428) Taura lab

14

Task Scheduling AlgorithmTask Scheduling Algorithm When some nodes are unscheduled,

►Create candidate task schedules►Plan efficient file transfers

• From which node file is transfered• Bandwidth of each file transfer

►Search for the best schedule which maximizes data transfer throughput(by using heuristics like GA, SA)

Decide the task schedule, and start file transfers and task executions

When some nodes are unscheduled,►Create candidate task schedules►Plan efficient file transfers

• From which node file is transfered• Bandwidth of each file transfer

►Search for the best schedule which maximizes data transfer throughput(by using heuristics like GA, SA)

Decide the task schedule, and start file transfers and task executions

Page 15: 1 A distributed Task Scheduler Optimizing Data Transfer Time (データ転送時間を最適化する分散タスクスケジュラー) Taura lab. Kei Takahashi (56428) Taura lab

15

Transfer Planning AlgorithmTransfer Planning Algorithm

Create Initial Task Schedule

Plan Efficient Transfers

Improve the Schedule

Plan Multicast

Optimize Throughput

Re-plan Multicast

Assign tasks to nodes

Some nodes are unscheduledSome nodes are unscheduled

Transfer FilesTransfer Files

Execute TasksExecute Tasks

(B)TransferPlanning

Algorithm

Page 16: 1 A distributed Task Scheduler Optimizing Data Transfer Time (データ転送時間を最適化する分散タスクスケジュラー) Taura lab. Kei Takahashi (56428) Taura lab

16

Transfer Planning AlgorithmTransfer Planning Algorithm When source nodes and destination nodes for ea

ch file is given:►Decide source node for each destinations by using Kru

skal's Algorithm• If multiple destinations uses one source, the data are multic

asted

►Calculate bandwidth value for each transfer to maximize throughput by using linear programming

► If the originally chosen source is not optimal, modify the multicast tree by using a new bandwidth topology

When source nodes and destination nodes for each file is given:►Decide source node for each destinations by using Kru

skal's Algorithm• If multiple destinations uses one source, the data are multic

asted

►Calculate bandwidth value for each transfer to maximize throughput by using linear programming

► If the originally chosen source is not optimal, modify the multicast tree by using a new bandwidth topology

Page 17: 1 A distributed Task Scheduler Optimizing Data Transfer Time (データ転送時間を最適化する分散タスクスケジュラー) Taura lab. Kei Takahashi (56428) Taura lab

17

Planing MulticastPlaning Multicast

Create Initial Task Schedule

Plan Efficient Transfers

Improve the Schedule

Plan Multicast

Optimize Throughput

Re-plan Multicast

Assign tasks to nodes

Some nodes are unscheduledSome nodes are unscheduled

Transfer FilesTransfer Files

Execute TasksExecute Tasks

(B)TransferPlanning

Algorithm

Page 18: 1 A distributed Task Scheduler Optimizing Data Transfer Time (データ転送時間を最適化する分散タスクスケジュラー) Taura lab. Kei Takahashi (56428) Taura lab

18

Pipeline Multicast (1)Pipeline Multicast (1) For a given schedule, it is known which

nodes require which files When multiple nodes need a common file,

pipeline multicast shortens transfer time(in the case of large files)

The speed of a pipeline broadcast is limited by the narrowest link in the tree

A broadcast can be sped up by efficiently using multiple sources

For a given schedule, it is known which nodes require which files

When multiple nodes need a common file, pipeline multicast shortens transfer time(in the case of large files)

The speed of a pipeline broadcast is limited by the narrowest link in the tree

A broadcast can be sped up by efficiently using multiple sources

Page 19: 1 A distributed Task Scheduler Optimizing Data Transfer Time (データ転送時間を最適化する分散タスクスケジュラー) Taura lab. Kei Takahashi (56428) Taura lab

19

Pipeline Multicast (2)Pipeline Multicast (2) The tree is constructed in depth-first manner Every related link is only used twice

(upward/downward) Since disk access is as slow as network, the disk

access bandwidth should be also counted

The tree is constructed in depth-first manner Every related link is only used twice

(upward/downward) Since disk access is as slow as network, the disk

access bandwidth should be also counted

SourceDestination

Page 20: 1 A distributed Task Scheduler Optimizing Data Transfer Time (データ転送時間を最適化する分散タスクスケジュラー) Taura lab. Kei Takahashi (56428) Taura lab

20

Multi-source MulticastMulti-source Multicast M nodes have the same source data; N nodes need it For each link in the order of bandwidth:

► If the link connects two nodes/switches which are already connected to the source node:

→ Discard the link►Otherwise: → Adopt the link(Kruskal's Algorithm: it maximizes the narrowest link in the pipeline

s)

M nodes have the same source data; N nodes need it For each link in the order of bandwidth:

► If the link connects two nodes/switches which are already connected to the source node:

→ Discard the link►Otherwise: → Adopt the link(Kruskal's Algorithm: it maximizes the narrowest link in the pipeline

s)

Do not use this link

Pipeline 1

Pipeline 2Source Destination

Page 21: 1 A distributed Task Scheduler Optimizing Data Transfer Time (データ転送時間を最適化する分散タスクスケジュラー) Taura lab. Kei Takahashi (56428) Taura lab

21

Maximizing ThroughputMaximizing Throughput

Create Initial Task Schedule

Plan Efficient Transfers

Improve the Schedule

Plan Multicast

Optimize Throughput

Re-plan Multicast

Assign tasks to nodes

Some nodes are unscheduledSome nodes are unscheduled

Transfer FilesTransfer Files

Execute TasksExecute Tasks

(B)TransferPlanning

Algorithm

Page 22: 1 A distributed Task Scheduler Optimizing Data Transfer Time (データ転送時間を最適化する分散タスクスケジュラー) Taura lab. Kei Takahashi (56428) Taura lab

22

Maximizing ThroughputMaximizing Throughput After constructing multicast trees for every file,

decide the bandwidth each transfer uses►By using linear programming

• Maximize: (bw0 + bw1 + bw2 * 3 + bw3 * 2 + bw4)

• Conditions: bw0 + bw1 ≤ (const)

• bw1 + bw3 ≤ (const)

• bw0 + bw2 + bw3 ≤ (const)

• For local data, use disk access cost as the bandwidth

After constructing multicast trees for every file, decide the bandwidth each transfer uses►By using linear programming

• Maximize: (bw0 + bw1 + bw2 * 3 + bw3 * 2 + bw4)

• Conditions: bw0 + bw1 ≤ (const)

• bw1 + bw3 ≤ (const)

• bw0 + bw2 + bw3 ≤ (const)

• For local data, use disk access cost as the bandwidth

30Mbps

60Mbps

50Mbps

100Mbps

100Mbps

bw0

bw1

bw2

bw3

bw4

Page 23: 1 A distributed Task Scheduler Optimizing Data Transfer Time (データ転送時間を最適化する分散タスクスケジュラー) Taura lab. Kei Takahashi (56428) Taura lab

23

Re-planning MulticastRe-planning Multicast

Create Initial Task Schedule

Plan Efficient Transfers

Improve the Schedule

Plan Multicast

Optimize Throughput

Re-plan Multicast

Assign tasks to nodes

Some nodes are unscheduledSome nodes are unscheduled

Transfer FilesTransfer Files

Execute TasksExecute Tasks

(B)TransferPlanning

Algorithm

Page 24: 1 A distributed Task Scheduler Optimizing Data Transfer Time (データ転送時間を最適化する分散タスクスケジュラー) Taura lab. Kei Takahashi (56428) Taura lab

24

Re-planning MulticastRe-planning Multicast Now every transfer schedule is decided, but it may not be

optimal. Since multicast trees are planned independently, unnecessary conflictions may occur.

By re-planning multicast by using current bandwidth information, the multicast trees are optimized

► When re-optimizing a transfer, first create an available bandwidth map and construct multicast trees

Now every transfer schedule is decided, but it may not be optimal. Since multicast trees are planned independently, unnecessary conflictions may occur.

By re-planning multicast by using current bandwidth information, the multicast trees are optimized

► When re-optimizing a transfer, first create an available bandwidth map and construct multicast trees

Oldroute

Pink links: available bandwidth

New route

Page 25: 1 A distributed Task Scheduler Optimizing Data Transfer Time (データ転送時間を最適化する分散タスクスケジュラー) Taura lab. Kei Takahashi (56428) Taura lab

25

Improve Task ScheduleImprove Task Schedule

Create Initial Task Schedule

Plan Efficient Transfers

Improve the Schedule

Plan Multicast

Optimize Throughput

Re-plan Multicast

Assign tasks to nodes

Some nodes are unscheduledSome nodes are unscheduled

Transfer FilesTransfer Files

Execute TasksExecute Tasks

Page 26: 1 A distributed Task Scheduler Optimizing Data Transfer Time (データ転送時間を最適化する分散タスクスケジュラー) Taura lab. Kei Takahashi (56428) Taura lab

26

Improve Task ScheduleImprove Task Schedule After efficient data transfer plan has

obtained, the scheduler tries to reduce the transfer size by altering the task schedule

We are thinking of using GA or Simulated Annealing. Since the most crowded link has found, we can try to reduce transfers on this link in the mutation phase.

After efficient data transfer plan has obtained, the scheduler tries to reduce the transfer size by altering the task schedule

We are thinking of using GA or Simulated Annealing. Since the most crowded link has found, we can try to reduce transfers on this link in the mutation phase.

Page 27: 1 A distributed Task Scheduler Optimizing Data Transfer Time (データ転送時間を最適化する分散タスクスケジュラー) Taura lab. Kei Takahashi (56428) Taura lab

27

Actual TransfersActual Transfers After the transfer schedule is determined, th

e plan is performed as simulated The bandwidth of each transfer is limited to

the previously calculated value When detecting a significant change in ban

dwidth, the schedule is reconstructed►The bandwidth is measured by using existing

methods (eg. Nettimer[1])

After the transfer schedule is determined, the plan is performed as simulated

The bandwidth of each transfer is limited to the previously calculated value

When detecting a significant change in bandwidth, the schedule is reconstructed►The bandwidth is measured by using existing

methods (eg. Nettimer[1])

[1] Kevin Lai et al. ``Measuring Link Bandwidths Using a Deterministic Model of Packet Delay''SIGCOMM '00, Stockholm, Sweden.

Page 28: 1 A distributed Task Scheduler Optimizing Data Transfer Time (データ転送時間を最適化する分散タスクスケジュラー) Taura lab. Kei Takahashi (56428) Taura lab

28

Re-scheduling TransfersRe-scheduling Transfers When one of the following events occurs, b

andwidth assignments are recalculated►A transfer has finished►Bandwitdth has changed►New tasks are scheduled

When one of the following events occurs, bandwidth assignments are recalculated►A transfer has finished►Bandwitdth has changed►New tasks are scheduled

30Mbps

60Mbps

50Mbps

100Mbps

100Mbps

bw0

bw1

bw2

bw3

bw4

90Mbps

40Mbps

Page 29: 1 A distributed Task Scheduler Optimizing Data Transfer Time (データ転送時間を最適化する分散タスクスケジュラー) Taura lab. Kei Takahashi (56428) Taura lab

29

Current SituationCurrent Situation The algorithm has determined The implementation is ongoing

►Plan data transfers when topology and a task schedule is given

►Create a schedule with heuristics►Perform the real file transfer and task

execution Evaluation will be done by comparing to

existing schedulers

The algorithm has determined The implementation is ongoing

►Plan data transfers when topology and a task schedule is given

►Create a schedule with heuristics►Perform the real file transfer and task

execution Evaluation will be done by comparing to

existing schedulers

Page 30: 1 A distributed Task Scheduler Optimizing Data Transfer Time (データ転送時間を最適化する分散タスクスケジュラー) Taura lab. Kei Takahashi (56428) Taura lab

30

ConclusionConclusion Introduced a new scheduling algorithm

►Predict transfer time by using network topology, and search for a better task schedule

►Plan an efficient multicast►Maximize throughput by linear programming

and by limiting bandwidth►Dynamically re-scheduling transfers

Introduced a new scheduling algorithm►Predict transfer time by using network

topology, and search for a better task schedule►Plan an efficient multicast►Maximize throughput by linear programming

and by limiting bandwidth►Dynamically re-scheduling transfers

Page 31: 1 A distributed Task Scheduler Optimizing Data Transfer Time (データ転送時間を最適化する分散タスクスケジュラー) Taura lab. Kei Takahashi (56428) Taura lab

31

PublicationsPublications 高橋慧 , 田浦健次朗 , 近山隆 . マイグレーションを支援する分散集合オブジ

ェクト.並列/分散/協調処理に関するサマーワークショップ ( ポスター発表 ) ( SWoPP2005 ),武雄, 2005 年 8 月.

高橋慧 , 田浦健次朗 , 近山隆 . マイグレーションを支援する分散集合オブジェクト . 先進的計算基盤シンポジウム( SACSIS 2005 ),筑波, 2005 年 5月.

高橋慧 , 田浦健次朗 , 近山隆 . マイグレーションを支援する分散集合オブジェクト.並列/分散/協調処理に関するサマーワークショップ ( ポスター発表 ) ( SWoPP2005 ),武雄, 2005 年 8 月.

高橋慧 , 田浦健次朗 , 近山隆 . マイグレーションを支援する分散集合オブジェクト . 先進的計算基盤シンポジウム( SACSIS 2005 ),筑波, 2005 年 5月.