task scheduling algorithm for multicore processor systems with turbo boost and hyper-threading

33
Yosuke Wakisaka Naoki Shibata Junji Kitamichi Keiichi Yasumoto Minoru Ito Nara Institute of Science and Technology University of Aizu Task Scheduling Algorithm for Multicore Processor Systems with Turbo Boost and Hyper-Threading 1

Upload: naoki-shibata-

Post on 18-Nov-2014

173 views

Category:

Software


0 download

DESCRIPTION

Yosuke Wakisaka, Naoki Shibata, Keiichi Yasumoto, Minoru Ito, and Junji Kitamichi : Task Scheduling Algorithm for Multicore Processor Systems with Turbo Boost and Hyper-Threading, In Proc. of The 2014 International Conference on Parallel and Distributed Processing Techniques and Applications(PDPTA'14), pp. 229-235 In this paper, we propose a task scheduling algorithm for multiprocessor systems with Turbo Boost and Hyper-Threading technologies. The proposed algorithm minimizes the total computation time taking account of dynamic changes of the processing speed by the two technologies, in addition to the network contention among the processors. We constructed a clock speed model with which the changes of processing speed with Turbo Boost and Hyper-threading can be estimated for various processor usage patterns. We then constructed a new scheduling algorithm that minimizes the total execution time of a task graph considering network contention and the two technologies. We evaluated the proposed algorithm by simulations and experiments with a multiprocessor system consisting of 4 PCs. In the experiment, the proposed algorithm produced a schedule that reduces the total execution time by 36% compared to conventional methods which are straightforward extensions of an existing method.

TRANSCRIPT

Page 1: Task Scheduling Algorithm for Multicore Processor Systems with Turbo Boost and Hyper-Threading

Yosuke Wakisaka Naoki Shibata Junji Kitamichi‡ Keiichi Yasumoto Minoru Ito

Nara Institute of Science and Technology‡University of Aizu

Task Scheduling Algorithm forMulticore Processor Systems withTurbo Boost and Hyper-Threading

1

Page 2: Task Scheduling Algorithm for Multicore Processor Systems with Turbo Boost and Hyper-Threading

2

• Multicore processors– Widely used in distributed computing environments

• Data centers, supercomputers, ordinary PCs, mobile devices

– Dynamic control of the CPU’s operating frequency• Turbo Boost(Intel), Turbo Core(AMD), Dynamic frequency scaling(nVidia

GPUs)

– Efficient computation while saving power• Hyper Threading(Intel), Simultaneous multithreading(IBM POWER)

Background

Page 3: Task Scheduling Algorithm for Multicore Processor Systems with Turbo Boost and Hyper-Threading

3

Objective of Research

• Efficient task scheduling– Utilizing the latest technologies– Minimize total computation time

• No known scheduling algorithms takes account of these technologies

Dynamic scaling of operating speed by Turbo Boost and Hyper-Threading

Delay of communication by network contention

Page 4: Task Scheduling Algorithm for Multicore Processor Systems with Turbo Boost and Hyper-Threading

4

Related Works(1/2)

• Task scheduling algorithm taking account of network contention [1]

• Formulates communication delay between processors

– Reduces network contention

– No multicore processors– No scaling of operating frequency

[1] O. Sinnen and LA. Sousa: ``Communication Contention in Task Scheduling," IEEE Transactions on Parallel and Distributed Systems, Vol. 16, No. 6, pp. 503-515, 2005.

Page 5: Task Scheduling Algorithm for Multicore Processor Systems with Turbo Boost and Hyper-Threading

5

Related Works(2/2)

• Task scheduling algorithm taking account of multicore processor and fail-stop model [2]– Considers both network contention and failure of

multiple processors on a single die– No consideration for change of operating

frequency

[2] S. Gotoda, N. Shibata and M. Ito : ``Task scheduling algorithm for multicore processor system for minimizing recovery time in case of single node fault," International Symposium on Cluster, Cloud, and Grid Computing (CCGrid), pp. 260-267, 2012.

Page 6: Task Scheduling Algorithm for Multicore Processor Systems with Turbo Boost and Hyper-Threading

6

Outline

1. Background

2. Related Works

3. Modelling of Computing System

4. Proposed Method

5. Evaluation

6. Conclusion

Page 7: Task Scheduling Algorithm for Multicore Processor Systems with Turbo Boost and Hyper-Threading

Task Graph• A group of tasks that can be

executed in parallel• Vertex (task node)

– Task to be executed on a single CPU core

• Edge (task link)– Data dependence between tasks

7

Task node Task link

Task graph

Page 8: Task Scheduling Algorithm for Multicore Processor Systems with Turbo Boost and Hyper-Threading

Processor Graph• Topology of the computer network• Vertex (Processor node)

– CPU core (circle)• has only one link

– Switch (rectangle)• has more than 2 links

• Edge (Processor link)– Communication path between

processors

8

Processor node Processor linkSwitch

Processor graph

321

Page 9: Task Scheduling Algorithm for Multicore Processor Systems with Turbo Boost and Hyper-Threading

Task Scheduling• Task scheduling problem

– assigns a processor node to each task node

– minimizes total execution time– An NP-hard problem

9

1

One processor node is assigned to each task node

321

Processor graph

Task graph

Page 10: Task Scheduling Algorithm for Multicore Processor Systems with Turbo Boost and Hyper-Threading

10

Cor

e2

2 processors are working

Turbo Boost• Technology for accelerating multicore processors

– Scales operating frequencies according to the usage of each core

– Higher frequency if small number of cores are working– Lower frequency if all of the cores are working

• Assumption– Operating frequency only depends on usage of each core

• It does not depend on temperature

– Frequency is switched instantly, anytime

Cor

e0

Cor

e1

Cor

e3

4 processors are working

Idle

3 processors are workingFrequency

Page 11: Task Scheduling Algorithm for Multicore Processor Systems with Turbo Boost and Hyper-Threading

11

Hyper-Threading

Physical processor core

HardwareResource

Register file 1

Register file 2

Thread 1

Thread 2

Hardware Thread 1

Hardware Thread 2

• Multiple hardware threads are executed on a single physical processor– Hardware resources are shared among multiple threads– Reduces performance penalty by pipeline stall

• We define Effective operating frequency • Speed index reflecting reduction of executing speed by

sharing resources between hardware threads

Page 12: Task Scheduling Algorithm for Multicore Processor Systems with Turbo Boost and Hyper-Threading

12

Modelling Frequency Control by Turbo Boost and Hyper-Threading

• Effective operating frequency depends on only– The number of working processors– The property of executed program

• We measure the operating frequencies corresponding to processor states beforehand– Programs with different properties are executed on

multiple cores at a time– We measure the operating frequency of the real

devices

Page 13: Task Scheduling Algorithm for Multicore Processor Systems with Turbo Boost and Hyper-Threading

13

Modelling Effective Frequency

State of cores Frequency

[ c , i ] , [ i , i ] , [ i , i ] , [ i , i ] To be measured

[ c , i ] , [ c , i ] , [ i , i ] , [ i , i ] To be measured

Operating frequency when Turbo Boost is enabled( The processor has 4 cores 8 hardware

threads )

i:idle, c:computation heavy

m : memory access heavy, n : In between c and m

• We want to predict the operating frequency from the number of working cores– We changed the number of working cores– We measured the operating frequency

Physical core Hardware thread

Page 14: Task Scheduling Algorithm for Multicore Processor Systems with Turbo Boost and Hyper-Threading

14

Results of Measurement

State Freq Ratio Effective freq.

[ c , c ] 3.1GHz 0.84 2.6GHz

[ m , m ] 3.1GHz 0.76 2.3GHz

[ n , n ] 3.1GHz 0.79 2.5GHz

Only Turbo Boost is used

Only Hyper Threading is used

State of cores i : idle c : computation heavy m : memory-access

heavy n : In between c and m

When Turbo Boost is used Effective freq. depends on

number of working cores When Hyper-Threading is

used Effective freq. depends on

properties of programs

State of cores Effective freq.

[ c , i ] [ i , i ] [ i , i ] [ i , i ] 3.7GHz

[ c , i ] [ c , i ] [ i , i ] [ i , i ] 3.5GHz

[ c , i ] [ c , i ] [ c , i ] [ i , i ] 3.3GHz

[ c , i ] [ c , i ] [ c , i ] [ c , i ] 3.1GHz

[ m , i ] [ i , i ] [ i , i ] [ i , i ] 3.7GHz

[ m , i ] [ m , i ] [ m , i ] [ m , i ] 3.1GHz

[ n , i ] [ i , i ] [ i , i ] [ i , i ] 3.7GHz

[ n , i ] [ n , i ] [ n , i ] [ n , i ] 3.1GHz

Page 15: Task Scheduling Algorithm for Multicore Processor Systems with Turbo Boost and Hyper-Threading

15

Outline

1. Background

2. Related Works

3. Modelling of Computing System

4. Proposed Method

5. Evaluation

6. Conclusion

Page 16: Task Scheduling Algorithm for Multicore Processor Systems with Turbo Boost and Hyper-Threading

16

Problem in Existing Scheduling

• Tasks are assigned in order• Assumes fixed executing speed

• When operating frequency is dynamically changed– Operating frequency is high when the first task is

assigned– Frequency is reduced as many tasks are assigned to

coresCore1

Core2 

Core3

Core4

aResulting schedule

Existing scheduling technique

:Freq. for 1 core execution

1 2 3 4

a b c12

1 2

1

d e

:Freq. for 2 core execution

Execution Time

Page 17: Task Scheduling Algorithm for Multicore Processor Systems with Turbo Boost and Hyper-Threading

17

• Tasks are assigned in order• Assumes fixed executing speed

• When operating frequency is dynamically changed– Operating frequency is high when the first task is

assigned– Frequency is reduced as many tasks are assigned to

cores

Problem in Existing Scheduling

Core1

Core2 

Core3

Core4

aResulting schedule

:Freq. for 1 core execution

1 2 3 4

a b c12

1 2

1

d e

:Freq. for 2 core execution

b

Execution Time

Existing scheduling technique

Page 18: Task Scheduling Algorithm for Multicore Processor Systems with Turbo Boost and Hyper-Threading

18

• Tasks are assigned in order• Assumes fixed executing speed

• When operating frequency is dynamically changed– Operating frequency is high when the first task is

assigned– Frequency is reduced as many tasks are assigned to

cores

Problem in Existing Scheduling

:Freq. for 1 core execution

:Freq. for 2 core execution

aResulting schedule

1 2 3 4

a b c12

1 2

1

d e

Core1

Core2 

Core3

Core4

a

b

Execution Time

Existing scheduling technique

Page 19: Task Scheduling Algorithm for Multicore Processor Systems with Turbo Boost and Hyper-Threading

19

• Tasks are assigned in order• Assumes fixed executing speed

• When operating frequency is dynamically changed– Operating frequency is high when the first task is

assigned– Frequency is reduced as many tasks are assigned to

cores

Problem in Existing Scheduling

:Freq. for 1 core execution

:Freq. for 2 core execution

Core1

Core2 

Core3

Core4

Resulting schedule

a

1 2 3 4

a b c12

1 2

1

d e

b

Execution time for a is increased because b is assigned

Execution Time

Existing scheduling technique

Page 20: Task Scheduling Algorithm for Multicore Processor Systems with Turbo Boost and Hyper-Threading

20

• Execution time changes by assigning following tasks• We cannot know which core is assigned to which task until

the scheduling is finished

Problem and SolutionProblem

• Tentatively assign tasks and estimate the operating frequencies

Solution

Page 21: Task Scheduling Algorithm for Multicore Processor Systems with Turbo Boost and Hyper-Threading

21

Proc2

N1

Execution TimeProc

1

Core1

Core2

Core3

Core4

Estimating Execution Time of Tasks

Node

1

32 4

Link

5

a

c db

d

2 2 2

22 21 43

Task graph Processor graph

comm.TB

1coreTB

2Core HT

• Initial assignment of the first task

Page 22: Task Scheduling Algorithm for Multicore Processor Systems with Turbo Boost and Hyper-Threading

22

Proc2

N1

Execution TimeProc

1

Core1

Core2

Core3

Core4

Estimating Execution Time of Tasks

Node

1

32 4

Link

5

a

c db

d

2 2 2

22 21 43

comm.TB

1coreTB

2Core HT

N 2

• Initial assignment of the second task

Task graph Processor graph

Page 23: Task Scheduling Algorithm for Multicore Processor Systems with Turbo Boost and Hyper-Threading

23

Proc2

N1

N4

C

Execution TimeProc

1

Core1

Core2

Core3

Core4

Tentative Assignment

Node

1

32 4

Link

5

a

c db

d

2 2 2

22 21 43

comm.TB

1coreTB

2Core HT

• We calculate communication time using the existing method

• Operating frequency is tentatively fixed

N3

C

N5

Tentatively assigned

N2

Task graph Processor graph

Page 24: Task Scheduling Algorithm for Multicore Processor Systems with Turbo Boost and Hyper-Threading

24

Proc2

• Operating frequency is determined by observing usage of each core

C

Execution TimeProc

1

Core1

Core2

Core3

Core4

Determining Operating Freq.

Node

1

32 4

Link

5

a

c db

d

2 2 2

22 21 43

comm.TB

1coreTB

2Core HT

C

N1

N3

N4

N2 N5

Task graph Processor graph

Page 25: Task Scheduling Algorithm for Multicore Processor Systems with Turbo Boost and Hyper-Threading

25

Proc2

C

Proc1

Core1

Core2

Core3

Core4

C

N1

N4

N2

N5

Selecting Processor to Assign

• Processor is assigned so that the whole processing time of the task graph is minimized

Proc2

C

Execution TimeProc

1

Core1

Core2

Core3

Core4

comm.TB

1coreTB

2Core HT

C

N1

N3

N4

N2 N5

Execution Time

C

N3 C

0 5 10 0 5 109 11

Compare the execution time of the whole task graph

Page 26: Task Scheduling Algorithm for Multicore Processor Systems with Turbo Boost and Hyper-Threading

26

Outline

1. Background

2. Related Works

3. Modelling of Computing System

4. Proposed Method

5. Evaluation

6. Conclusion

Page 27: Task Scheduling Algorithm for Multicore Processor Systems with Turbo Boost and Hyper-Threading

27

Evaluation• Items to be evaluated

– Execution time of the whole task graph– Accuracy of estimation of execution time

• Compared method– We extended the Sinnen’s scheduling method in a straight-

forward way• Sinnen’s method only considers network contention• We switched the subroutine to calculate the task execution time with our

model• The first method only assigns to the physical

cores ( SinnenPhysical )• The second method assigns to all hardware

threads ( SinnenLogical )

Page 28: Task Scheduling Algorithm for Multicore Processor Systems with Turbo Boost and Hyper-Threading

28

Configuration

• System– CPU:Intel Core i7 3770T(2.5GHz, 4 cores, 8 hardware

threads, single socket)– Memory : 16.0GB– OS:Windows7 SP1(64bit)– Java(TM) SE (1.6.0 21, 64bit)– Gigabit LAN subsystem using the Intel 82579V Gigabit

Ethernet Controller

• Network– 4 computers are connected with Gigabit Ethernet– 420Mbps bandwidth (Measured with the real system)

Page 29: Task Scheduling Algorithm for Multicore Processor Systems with Turbo Boost and Hyper-Threading

29

Network Topologies for Evaluation

1 2 3 4 1 2 3 4

1234

1 2 3 4 1 2 3 4

1 2 3 4 1 2 3 4

1234

1 2 3 4 1 2 3 4

1 2 3 4 1 2 3 4 1 2 3 4 1 2 3 4

Star16 processors

5 switches

Mesh16 processors

4 switches( Only for simulation )

Tree16 processors

7 switches

Page 30: Task Scheduling Algorithm for Multicore Processor Systems with Turbo Boost and Hyper-Threading

30

Task Graphs for Evaluation High parallelism ( Matrix Solver )

98 nodes 67 edges

Low parallelism ( Robot Control ) 90 nodes 135 edges

T. Tobita and H. Kasahara: ``A standard task graph set for fair evaluation of multi-processor scheduling algorithms," Journal of Scheduling, Vol. 5, pp. 379-394,2002.

Page 31: Task Scheduling Algorithm for Multicore Processor Systems with Turbo Boost and Hyper-Threading

31

Execution Time of Whole Task Graph

• Execution time is reduced by up to 35%

• Higher parallelism leads to more reduction– Because of greater freedom of choosing which core to assign

Task

Exe

cuti

on

Tim

e (s

ec)

16% 35% 9%

24%

Robot Solver Robot SolverTree Star

0

50

100

150

200

250

SinnenPhysical SinnenLogical Proposed

Page 32: Task Scheduling Algorithm for Multicore Processor Systems with Turbo Boost and Hyper-Threading

32

Robot Solver Robot SolverTree Star

0

50

100

150

200

250

SinnenPhysical Simulation SinnenPhysical Experiment System SinnenLogical SimulationSinnenLogical Experiment System Proposed Simulation Proposed Experiment System

Accuracy of Estimation of Execution Time

• Estimation error is 4% in average, 7% at maximum

Task

Exe

cuti

on

Tim

e (s

ec)

5% 6

% 7%

7% 6

% 5%

5%

6%

4% 1

% 2% 1

%

Page 33: Task Scheduling Algorithm for Multicore Processor Systems with Turbo Boost and Hyper-Threading

33

Conclusion

• Task scheduling method for multicore processor

systems– Considering dynamic scaling of operating

frequency by Turbo Boost and Hyper Threading– Network contention

• We modeled frequency control by Turbo Boost and Hyper-Threading