research article probabilistic analysis of steady-state...

Research ArticleProbabilistic Analysis of Steady-State Temperature andMaximum Frequency of Multicore Processors consideringWorkload Variation

Biying Zhang12 Zhongchuan Fu1 Hongsong Chen3 and Gang Cui1

1School of Computer Science and Technology Harbin Institute of Technology Harbin 150001 China2School of Computer and Information Engineering Harbin University of Commerce Harbin 150028 China3School of Computer and Communication Engineering University of Science and Technology Beijing Beijing 100083 China

Correspondence should be addressed to Zhongchuan Fu 15124585561163com

Received 3 December 2015 Accepted 6 April 2016

Academic Editor Veljko Milutinovic

Copyright copy 2016 Biying Zhang et al This is an open access article distributed under the Creative Commons Attribution Licensewhich permits unrestricted use distribution and reproduction in any medium provided the original work is properly cited

A probabilistic method is presented to analyze the temperature and the maximum frequency for multicore processors based onconsideration of workload variation in this paper Firstly at the microarchitecture level dynamic powers are modeled as the linearfunction of IPCs (instructions per cycle) and leakage powers are approximated as the linear function of temperature Secondlythe microarchitecture-level hotspot temperatures of both active cores and inactive cores are derived as the linear functions of IPCsThe normal probabilistic distribution of hotspot temperatures is derived based on the assumption that IPCs of all cores follow thesame normal distribution Thirdly and lastly the probabilistic distribution of the set of discrete frequencies is determined It canbe seen from the experimental results that hotspot temperatures of multicore processors are not deterministic and have significantvariations and the number of active cores and running frequency simultaneously determine the probabilistic distribution of hotspottemperatures The number of active cores not only results in different probabilistic distribution of frequencies but also leads todifferent probabilities for triggering DFS (dynamic frequency scaling)

1 Introduction

Continuous technology scaling and miniaturization haveescalated the power density and temperature of multicoreprocessors In order to decrease manufacturing costs thepackages of multicore processors are mostly designed basedon average power dissipation instead of the maximum andtemperature is controlled with dynamic thermal manage-ment (DTM) techniques such as dynamic voltage and fre-quency scaling (DVFS) and dynamic frequency scaling (DFS)[1]When temperature of processor reaches or approaches thecritical point DVFS orDFS are invoked to ensure the thermalconstraint at the cost of sacrificing the speed of processorsTherefore it is crucial for design space exploration to analyzetemperature and running frequency accurately and fast at theearly stage

11 Motivation To explore the design space of thermal-aware multicore processors at the early stage some thermalmodels have been proposed to estimate the temperature andperformance of processors [2ndash5] and most of estimationapproaches are based on transient analysis [6ndash14] For tran-sient analysis temporal variations of temperature and per-formance depending on workloads are traced contributingto high estimation accuracy However transient analysis istime-consuming and in particular for multicore processorstime complexity is unacceptable at the early design stageAccordingly to speed up the estimation of temperature andperformance of multicore processors researchers resort tosteady-state analysis [15 16] Nevertheless to the best of ourknowledge all previous work related to steady-state analysisis based on the assumption that every workload has the samethermal contribution which greatly hinders the estimation

Hindawi Publishing CorporationMathematical Problems in EngineeringVolume 2016 Article ID 2462504 17 pageshttpdxdoiorg10115520162462504

2 Mathematical Problems in Engineering

accuracy In fact temperature of multicore processors hasgreat variations between different workloads [10 17] Ourpreliminary work has demonstrated that the dynamic powerof processors is highly correlated with IPCs (instructions percycle) and that within a small temperature range leakagepower linearly depends on temperature [18] According to theHotSpot thermal model temperature can be derived giventhe power of processors [3] Thus processor temperature hashigher correlation with IPC According to CLT (central limittheorem) when a large number of instructions are executedin the processor the probabilistic distribution of IPC tendsto follow the normal distribution Accordingly given boththe probabilistic distribution of IPC and the relationshipbetween the temperature and IPC the probabilistic distribu-tion of processor temperature can be derived and analyzedSubsequently the probabilistic distribution of the maximumrunning frequency can be inferred given that the zero-slackDTM policy is used by the processor which means thatthe speed of the processor is set to a value which makesthe temperature of the hotspot be the maximum thresholdallowed by the processor [7]

12 Contributions In this paper a probabilistic methodis proposed to analyze the steady-state temperature andfrequency of multicore processors taking into account thevariation of workloads In order to simplify the analyzingprocesses DFS technique rather than DVFS technique isadopted to manage the temperatures of processor where thevoltage is constant and only frequency is adjusted And thedynamic power can be modeled as the linear function of thefrequency [12 16]Themain contributions of this work are asfollows

(i) At the microarchitecture level the dynamic powerof processors is modeled as the linear function ofIPC and running frequency and the leakage powerof processor is approximated as the linear model oftemperature

(ii) The microarchitecture-level hotspot temperatures ofboth active cores and inactive cores are derived as thelinear functions of IPCs of all active cores

(iii) It is inferred that the hotspot temperatures of bothactive cores and inactive cores follow the normalprobabilistic distribution based on the assumptionthat IPCs of all active cores follow the same normaldistribution

(iv) The probabilistic distribution of the set of frequenciesis determined given the zero-slack DTM policy [7]

The remainder of this paper is organized as followsIn Section 2 related work is overviewed In Section 3 themicroarchitecture-level steady-state temperature of a core isformulated as the linear function of its powers based on theHotspot thermalmodel In Section 4 at themicroarchitecturelevel the dynamic powers of processors are modeled asthe linear function of IPC and the running frequency andthe leakage powers are approximated as the linear modelof temperature In Section 5 the microarchitecture-level

hotspot temperatures of both active cores and inactive coresare derived as the linear function of IPCs of all active coresIt is inferred that the hotspot temperatures of both activecores and inactive cores follow the normal probabilisticdistribution based on the assumption that the IPCs of allactive cores follow the same normal probabilistic distribu-tion In Section 6 the probabilistic distribution of the set offrequencies is determined given the zero-slack DTM policyIn Section 7 experimental results are presentedThis paper isconcluded in Section 8

2 Related Work

The estimation approach of temperature and performance ofprocessors can be classified into transient analysis and steady-state analysis and so far most researches have been based ontransient analysis

21 Transient Analysis In order to transiently analyze thetemperature and explore the design space of processors at theearly stage several thermal models for processors have beenproposed Skadron et al and Huang et al [2 3] presenteda compact thermal modeling methodology based on theanalogy between thermal and electrical phenomena namelyHotSpot Using HotSpot the spatial and temporal variationsof processor temperature can be obtained through transientanalysis To improve the accuracy of thermal simulation Janget al [11]made an extension to the thermalmodel forHotSpotby taking into account the different ambient temperatureowing to workload variations To accelerate thermal analysisof multicore processors at the architecture level Wang et al[5] presented a composite thermal model termed Therm-Comp to optimize the model for different large processorsLi et al [4] proposed a parameterized architecture-leveldynamic thermal model namely ParThermPOF in whichmany parameters can be set such as the location of thermalsensors and the conductivity of different components

In order to improve the performance of thermal-awaremulticore processors various DTM techniques have beeninvestigated based on thermal models such as Hotspotand analyzed transiently for the estimation of performanceHanumaiah et al [7 19] presented an online thermal man-agement algorithm for thermal-aware multicore processorsin which DVFS and task allocation techniques are simulta-neously adopted In the context of hard real-time systemsthe time-varying voltage and frequency of multicores arecomputed to satisfy not only the thermal constraint butalso the deadline constraint [6] Shi et al [12] presented aDTM policy under soft thermal constraint in which thetemperature constraint can be exceeded sometimes

In order to simulate the thermal behavior fast andaccurately for multicore processors several researches havebeen performed Wojciechowski et al [10 20] analyzedthe transient characteristics of workloads based on a finiteFourier series expansion to accurately predict the thermalbehavior of multicore processors and a new DVFS approachis presented Liu et al [13] proposed a transient analysismethod of temperature of multicore processors based on

Mathematical Problems in Engineering 3

moment matching and it is used to guide the migrationprocesses of tasks To account for the nondeterministicbehavior of tasks in terms of executing times and decisionbranches Das et al [14] formulated thermal analysis as ahybrid automata reachability verification problem and analgorithm for constructing the automata was provided

For transient analysis temporal variations of temperatureand performance depending onworkload are traced and thenhigh estimation accuracy is obtained However transientanalysis is time-consuming and in particular for multicoreprocessors time complexity is unacceptable at the early designstage

22 Steady-State Analysis In order to speed up the estima-tion of temperature and performance at the early designstage of multicore processors researchers resort to steady-state analysis and have carried out a lot of work Based onAmdahlrsquos Law Lee and Kim [15 21] introduced variations ofprocess and workload parallelism into the analyzing modeland optimized the throughput of thermal-aware multicoreprocessors by exploitingDVFS and the per-core power-gating(PCPG) Based on HotSpot Rao et al [16 22] describedan approximate thermal model for homogeneous multicoreprocessors to fast and accurately predict the maximumsteady-state throughput under thermal constraints In thecontext of a hard real-time system of a single-core processorMohaqeqi et al [23] studied stochastic behavior of the systemfor example performance temperature and reliability basedon Markovian view

To the best of our knowledge all previous work of steady-state analysis is based on the assumption that every workloadhas the same thermal contribution to processors resultingin inaccuracy of temperature and performance estimationThis is the focus of our work In this paper the variation ofworkloads is taken into account to model the thermal andfrequency more accurately

3 Thermal Model

In this paper a microarchitecture-level thermal model fora multicore processor is created by replication of a single-core processor based onHotSpot [3]Themulticore processoris divided into four layers that is chip thermal interfacematerial (TIM) heat spreader and heat sink There are 119898

thermal blocks in the chip and TIM and there are five andnine thermal blocks in the heat spreader and heat sinkrespectively Totally there are 119873 = 119899119898 + 14 thermal blocksin the multicore processor where 119899 is the number of cores

The microarchitecture-level thermal model can be repre-sented by the state-space differential equation as follows [16]

119889T119889119905

= AT + BP (1)

where T and P are119873-dimension vectors respectively denot-ing the temperature and power of the multicore processorand A and B are constant matrices of 119873 times 119873 dimensiondepending on the thermal conduction and capacitance of theprocessor

DieCore 1

DieCore 2

TIMCore 1

TIMCore 2

DieCore 1

DieCore 2

TIMCore 1 0 0 0

TIMCore 2

0 0 0 0

Spreader

Spreadercenter

center

Spreaderperipheral

and heat sink

Spreaderperipheral

and heat sink

Gdie-int

Gint-die

Gint-spr

Gspr-int

Figure 1Thermal conductionmatrixG of dual-core processor [16]

When the temperature of the processor is in the steady-state 119889T119889119905 = 0 then

GT = P (2)

where G = minusBminus1A isthe thermal conductance matrix ofHotSpot model

The thermal conductance matrix G can be obtainedfrom the HotSpot simulation tool The thermal conductancematrix of a dual-core processor is as shown in Figure 1 wherethe submatrices Gdie Gint and Gpkg along the diagonal arelateral thermal conductance of the chip TIM and packagerespectively the submatrices Gdie-int and Gint-die are thevertical conductance between die and TIM and the vectorsGint-spr and G1015840spr-int are the vertical conductance between theTIM and the spreader Gdie-int is equal to Gint-die and Gint-spris equal to G1015840spr-int The detailed conductance matrix of Gpkgis as shown in Figure 2

Lemma 1 According to HotSpot thermal model when thetemperature of the multicore processor is in the steady-statethere exist 119898-dimension matrices R Q and Z such that

T119894

= RP119894

+Q119899

119895=1

P119895

+ Z (3)

where T119894

and P119894

are119898-dimension vectors representing respec-tively temperature and power of the 119894th core of the processor

Proof According to the arrangement of elements in theconductance matrix G in HotSpot thermal model (2) can bedecomposed into the following equations

GdieT119894 + Gdie-intTint119894 = P119894

Gdie-intT119894 + GintTint119894 + Gint-spr119879spr = 0 (5)

Spreadercenter

Spreaderperipheral

and heat

Spreader peripheraland heat sink

Gother-spr

Gspr-other

Gother-other

Figure 2 Thermal conduction matrix of the package [16]

G1015840int-spr119899

119895=1

Tint119895 + 119866spr119879spr + G1015840spr-otherTother = 0 (6)

Gspr-other119879spr + Gother-otherTother = Pambient (7)

where Tint119894 is119898-dimension vector representing the tempera-ture of TIM layer of the 119894th core 119879spr is a scalar representingthe temperature of the central block of the spreader andTotheris 13-dimension vector representing the temperature of otherblocks of the spreader and the sink

According to (6) and (7)

119879spr = (G1015840spr-otherGminus1

other-otherGspr-other minus 119866spr)minus1

sdot (G1015840int-spr119899

119894=1

Tint119894 + G1015840spr-otherGminus1

other-otherPambient)

GSPRO = (G1015840spr-otherGminus1

otherGspr-other minus 119866spr)minus1

Gint-spr

119866spamb = (G1015840spr-otherGminus1

sdot G1015840spr-otherGminus1

other-otherPambient

Equation (8) can be converted into

119879spr = G1015840SPRO119899

119894=1

Tint119894 + 119866spamb (10)

Substitute (10) into (5) and get

Gdie-intT119894 + GintTint119894

+ Gint-spr (G1015840SPRO119899

119894=1

Tint119894 + 119866spamb) = 0(11)

According to (4) and (11) get

(Gdie-int minus GintGminus1

die-intGdie)T119894

minus Gint-sprG1015840

SPROGminus1

die-intGdie

119899

119895=1

T119895

+ GintGminus1

die-intP119894

+ Gint-sprG1015840

SPROGminus1

die-int

119899

119895=1

P119895

+ Gint-spr119866spamb

= Gdie-int minus GintGminus1

die-intGdie

= Gint-sprG1015840

SPROGminus1

die-intGdie

= GintGminus1

die-int

= Gint-sprG1015840

SPROGminus1

die-int

= Gint-spr119866spamb

Then (12) can be transformed into

T119894

minus G2

119899

119895=1

T119895

P119894

119899

119895=1

P119895

= 0 (14)

According to (14) get

119899

119895=1

T119895

minus 119899G2

119899

119895=1

T119895

119899

119895=1

P119895

+ 119899G4

119899

119895=1

P119895

+ 119899G5

119899

119895=1

T119895

= (119899G2

minus G1

)minus1

+ 119899G4

119899

119895=1

P119895

+ 119899 (119899G2

minus G1

)minus1G5

Tdie119894

= Gminus11

(119899G2

minus G1

)minus1

+ 119899G4

) minus G4

119899

119895=1

P119895

minus Gminus11

P119894

+ Gminus11

(119899G2

minus G1

)minus1

minus E)G5

R = minusGminus11

Q = Gminus11

(119899G2

minus G1

)minus1

+ 119899G4

) minus G4

Z = Gminus11

(119899G2

minus G1

)minus1

minus E)G5

T119894

= RP119894

+Q119899

119895=1

P119895

+ Z (19)

This means that in the steady-state of temperature of amulticore processor given the powers of all cores the tem-perature of any core can be calculated according to Lemma 1at the microarchitecture level

4 Power Model

It is assumed thatmulticore processors have two power statesactive mode and inactive mode and the power state of eachcore can be set separately And global dynamic frequencyscaling (DFS) technique is used where frequencies of all coresare scaled uniformly

41 Active Mode When a core is in the active mode work-loads are executed and the core dissipates both dynamicpower and leakage power At the microarchitecture level thepower of the active core is defined as

Pa119894 = Pdyna119894 + Pleaa119894 (20)

wherePa119894Pdyna119894 andPleaa119894 are respectively the total powerdynamic power and leakage power of the 119894th active core ofsize119898 times 1

For the processor using DVFS technique the dynamicpower is proportional to the product of the square of thevoltage and the frequency [7] that is Pdyn prop voltage2 timesf requency In this paper the primary purpose is to analyzethe impact of workload variation on the temperatures ofprocessors So in order to simplify the analyzing processesDFS technique rather thanDVFS technique is used tomanagethe temperatures of processor where the voltage is constantand only frequency is adjusted Hence the dynamic powercan be modeled as the linear function of the frequency [1216] In addition the dynamic power caused by workloadexecution has a close linear relationship with IPCs

Let 119883a119894 be the IPC of the 119894th core when workloads arerunning on it and let 119891 be the normalized frequency between0 and 1 and then the microarchitecture-level dynamic powerPdyna119894 of the 119894th active core is defined as

Pdyna119894 = (Gdyn119883a119894 + G1198890

+ Edyn) 119891 (21)

where Gdyn and G1198890

are the linear regression coefficientswhen119891 is set tomaximum 1 andEdyn is the regression residualwhich follows the normal distribution with mean of zeronamely Edyn sim 119873(01205902

119864119889

) Gdyn G1198890 Edyn and 120590119864119889 arevectors of size 119898 times 1 and 1205902

119864119889

is the element-by-elementsquares of 120590

119864119889

At the microarchitecture level in order to simplify the

analyzing procedure the relationship between the leakagepower Pleaa119894 and the temperature Ta119894 of the 119894th active coreis approximated by the linear model as follows

Pleaa119894 = GleaaTa119894 + G1198970a + Eleaa (22)

whereGleaa andG1198970a are the regression coefficients and Eleaais the regression residual which follows the normal distribu-tion with mean of zero namely Eleaa sim 119873(01205902

119864a) Gleaa is adiagonal matrix of size 119898 times 119898 G

1198970a Eleaa 120583119864a and 120590119864a are

vectors of size 119898 times 1 1205902119864a is the element-by-element squares

of 120590119864aAccording to (20) (21) and (22) the power of the active

core Pa119894 is represented by

Pa119894 = (Gdyn119883119894 + G1198890

) 119891 + Edyn + GleaaTa119894 + G1198970a

+ Eleaa(23)

42 Inactive Mode When a core is in the inactive mode itis powered off using power-gating technique The inactivecore only dissipates leakage power which is much lower thanthat of the active core Hence the power of the 119894th inactivecore Pina119894 is the leakage power Pleaina119894 which depends on thecorersquos temperature Tina119894 and it can be approximated by thelinear model at the microarchitecture level as follows

Pina119894 = Pleaina119894 = GleainaTina119894 + G1198970ina + Eleaina (24)

whereGleaina andG1198970ina are regression coefficients andEleainais the regression residuals which follow normal distributionwith mean of zero namely Eleaina sim 119873(01205902

119864ina) Gleaina is adiagonal matrix of size 119898 times 119898 G

1198970ina Eleaina and 120590119864ina arevectors of size119898times1 1205902

119864ina is the element-by-element squaresof 120590119864ina

5 Thermal Analysis

Lemma 2 When the temperature of a multicore processor isin the steady-state the temperatures and the powers of differentinactive cores are same that isT

119894119899119886119894

= T119894119899119886119895

and P119894119899119886119894

= P119894119899119886119895

forforall119894 119895 (119894 = 119895) whereT119894119899119886119894

andP119894119899119886119894

are the temperatures andthe powers of the 119894th inactive core respectively

Proof According to (3) for any inactive cores 119894 and 119895

Tina119894 minus Tina119895 = R (Pina119894 minus Pina119895) (25)

Tina119894 minus Tina119895 = RGleaina (Tina119894 minus Tina119895) (26)

(E minus RGleaina) (Tina119894 minus Tina119895) = 0 (27)

The parameters R and Gleaina are invertible matrices soEminusRGleaina is an invertible matrixTherefore Tina119894 minusTina119895 =0 that is Tina119894 = Tina119895 And Pina119894 = Pina119895 can also beobtained according to (24)

To be convenient set Tina119894 = Tina and Pina119894 = Pina (24)can be simplified into

Pina = GleainaTina + G1198970ina + Eleaina (28)

Theorem 3 Assume that all cores of a processor have thesame hotspot Let H = [ℎ

ℎ119898

] be the selectionvector of the hotspot where only one element of H can beset to 1 and the others are 0 s indicating that the corre-sponding functional unit is hotspot There exist functions1205951

(H 119899 119899119886

119891) 1205952

(H 119899 119899119886

119891) 1205953

(H 119899 119899119886

119891) 1205954

(H 119899 119899119886

119891) 1205955

(H 119899 119899119886

119891) and 1205956

(H 119899 119899119886

119891) such that the hotspot119879ℎ119900119905119886119894

of the 119894th active core can be formulated by

119879hot119886119894 = 1205951

(H 119899 119899119886

119891)119883119886119894

+ 1205952

(H 119899 119899119886

119891)

119899

119886

119896=1

119883119886119896

+ 1205953

(H 119899 119899119886

119891)E119889119910119899

+ 1205954

(H 119899 119899119886

119891)E119897119890119886119886

+ 1205955

(H 119899 119899119886

119891)E119897119890119886119894119899119886

+ 1205956

(H 119899 119899119886

119891)

There exist functions 1205851

(H 119899 119899119886

119891) 1205852

(H 119899 119899119886

119891)1205853

(H 119899 119899119886

119891) 1205854

(H 119899 119899119886

119891) and 1205855

(H 119899 119899119886

119891) such thatthe hotspot 119879

ℎ119900119905119894119899119886

of the inactive core can be formulated by

119879ℎ119900119905119904

= 1205851

(H 119899 119899119886

119891)

119899

119886

119896=1

119883119886119896

+ 1205852

(H 119899 119899119886

119891)E119889119910119899

+ 1205853

(H 119899 119899119886

119891)E119897119890119886119886

+ 1205854

(H 119899 119899119886

119891)E119897119890119886119894119899119886

+ 1205855

(H 119899 119899119886

119891)

Proof According to (3) (23) and (28) the temperature of theinactive core Tina can be derived as

Tina = (E minus (R + (119899 minus 119899a)Q)Gleaina)minus1QGleaa

119899a

119896=1

Ta119896

+ 119891 (E minus (R + (119899 minus 119899a)Q)Gleaina)minus1

sdotQGdyn

119899a

119896=1

119883a119896 + 119891119899a (E

minus (R + (119899 minus 119899a)Q)Gleaina)minus1QEdyn + 119899a (E

minus (R + (119899 minus 119899a)Q)Gleaina)minus1QEleaa + (E

minus (R + (119899 minus 119899a)Q)Gleaina)minus1

(R + (119899 minus 119899a)Q)

sdot Eleaina + (E minus (R + (119899 minus 119899a)Q)Gleaina)minus1

sdot ((R + (119899 minus 119899a)Q)G1198970ina + 119891119899aQG

1198890

+ 119899aQG1198970a

(119899 119899a 119891) = (R + (119899 minus 119899a)Q)G1198970ina + 119891119899aQG

1198890

+ 119899aQG1198970a + Z

(119899 119899a) = (E minus (R + (119899 minus 119899a)Q)Gleaina)minus1

Tina = Φ2

(119899 119899a)QGleaa

119899a

119896=1

Ta119896

+ 119891Φ2

(119899 119899a)QGdyn

119899a

119896=1

119883a119896

+ 119891119899aΦ2 (119899 119899a)QEdyn + 119899aΦ2 (119899 119899a)QEleaa

(119899 119899a) (R + (119899 minus 119899a)Q)Eleaina

(119899 119899a)Φ1 (119899 119899a 119891)

According to (3) (23) and (28) the temperature of the 119894thactive core Ta119894 can be derived as

Ta119894 = (E minus RGleaa)minus1QGleaa

119899a

119896=1

Ta119896 + 119891 (E

minus RGleaa)minus1QGdyn

119899a

119896=1

119883a119896 + 119891 (E minus RGleaa)minus1

sdot RGdyn119883a119894 + (119899 minus 119899a) (E minus RGleaa)minus1QGleainaTina

+ 119891 (E minus RGleaa)minus1

(R + 119899aQ)Edyn + (E

minus RGleaa)minus1

(R + 119899aQ)Eleaa + (119899 minus 119899a) (E

minus RGleaa)minus1QEleaina + (E minus RGleaa)

minus1

(119891RG1198890

+ 119899a119891QG1198890

+ RG1198970a + 119899aQG

1198970a

+ (119899 minus 119899a)QG1198970ina + Z)

(119899 119899a 119891) = 119891RG1198890

+ 119899a119891QG1198890

+ RG1198970a

+ 119899aQG1198970a + (119899 minus 119899a)QG

1198970ina

= (E minus RGleaa)minus1

Ta119894 = Φ4QGleaa

119899a

119896=1

Ta119896 + 119891Φ4

119899a

119896=1

119883a119896

+ 119891Φ4

RGdyn119883a119894 + (119899 minus 119899a)Φ4QGleainaTina

+ 119891Φ4

(R + 119899aQ)Edyn +Φ4

(R + 119899aQ)Eleaa

+ (119899 minus 119899a)Φ4QEleaina +Φ4Φ3 (119899 119899a 119891)

Ta119894 = (Φ4

QGleaa

+ (119899 minus 119899a)Φ4QGleainaΦ2 (119899 119899a)QGleaa)

119899a

119896=1

Ta119896

+ 119891 (Φ4

+ (119899 minus 119899a)Φ4QGleainaΦ2 (119899 119899a)QGdyn)

119899a

119896=1

119883a119896

+ 119891Φ4

RGdyn119883a119894 + 119891 (Φ4

(R + 119899aQ)

+ 119899a (119899 minus 119899a)Φ4QGleainaΦ2 (119899 119899a)Q)Edyn

+ (Φ4

(R + 119899aQ)

+ 119899a (119899 minus 119899a)Φ4QGleainaΦ2 (119899 119899a)Q)Eleaa

+ ((119899 minus 119899a)Φ4Q

+ (119899 minus 119899a)Φ4QGleainaΦ2 (119899 119899a) (R + (119899 minus 119899a)Q))

sdot Eleaina + (119899 minus 119899a)Φ4QGleainaΦ2 (119899 119899a)

sdotΦ1

(119899 119899a 119891) +Φ4

(119899 119899a 119891)

LetΦ5

(119899 119899a) = (Φ4

QGleaa + (119899 minus 119899a)

sdotΦ4

QGleainaΦ2 (119899 119899a)QGleaa)

(119899 119899a 119891) = 119891 (Φ4

QGdyn + (119899 minus 119899a)

sdotΦ4

QGleainaΦ2 (119899 119899a)QGdyn)Φ7 (119891)

= 119891Φ4

(119891) = 119891Φ4

(119899 119899a 119891) = 119891 (Φ4

(R + 119899aQ) + 119899a (119899 minus 119899a)

sdotΦ4

QGleainaΦ2 (119899 119899a)Q)

(119899 119899a) = (Φ4

(R + 119899aQ) + 119899a (119899 minus 119899a)

sdotΦ4

QGleainaΦ2 (119899 119899a)Q)

(119899 119899a) = ((119899 minus 119899a)Φ4Q + (119899 minus 119899a)

sdotΦ4

QGleainaΦ2 (119899 119899a) (R + (119899 minus 119899a)Q))

(119899 119899a 119891) = (119899 minus 119899a)Φ4QGleainaΦ2 (119899 119899a)

sdotΦ1

(119899 119899a 119891) +Φ4

(119899 119899a 119891)

Then (37) is transformed into

Ta119894 = Φ5 (119899 119899a)

119899a

119896=1

Ta119896 +Φ6 (119899 119899a 119891)

119899a

119896=1

119883a119896

(119891)119883a119894 +Φ8 (119899 119899a 119891)Edyn

(119899 119899a)Eleaa +Φ10 (119899 119899a)Eleaina

(119899 119899a 119891)

According to (39) get119899a

119896=1

Ta119896 = (E minus 119899aΦ5 (119899 119899a))minus1

sdot (119899aΦ6 (119899 119899a 119891) +Φ7

(119891))

119899a

119896=1

119883a119896

+ 119899a (E minus 119899aΦ5 (119899 119899a))minus1

(119899 119899a 119891)Edyn

(119899 119899a)Eleaa

(119899 119899a)Eleaina

(119899 119899a 119891)

Ta119894 = Φ7 (119891)119883a119894

+ (Φ5

(119899 119899a) (E minus 119899aΦ5 (119899 119899a))minus1

(119899aΦ6 (119899 119899a 119891)

(119891)) +Φ6

(119899 119899a 119891))

119899a

119896=1

119883a119896 + (119899aΦ5 (119899 119899a)

sdot (E minus 119899aΦ5 (119899 119899a))minus1

(119899 119899a 119891) +Φ8

(119899 119899a 119891))

sdot Edyn + (119899aΦ5 (119899 119899a) (E minus 119899aΦ5 (119899 119899a))minus1

sdotΦ9

(119899 119899a) +Φ9 (119899 119899a))Eleaa + (119899aΦ5 (119899 119899a) (E

minus 119899aΦ5 (119899 119899a))minus1

(119899 119899a) +Φ10 (119899 119899a))Eleaina

+ 119899aΦ5 (119899 119899a) (E minus 119899aΦ5 (119899 119899a))minus1

(119899 119899a 119891)

The hotspot temperature 119879hota119894 of the active core can begiven by

119879hota119894 = HTa119894 = HΦ7

(119891)119883a119894 +H (Φ5

(119899 119899a)

(119899aΦ6 (119899 119899a 119891) +Φ7

(119891))

(119899 119899a 119891))

119899a

119896=1

119883a119896 +H (119899aΦ5 (119899 119899a)

(119899 119899a 119891)

(119899 119899a 119891))Edyn +H (119899aΦ5 (119899 119899a)

(119899 119899a) +Φ9 (119899 119899a))Eleaa

+H (119899aΦ5 (119899 119899a) (E minus 119899aΦ5 (119899 119899a))minus1

(119899 119899a)

(119899 119899a))Eleaina +H (119899aΦ5 (119899 119899a)

(119899 119899a 119891)

(119899 119899a 119891))

Let1205951

(H 119891) = HΦ7

(119891)

1205952

(H 119899 119899a 119891) = H (Φ5

sdot (119899aΦ6 (119899 119899a 119891) +Φ7

(119891)) +Φ6

(119899 119899a 119891))

1205953

(H 119899 119899a 119891) = H (119899aΦ5 (119899 119899a)

(119899 119899a 119891)

(119899 119899a 119891))

1205954

(H 119899 119899a) = H (119899aΦ5 (119899 119899a) (E minus 119899aΦ5 (119899 119899a))minus1

sdotΦ9

(119899 119899a) +Φ9 (119899 119899a))

1205955

(H 119899 119899a 119891) = H (119899aΦ5 (119899 119899a)

(119899 119899a) +Φ10 (119899 119899a))

1205956

(H 119899 119899a 119891) = H (119899aΦ5 (119899 119899a)

(119899 119899a 119891)

(119899 119899a 119891))

119879hota119894 = 1205951

(H 119891)119883a119894 + 1205952

(H 119899 119899a 119891)

119899a

119896=1

119883a119896

+ 1205953

(H 119899 119899a 119891)Edyn + 1205954

(H 119899 119899a)Eleaa

+ 1205955

(H 119899 119899a 119891)Eleaina + 1205956

(H 119899 119899a 119891)

Tina = (Φ2

(119899 119899a)QGleaa (E minus 119899aΦ5 (119899 119899a))minus1

sdot (119899aΦ6 (119899 119899a 119891) +Φ7

(119891)) + 119891Φ2

(119899 119899a)QGdyn)

119899a

119896=1

119883a119896 + (119899aΦ2 (119899 119899a)

sdotQGleaa (E minus 119899aΦ5 (119899 119899a))minus1

(119899 119899a 119891)

+ 119891119899aΦ2 (119899 119899a)Q)Edyn + (119899aΦ2 (119899 119899a)

(119899 119899a)

+ 119899aΦ2 (119899 119899a)Q)Eleaa + (119899aΦ2 (119899 119899a)

(119899 119899a)

(119899 119899a) (R + (119899 minus 119899a)Q))Eleaina

+ 119899aΦ2 (119899 119899a)QGleaa (E minus 119899aΦ5 (119899 119899a))minus1

sdotΦ11

(119899 119899a 119891) +Φ2

(119899 119899a)Φ1 (119899 119899a 119891)

The hotspot temperature 119879hotina of the inactive core isgiven by

119879hotina = HTina = H (Φ2

(119899 119899a)

sdot (119899aΦ6 (119899 119899a 119891) +Φ7

(119891)) + 119891Φ2

(119899 119899a)QGdyn)

119899a

119896=1

119883a119896 +H (119899aΦ2 (119899 119899a)

(119899 119899a 119891)

+ 119891119899aΦ2 (119899 119899a)Q)Edyn +H (119899aΦ2 (119899 119899a)

(119899 119899a)

+ 119899aΦ2 (119899 119899a)Q)Eleaa +H (119899aΦ2 (119899 119899a)

(119899 119899a)

+H (119899aΦ2 (119899 119899a)QGleaa (E minus 119899aΦ5 (119899 119899a))minus1

sdotΦ11

(119899 119899a 119891) +Φ2

(119899 119899a)Φ1 (119899 119899a 119891))

Let1205851

(H 119899 119899a 119891)

= H (Φ2

(119899aΦ6 (119899

119899a 119891) +Φ7

(119891)) + 119891Φ2

(119899 119899a)QGdyn)

1205852

(H 119899 119899a 119891) = H (119899aΦ2 (119899 119899a)QGleaa (E minus 119899aΦ5 (119899

119899a))minus1

(119899 119899a 119891) + 119891119899aΦ2 (119899 119899a)Q)

1205853

(H 119899 119899a) = H (119899aΦ2 (119899 119899a)QGleaa (E minus 119899aΦ5 (119899

119899a))minus1

(119899 119899a) + 119899aΦ2 (119899 119899a)Q)

1205854

119899a))minus1

(119899 119899a) +Φ2 (119899 119899a) (R + (119899 minus 119899a)Q))

1205855

119899a))minus1

(119899 119899a 119891) +Φ2

(119899 119899a)Φ1 (119899 119899a 119891))

119879hot119904 = 1205851 (H 119899 119899a 119891)

119899a

119896=1

119883a119896 + 1205852 (H 119899 119899a 119891)Edyn

+ 1205853

(H 119899 119899a 119891)Eleaa

+ 1205854

(H 119899 119899a 119891)Eleaina + 1205855 (H 119899 119899a 119891)

According toTheorem 3 it can be known that the hotspottemperature of any core can be expressed as the linearfunction of IPCs of all cores

Theorem 4 Suppose that (a) the random variable 119883119886119894

forthe IPC follows normal distribution with the mean 120583

119883

andthe variance 120590

119883

that is 119883119886119894

sim 119873(120583119883

1205902

119883

) (b) the tasksrunning on different cores are mutually independent that is119883119886119894

and 119883119886119895

are independent for 1 le forall119894 119895 le 119898 and (c)workload balancing techniques are used in the processor suchthat 119883

119886119894

and 119883119886119895

follow the same normal distribution Thenthere exist functions 120583

119879119886

(H 119899 119899119886

119891) and 120590119879119886

(H 119899 119899119886

119891)such that the hotspot temperature 119879

ℎ119900119905119886119894

of the active corefollows the normal distributionwithmean120583

119879119886

(H 119899 119899119886

119891) andvariance 120590

119879119886

(H 119899 119899119886

119891) that is

119879ℎ119900119905119886119894

sim 119873(120583119879119886

(H 119899 119899119886

119891) 120590119879119886

(H 119899 119899119886

119891)) (49)

(H 119899 119899119886

119891) and 120590119879119894119899119886

(H 119899 119899119886

ℎ119900119905119894119899119886

of the inactive corefollows the normal distribution with mean 120583

119879119894119899119886

(H 119899 119899119886

119891)

and variance 120590119879119894119899119886

(H 119899 119899119886

119891) that is

119879ℎ119900119905119894119899119886

sim 119873(120583119879119894119899119886

(H 119899 119899119886

119891) 120590119879119894119899119886

(H 119899 119899119886

119891))

Proof According to (29) get

119879hota119894 = (1205951

(H 119899 119899a 119891) + 1205952

(H 119899 119899a 119891))119883a119894

+ 1205952

(H 119899 119899a 119891)

119899a

119896=1

119896 =119894

119883a119896

+ 1205953

(H 119899 119899a 119891)Edyn

+ 1205954

(H 119899 119899a 119891)Eleaa

+ 1205955

(H 119899 119899a 119891)Eleaina + 1205956

(H 119899 119899a 119891)

The norm-distributed random variables 119883a119894 and 119883a119895 areindependent in the case of 1 le forall119894 119895 le 119898 so that thelinear combination (120595

(H 119899 119899a 119891) + 1205952

(H 119899 119899a 119891))119883a119894 +

1205952

(H 119899 119899a 119891)sum119899a119896=1119896 =119894

119883a119896 of 119883a119894 still follows the normaldistribution

All elements Edyn119896 in the random vector Edyn follow nor-mal distribution and are independent so that the linear com-bination 120595

(H 119899 119899a 119891)Edyn of all elements in Edyn followsnormal distribution In the sameway120595

(H 119899 119899a 119891)Eleaa and1205955

(H 119899 119899a 119891)Eleaina also follow normal distribution(1205951

(H 119899 119899a 119891) + 1205952

(H 119899 119899a 119891))119883a119894 1205952

(H 119899 119899a119891)sum119899a119896=1119896 =119894

119883a119896 1205953

(H 119899 119899a 119891)Edyn 1205954

(H 119899 119899a 119891)Eleaaand 120595

(H 119899 119899a 119891)Eleaina all follow normal distribution andare mutually independent so that 119879hota119894 follows normal

distribution where the mean 120583119879ina(H 119899 119899a 119891) is calculated

120583119879a (H 119899 119899a 119891)

= (1205951

(H 119891) + 119899a1205952

(H 119899 119899a 119891)) 120583119883

+ 1205953

(H 119899 119899a 119891)120583119864119889

+ 1205954

(H 119899 119899a)120583119864a

+ 1205955

(H 119899 119899a 119891)120583119864ina + 1205956 (H 119899 119899a 119891)

and the variance 120590119879ina2

(H 119899 119899a 119891) is calculated by

120590119879a2

(H 119899 119899a 119891)

= (1205951

(H 119891) + 1205952

(H 119899 119899a 119891))2

1205902

119883

+ (119899a minus 1)1205952

(H 119899 119899a 119891) 1205902

119883

+ 1205953

(H 119899 119899a 119891)1205902

119864119889

+ 1205954

(H 119899 119899a)1205902

119864a

+ 1205955

(H 119899 119899a 119891)1205902

119864ina

In the same way 1205851

(H 119899 119899a 119891)sum119899a119896=1

119883a119896 1205852(H 119899 119899a119891)Edyn 1205853(H 119899 119899a 119891)Eleaa and 1205854(H 119899 119899a 119891)Eleaina in (30)all follow the normal distribution and are mutually indepen-dent so that 119879hot119904 follows the normal distribution where themean 120583

119879119904

120583119879119904

(H 119899 119899a 119891) = 119899a1205851 (H 119899 119899a 119891) 120583119883

+ 1205852

(H 119899 119899a 119891)120583119864119889

+ 1205853

(H 119899 119899a)120583119864a

+ 1205854

(H 119899 119899a 119891)120583119864ina

+ 1205855

(H 119899 119899a 119891)

and the variation 120590119879119904

120590119879119904

(H 119899 119899a 119891) = 119899a12058512

(H 119899 119899a 119891) 1205902

119883

+ 1205852

(H 119899 119899a 119891)1205902

119864119889

+ 1205853

(H 119899 119899a)1205902

119864a

+ 1205854

(H 119899 119899a 119891)1205902

119864ina

According toTheorem 4 it can be known that the hotspottemperature of any core follows the normal probabilisticdistribution

6 Frequency Analysis

The zero-slack policy is used as the DTM strategy of proces-sors that is the speed of processor is set to a value whichmakes the temperature of the hotspot be the threshold [7]However the frequencies are discrete in this workThereforein most cases there is no frequency in the set of frequenciesmaking the hotspot temperature be the threshold exactly

Theorem 5 Let 119865119878119890119905 = 1198911

119891120587

119891120587+1

119891119891119888119900119906119899119905

be the set of frequencies of multicore processors where1198911

lt sdot sdot sdot lt 119891120587

lt 119891120587+1

then the probabilisticdistribution of the frequency f follows

119875 119891 = 119891120587

119875 119879ℎ119900119905119886

(119899119886

119891120587

) lt 119879V119886119897V119890 119891120587

= 119891119888119900119906119899119905

119875 119879ℎ119900119905119886

(119899119886

119891120587

) le 119879V119886119897V119890 minus 119875 119879ℎ119900119905119886

(119899119886

119891120587+1

) le 119879V119886119897V119890 119891120587

lt 119891119888119900119906119899119905

where119879ℎ119900119905119886

(119899119886

119891120587

) is the function of the number of active cores119899119886

and the frequency 119891120587

representing the hotspot temperatureof the active core and 119879V119886119897V119890 is the temperature threshold of theprocessor

Proof When 119891120587

= 119891count according to the zero-slack DTMpolicy obviously

119875 119891 = 119891120587

= 119875 119879hota (119899a 119891120587) le 119879valve (57)

When 119891120587

lt 119891count then the probabilities of 119891 = 119891120587

can bebroken into two cases

(a) if 119879hota(119899a 119891120587) gt 119879hota(119899a 119891120587+1) according to thezero-slack DTM strategy the probability of 119891 = 119891

120587

is 0 that is

119875 119891 = 119891120587

| 119879hota (119899a 119891120587) gt 119879hota (119899a 119891120587+1) = 0 (58)

(b) if 119879hota(119899a 119891120587) le 119879hota(119899a 119891120587+1) the probability of119891 = 119891

120587

is given by

119875 119891 = 119891120587

| 119879hota (119899a 119891120587) le 119879hota (119899a 119891120587+1)

= 119875 119879hota (119899a 119891120587) le 119879valve 119879hota (119899a 119891120587+1)

gt 119879valve | 119879hota (119899a 119891120587) le 119879hota (119899a 119891120587+1)

The right-hand side of (59) can be derived by

119875 119879hota (119899a 119891120587) le 119879valve 119879hota (119899a 119891120587+1)

= 119875 119879hota (119899a 119891120587) le 119879valve minus 119875 119879hota (119899a 119891120587)

le 119879valve 119879hota (119899a 119891120587+1) le 119879valve | 119879hota (119899a 119891120587)

le 119879hota (119899a 119891120587+1)

le 119879valve | 119879hota (119899a 119891120587) le 119879hota (119899a 119891120587+1)

= 119875 119879hota (119899a 119891120587+1) le 119879valve

119875 119891119879hota = (119899a 119891120587) 119891120587 le 119879hota (119899a 119891120587+1)

= 119875 119879hota (119899a 119891120587) le 119879valve

minus 119875 119879hota (119899a 119891120587+1) le 119879valve

Hence when 119891120587

lt 119891count according to (58) and (61) get

119875 119891 = 119891120587

= 119875 119879hota (119899a 119891120587) le 119879valve

minus 119875 119879hota (119899a 119891120587+1) le 119879valve (62)

Therefore according to Theorem 5 the probabilistic dis-tribution of the set of frequencies can be obtained based onthe assumption that the zero-slack policy is used

Given the probabilistic distribution of the frequency andthe mean of hotspot temperature of the active core for acertain frequency the average hotspot temperature of theactive core 119879

averhota can be obtained by

119879averhota = sum

119891isin119865Set120583119879a (H 119899 119899a 119891) sdot 119875 (119891) (63)

where 119865Set = 1198911

119891120587

119891120587+1

119891119891count denotes the set

of frequencies of multicore processors and 119875(119891) denotes theprobability that the frequency is 119891 120583

119879a(H 119899 119899a 119891) denotesthe mean of hotspot temperature of the active core given thefrequency 119891

7 Experimental Results

71 Experimental Methodology Amulticore version of Alpha21264 processor is used as the processor model in ourexperiment [24] and there are eight cores in the processorThe cores have two working states active state and inac-tive state and the working state of each core can be setseparately The processor employs a global DFS techniquewhich means that frequencies of all cores in the processorare scaled uniformly There are four discrete frequenciesused by the processor that is 15 GHz 2GHz 25 GHz and3GHz To facilitate analysis the frequencies are normalizedinto the interval [0 1] so that the maximum frequency isnormalized to 1 After normalization the set of frequenciesis 05 067 083 1 According to our previous work thehotspot of Alpha 21264 processor is the branch predictor[18] So the second element of the hotspot selection vectorH corresponding to the branch predictor is set to 1 and theother elements are set to 0 sThe thermal threshold that is themaximum temperature allowed by processor is set to 100∘C

TheHotSpot is used as the thermalmodel of themulticoreprocessor and the parameters such as thermal conductanceand capacitance are set to default values of HotSpot sim-ulation tool [3] In order to construct the linear model ofdynamic power PTScalar is modified to obtain both the

dynamic power profiles of each functional unit in Alpha21264 processor and the IPC profiles [25] The parameters ofPTScalar are set to default values as well The mean 120583

119883

andvariance 120590

119883

of norm-distributed 119883a119894 are determined usingIPC profiles based on the maximum likelihood estimationSome representative tasks such as mesa ammp quake bzipmcfmath and qsort fromMiBench [26] and SPECCPU2000[27] are selected as the benchmarks These selected tasks aremutually independent that is no task takes precedence overthe others so the tasks can be parallel executed at the tasklevel In addition workload balancing techniques are used inthe processor A task is not fixed on a core and the tasks canbe migrated among all cores such that the IPCs of differentcores are equal The simple linear regression analysis is usedto determine the coefficients Gdyn and G

1198890

in (21) and thevariance 1205902

119864119889

of regression residuals Edyn is obtainedThe leakage power is the nonlinear monotonic increasing

function of temperature which is given by [25]

119875119897

= 120572 sdot 1198792

sdot 119890120574119879

+ 120573 (64)

where 120572 120573 and 120574 are parameters which depend on topologysize technology and design of processors In order toconstruct the linear model of leakage power (64) is regressedlinearly to determine the coefficients Gleaa and G

1198970a in (22)and the coefficients Gleaina and G

1198970ina in (24) as well asthe variance 1205902

119864a and 1205902

119864ina of regression residuals Eleaa andEleaina The parameters 120572 120573 and 120574 are set to the defaultvalues of PTScalar The smaller the range of temperaturethe higher the linear correlation between leakage power andtemperature [7] The temperature of a processor using DTMtechniques does not exceed the maximum value and thelower temperature has no impact on the design optimizationof thermal-aware processors Therefore the linear regressionanalysis of leakage powers is performed at the temperatureinterval between 60∘C and 100∘C and the regression resultsare used to estimate the leakage power at the whole tempera-ture interval

72 Estimated Accuracy For the hotspot of the processorthat is the branch predictor Figure 3 shows the comparisonbetween the actual value and the estimated value of leakagepower for active cores and Figure 4 shows that for inactivecores A higher estimation accuracy of leakage power isobtained at the temperature interval between 60∘C and 100∘Cat the cost of lower accuracy at the other intervals

After regression analysis the dynamic and leakage powercan be estimated with the linear model in (21) (22) and(24) Figure 5 shows the estimation error rate of the dynamicpower and that of the leakage power for active cores andinactive cores in thermal range between 60∘C and 100∘C Itcan be seen that the error rates of the dynamic powers fordifferent functional units have significant variations For thedecoder the estimation error rate of dynamic power is only262 but 1014 for the floating point register (FPReg) Thereason for this fact is that the linear correlations betweenthe IPC and the dynamic powers for various functionalunits are different Lower error rate results from highercorrelation Obviously the IPC and the dynamic power of

Actual leakage powerEstimated leakage power

60 80 100 12040Temperature (∘C)

minus02

Figure 3 Leakage power estimation of hotspot for active cores

times10minus4

60 80 100 12040Temperature (∘C)

minus2

Figure 4 Leakage power estimation of hotspot for inactive cores

the decoder have the highest linear correlation while theIPC and that of FPReg have the lowest linear correlationIt can also be seen from Figure 5 that the estimation errorrates of leakage power for both active cores and inactivecores are similar The estimation error rates of leakage powerfor active cores are between 315 and 326 and those forinactive cores are between 306 and 591 This is becausethe temperature and the leakage power of various functionalunits have similar linear correlations In order to consider the

Table 1 Means (120583) and standard deviations (120590) of hotspot temperature distribution for active cores

Number of active cores Frequency of 05 Frequency of 067 Frequency of 083 Frequency of 1120583 (∘C) 120590 120583 (∘C) 120590 120583 (∘C) 120590 120583 (∘C) 120590

6 5193 403 6125 540 7002 669 7935 8067 5920 425 6973 569 7964 705 9017 8508 6669 448 7846 600 8954 743 10131 895

Table 2 Means (120583) and standard deviations (120590) of hotspot temperature distribution for inactive cores

6 4059 171 4708 229 5319 284 5968 3427 4778 195 5546 261 6269 323 7037 389

Dynamic power

LB IL1

DTB DL1 L2

Leakage power for active coreLeakage power for inactive core

Figure 5 Error rate of the power estimation

impact of estimation errors on the analysis of temperatureand frequency the error terms Edyn Eleaa and Eleaina areintroduced into the linear models of dynamic power andleakage power as expressed in (21) (22) and (24)

73 Probabilistic Distribution of Temperature Based on theassumption that no DTM techniques are used to controlthe temperature of processors according to Theorem 4the probabilistic distribution of the hotspot temperaturesfor both active cores and inactive cores can be obtainedTable 1 presents the means and the standard deviations ofthe probabilistic distribution of the hotspot temperature foractive cores and the corresponding probabilistic densitycurves are as shown in Figure 6 It can be seen that thehotspot temperature of processors is not deterministic andhas significant variations for a certain number of active coresand a certain frequency The number of active cores and therunning frequency simultaneously determine the range inwhich the temperature lies and the probabilistic distributionof the hotspot temperature For the same running frequencymore active cores will yield higher temperature and viceversa For the same number of active cores higher frequencywill bring higher temperature and vice versa Accordingto the characteristics of normal distribution curve it can

be known that the shape of probabilistic density curvecorresponds to the variations of data distribution dependingon the standard deviation of random variables The curvewith a higher peak implies a smaller standard deviationthat is a lower variation of data distribution whereas thecurve with a lower peak implies a bigger standard deviationthat is a larger variation it can be seen from Figure 6 thatthe probabilistic density curves corresponding to variousfrequencies have different peaks This observation impliesthat the degree of temperature variation has close correlationwith working frequency and higher frequency will yieldhigher variation of temperature

Table 2 presents the means and standard deviations ofthe probabilistic distribution of the hotspot temperature forinactive cores and the corresponding probabilistic densitycurves are as shown in Figure 7 When the number of activecores is eight inactive core does not exist so there is nocase where the number of active cores is eight in Table 2and Figure 7 The effect of the frequency and the number ofactive cores at the hotspot temperature for inactive cores is thesame as that for active cores except that the mean value andvariation of hotspot temperature of inactive cores are lowerthan those of active cores under the same frequency and thenumber of active cores

74 Probabilistic Distribution of Frequencies If the power-gating and the DFS techniques are simultaneously usedto manage the temperature of a processor according toTheorem 5 the probabilistic distribution of working frequen-cies can be determined Figure 8 presents the probabilisticdistribution of frequencies when the number of active cores issix seven and eight respectively When the frequency is lessthan 1 it is implied that the hotspot temperature surpasses thethreshold and the DFS technique is triggered to reduce thefrequency of the processor So the probability for triggeringDFS can be obtained from the probabilistic distribution offrequencies Figure 9 presents the probability for triggeringDFS when the number of active cores is six seven and eightrespectively

Frequency is 05Frequency is 067

40 60 80 100 120 14020Temperature (∘C)

yFrequency is 05Frequency is 067

40 60 80 100 120 14020Temperature (∘C)

Figure 6 The probability density of the hotspot temperature for active cores (a) The number of active cores is six (b) The number of activecores is seven (c) The number of active cores is eight

If a core is powered off and made inactive using thepower-gating technique it only dissipates the leakage powerwhich is much less than that of an active core so the powerdissipated by the processor is reduced significantlyWhen thenumber of active cores is less than six that is more thantwo cores are powered off the decreased power makes it

enough for the rest of active cores of a processor to executeat the full speed and the DFS is not necessary to be triggeredTherefore when the number of active cores is less than six thefrequency of the processor is constantly 1 and the probabilityfor triggering DFS is constantly 0 This situation is not givenin Figures 8 and 9

Table 3 Comparisons of temperatures between active cores with and without DFS

Number of active cores Average hotspot temperature (∘C) Probability for exceedingtemperature threshold ()

With DFS Without DFS With DFS Without DFS6 7930 7935 0 0527 8885 9017 0 12378 9385 10131 0 5584

40 60 80 100 120 14020Temperature (∘C)

Figure 7The probability density of the hotspot temperature for inactive cores (a)The number of active cores is six (b)The number of activecores is seven

It can be seen that various numbers of active cores resultin different probabilistic distributions of frequencies anddifferent probabilities for triggering DFS When all cores arepowered on the probability that the processor runs at themaximum frequency is only 4416 and the probability fortriggering DFS is 5584This means that all cores will run atthe full speed only when the IPCs of tasks are less If the IPCsincrease to an extent the running frequency will be scaleddown to control the temperature under the threshold Asthe number of active cores decreases the probability that theprocessor runs at the maximum frequency increases and theprobability for triggering DFS decreases When the numberof active cores is less than six no matter what the IPCs arethe saved power by shutting off more than two cores makes itdeterministic for the active cores to run at the full speed sothe probability that the processor runs at the maximum fre-quency is 100 and the probability for triggering DFS is 0

75 Comparisons of Temperatures with and without DFS IftheDFS technique is not used then the processor always runs

at the full speed that is the running frequency is constantly 1So the average hotspot temperature of the active core withoutthe DFS can be obtained according to (52) If both the power-gating and DFS techniques are used simultaneously for thedynamic thermal management then the running frequencycan be scaled to control the temperature of the processorunder the thermal threshold According to (63) the averagehotspot temperature of the active core with the DFS canbe obtained In terms of the average hotspot temperatureand the probability that the hotspot temperature exceeds thethreshold Table 3 presents the comparative results betweenthe active cores with and without the DFS when the numberof active cores is 6 7 and 8

If the DFS technique is used by the processor the runningfrequency will be scaled down to reduce the temperatureonce the hotspot temperature reaches the threshold So thehotspot temperature will not exceed the threshold that isthe probability that the hotspot temperature exceeds thethreshold is 0 If the DFS technique is not used by the pro-cessor the running frequency is always the maximum and

Number of active cores

Frequency is 1 Frequency is 083Frequency is 067 Frequency is 05

Figure 8 Probabilistic distribution of frequencies

6 7 8Number of active cores

r trig

Figure 9 Probability for triggering DFS

the hotspot temperature is possible to exceed the thresholdthat is the probability that the hotspot temperature exceedsthe threshold is larger than 0Therefore the average hotspottemperature of the processor with the DFS is lower than thatwithout the DFS

As the number of active cores decreases the saved powerby shutting off cores makes it more possible for the activecores to execute at the full speed and the effect of the DFSon cooling down the processor weakens until it disappearsTherefore for the processors with and without the DFS the

average hotspot temperatures become closer as the number ofactive cores decreases as shown in Table 3 Even though theDFS is not used the probability that the hotspot temperatureexceeds the threshold will reduce until 0 when more coresare powered off

When the number of active cores is lower than 6 that ismore than 2 cores are powered off no matter what speed theprocessor runs at the hotspot temperature will not exceedthe threshold so all active cores are not necessary to triggerthe DFS for managing the temperature of the processor andcan run at the maximum frequency Therefore when thenumber of active cores is lower than 6 the probability thatthe hotspot temperature exceeds the threshold is 0 andthe average hotspot temperatures of the processor with andwithout theDFS are sameThere is no differencewith theDFSandwithout theDFSwhen the number of active cores is lowerthan 6 so the comparisons in this situation are not given inTable 3

8 Conclusions

In this paper a probabilistic analysis method of the tem-perature and frequency of multicore processors is presentedtaking the variation of workloads into account It is provedtheoretically in this paper that (1) the hotspot temperaturesof both active cores and inactive cores are the linear functionsof the IPC (2) the hotspot temperature follows the normalprobabilistic distribution based on the assumption that IPCsof all cores follow the same normal distribution and (3) therunning frequency follows a probabilistic distribution

From the experimental results it can be seen that (1) theestimation error rates of the dynamic powers for differentfunctional units have significant variations indicating thatthe linear correlations between the IPC and the dynamicpowers for various functional units are different (2) theestimation error rates of leakage powers for both activecores and inactive cores are similar showing similar linearcorrelations between temperature and leakage power acrossvarious functional units (3) a higher estimation accuracy ofleakage powers can be obtained at the temperature intervalbetween 60∘C and 100∘C at the cost of lower accuracy atother intervals (4) the hotspot temperature of the proces-sor is not deterministic and has significant variation fora certain number of active cores and a certain frequencyand the number of active cores and the running frequencydetermine simultaneously the probabilistic distribution ofhotspot temperature and (5) various numbers of active coresresult in different probabilistic distributions of frequenciesand different probabilities for triggering DFS

Competing Interests

The authors declare no competing interests

Authorsrsquo Contributions

Biying Zhang performed the theoretical analysis and wrotethe major part of this paper Gang Cui and Zhongchuan

Fu are the supervisor and the cosupervisor of Biying Zhangand his current PhD work Hongsong Chen performed theexperiments

Acknowledgments

This work is supported by the project under Grant no SGS-DWH00YXJS1500229 Beijing Natural Science Foundation(4142034) and Fundamental Research Funds for the CentralUniversities (no FRF-BR-15-056A)

References

[1] J Kong S W Chung and K Skadron ldquoRecent thermalmanagement techniques formicroprocessorsrdquoACMComputingSurveys vol 44 no 3 article 13 2012

[2] K Skadron M R Stan K Sankaranarayanan W Huang SVelusamy and D Tarjan ldquoTemperature-aware microarchitec-ture modeling and implementationrdquo ACM Transactions onArchitecture and Code Optimization vol 1 no 1 pp 94ndash1252004

[3] W Huang S Ghosh S Velusamy K Sankaranarayanan KSkadron and M R Stan ldquoHotSpot A compact thermalmodeling methodology for early-stage VLSI designrdquo IEEETransactions on Very Large Scale Integration (VLSI) Systems vol14 no 5 pp 501ndash513 2006

[4] D Li S X-D Tan EH Pacheco andMTirumala ldquoParameter-ized architecture-level dynamic thermal models for multicoremicroprocessorsrdquo ACM Transactions on Design Automation ofElectronic Systems vol 15 no 2 article 16 2010

[5] H Wang S X-D Tan D Li A Gupta and Y Yuan ldquoCom-posable thermal modeling and simulation for architecture-level thermal designs of multicore microprocessorsrdquo ACMTransactions on Design Automation of Electronic Systems vol18 no 2 article no 28 2013

[6] V Hanumaiah and S Vrudhula ldquoTemperature-aware DVFSfor hard real-time applications on multicore processorsrdquo IEEETransactions on Computers vol 61 no 10 pp 1484ndash1494 2012

[7] V Hanumaiah S Vrudhula and K S Chatha ldquoPerformanceoptimal online DVFS and task migration techniques for ther-mally constrained multi-core processorsrdquo IEEE Transactions onComputer-Aided Design of Integrated Circuits and Systems vol30 no 11 pp 1677ndash1690 2011

[8] W S Lawrence and P R Kumar ldquoImproving the performanceof thermally constrained multi core processors using DTMtechniquesrdquo in Proceedings of the International Conference onInformation Communication and Embedded Systems (ICICESrsquo13) pp 1035ndash1040 IEEE Chennai India February 2013

[9] K Chen E Chang H Li and A Wu ldquoRC-based temperatureprediction scheme for proactive dynamic thermal managementin throttle-based 3D NoCsrdquo IEEE Transactions on Parallel andDistributed Systems vol 26 no 1 pp 206ndash218 2015

[10] B Wojciechowski K S Berezowski P Patronik and J BiernatldquoFast and accurate thermal modeling and simulation of many-core processors and workloadsrdquo Microelectronics Journal vol44 no 11 pp 986ndash993 2013

[11] H B Jang J Choi I Yoon et al ldquoExploiting applicationsystem-dependent ambient temperature for accurate microarchitec-tural simulationrdquo IEEE Transactions on Computers vol 62 no4 pp 705ndash715 2013

[12] B Shi Y Zhang and A Srivastava ldquoDynamic thermal manage-ment under soft thermal constraintsrdquo IEEETransactions onVeryLarge Scale Integration (VLSI) Systems vol 21 no 11 pp 2045ndash2054 2013

[13] Z Liu T Xu S X-D Tan and H Wang ldquoDynamic thermalmanagement for multi-core microprocessors considering tran-sient thermal effectsrdquo in Proceedings of the 18th Asia and SouthPacific Design Automation Conference (ASP-DAC rsquo13) pp 473ndash478 Yokohama Japan January 2013

[14] D Das P P Chakrabarti and R Kumar ldquoThermal analysisof multiprocessor SoC applications by simulation and verifi-cationrdquo ACM Transactions on Design Automation of ElectronicSystems vol 15 no 2 article 15 2010

[15] J Lee and N S Kim ldquoAnalyzing potential throughput improve-ment of power- and thermal-constrained multicore processorsby exploiting DVFS and PCPGrdquo IEEE Transactions on VeryLarge Scale Integration (VLSI) Systems vol 20 no 2 pp 225ndash235 2012

[16] R Rao and S Vrudhula ldquoFast and accurate prediction of thesteady-state throughput of multicore processors under thermalconstraintsrdquo IEEE Transactions on Computer-Aided Design ofIntegrated Circuits and Systems vol 28 no 10 pp 1559ndash15722009

[17] C-Y Cher and E Kursun ldquoExploring the effects of on-chipthermal variation on high-performance multicore architec-turesrdquo ACM Transactions on Architecture and Code Optimiza-tion vol 8 no 1 article 2 2011

[18] B Zhang and G Cui ldquoPower estimation for alpha 21264using performance events and impact of ambient temperaturerdquoInternational Journal of Control and Automation vol 7 no 6pp 159ndash168 2014

[19] V Hanumaiah S Vrudhula and K S Chatha ldquoMaximizingperformance of thermally constrainedmulti-core processors bydynamic voltage and frequency controlrdquo in Proceedings of theIEEEACM International Conference on Computer-Aided Design(ICCAD rsquo09) pp 310ndash313 San Jose Calif USANovember 2009

[20] B Wojciechowski K S Berezowski P Patronik and J Bier-nat ldquoFast and accurate thermal simulation and modelling ofworkloads of many-core processorsrdquo in Proceedings of 17thInternational Workshop on Thermal Investigations of ICs andSystems (THERMINIC rsquo11) September 2011

[21] J Lee and N S Kim ldquoOptimizing throughput of power- andthermal-constrainedmulticore processors usingDVFS and per-core power-gatingrdquo inProceedings of the 46thACMIEEEDesignAutomation Conference (DAC rsquo09) pp 47ndash50 San FranciscoCalif USA July 2009

[22] R Rao S Vrudhula and C Chakrabarti ldquoThroughput of multi-core processors under thermal constraintsrdquo inProceedings of theInternational Symposium on Low Power Electronics and Design(ISLPED rsquo07) pp 201ndash206 Portland Ore USA August 2007

[23] M Mohaqeqi M Kargahi and A Movaghar ldquoAnalyticalleakage-aware thermal modeling of a real-time systemrdquo IEEETransactions on Computers vol 63 no 6 pp 1377ndash1391 2014

[24] R E Kessler ldquoThe alpha 21264 microprocessorrdquo IEEE Microvol 19 no 2 pp 24ndash36 1999

[25] W Liao L He and K M Lepak ldquoTemperature and supplyvoltage aware performance and power modeling at microarchi-tecture levelrdquo IEEE Transactions on Computer-Aided Design ofIntegrated Circuits and Systems vol 24 no 7 pp 1042ndash10532005

[26] M R Guthaus J S Ringenberg D Ernst T M Austin TMudge and R B Brown ldquoMiBench a free commercially

representative embedded benchmark suiterdquo in Proceedings ofthe IEEE International Workshop of Workload Characterization(WWC rsquo01) Washington DC USA 2001

[27] J L Henning ldquoSPEC CPU2000 measuring CPU performancein the new millenniumrdquo Computer vol 33 no 7 pp 28ndash352000

Submit your manuscripts athttpwwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

MathematicsJournal of

Mathematical Problems in Engineering

Hindawi Publishing Corporationhttpwwwhindawicom

Differential EquationsInternational Journal of

Volume 2014

Applied MathematicsJournal of

Probability and StatisticsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Journal of

Mathematical PhysicsAdvances in

Complex AnalysisJournal of

OptimizationJournal of

CombinatoricsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

Operations ResearchAdvances in

Journal of

Function Spaces

Abstract and Applied AnalysisHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of Mathematics and Mathematical Sciences

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Algebra

Discrete Dynamics in Nature and Society

Decision SciencesAdvances in

Discrete MathematicsJournal of

Volume 2014 Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Stochastic AnalysisInternational Journal of