Array Allocation Taking into Account SDRAM Characteristics

Hong-Kai Chang, Youn-Long Lin
Department of Computer Science
National Tsing Hua University
HsinChu, Taiwan, R.O.C.


Post on 19-Jan-2016



Outline

- Introduction
- Related Work
- Motivation
- Solving Problem
- Proposed Algorithms
- Experimental Results
- Conclusions and Future Work

Introduction

- Performance gap between memory and processor
- Systems without cache
  - Application specific
  - Embedded DRAM
- Optimize DRAM performance by utilizing its special characteristics
- SDRAM's multi-bank architecture enables new optimizations in scheduling
- We assign arrays to different SDRAM banks to increase the data access rate

Related Work

- Previous research eliminates the memory bottleneck by
  - Using local memory (cache)
  - Prefetching data as fast as possible
- Panda, Dutt, and Nicolau utilize page mode access to improve scheduling using EDO DRAM
- Research on mapping arrays to physical memories for lower power, lower cost, and better performance

Motivation

- DRAM operations
  - Row decode
  - Column decode
  - Precharge
- SDRAM characteristics
  - Multiple banks
  - Burst transfer
  - Synchronous

[Figure: block diagrams of a traditional DRAM and a 2-bank SDRAM (Bank 0, Bank 1), with Row and Column labels]

Address Mapping Table

- Host address: [a16:a0]
- Memory address: [BA, A7:A0]
- Page size for host: 128 words (a6:a0)
- Page size for DRAM: 256 words (A7:A0)

        BA   A7   A6   A5   A4   A3   A2   A1   A0
Row     a7   a16  a15  a14  a13  a12  a11  a10  a9
Col     -    a8   a6   a5   a4   a3   a2   a1   a0

A 9x8 SDRAM address mapping table (Bank interleaving size: 128 words)

- If we exchange the mapping of a0 and a7...

        BA   A7   A6   A5   A4   A3   A2   A1   A0
Row     a0   a16  a15  a14  a13  a12  a11  a10  a9
Col     -    a8   a6   a5   a4   a3   a2   a1   a7

A 9x8 SDRAM address mapping table (Bank interleaving size: 1 word)
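The two mapping tables can be read as bit-shuffling functions from a 17-bit host address to a bank bit, an 8-bit row address, and an 8-bit column address. The following is a minimal sketch of that reading. The type `sdram_addr` and the function names are hypothetical and do not come from the slides.

```c
#include <assert.h>
#include <stdint.h>

/* Decoded address for the 9x8 organisation (hypothetical helper type). */
typedef struct {
    unsigned bank; /* BA                         */
    unsigned row;  /* A7..A0 during row decode    */
    unsigned col;  /* A7..A0 during column decode */
} sdram_addr;

/* First table (bank interleaving size: 128 words):
 * BA = a7, row = a16..a9, col = a8, a6..a0. */
sdram_addr map_interleave_128(uint32_t host)
{
    sdram_addr m;
    assert(host < (1u << 17)); /* host address is [a16:a0] */
    m.bank = (host >> 7) & 1u;
    m.row  = (host >> 9) & 0xFFu;
    m.col  = (((host >> 8) & 1u) << 7) | (host & 0x7Fu);
    return m;
}

/* Second table (bank interleaving size: 1 word), a0 and a7 exchanged:
 * BA = a0, row = a16..a9, col = a8, a6..a1, a7. */
sdram_addr map_interleave_1(uint32_t host)
{
    sdram_addr m;
    assert(host < (1u << 17));
    m.bank = host & 1u;
    m.row  = (host >> 9) & 0xFFu;
    m.col  = (((host >> 8) & 1u) << 7) | (host & 0x7Eu) | ((host >> 7) & 1u);
    return m;
}
```

Under the first mapping, host addresses 0 to 127 fall in bank 0 and 128 to 255 in bank 1. Under the second, even and odd host addresses alternate between the two banks.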

Motivational Example

Legend: BA = Bank Active (row decode); R/W = Read/Write (column decode); BP = Precharge

[Timing diagram: commands BA1, BP1, BP2, BA2, R1, R2 issued twice on the command (address) bus, with Data1 to Data4 on the data bus; 27 cycles]

Motivational Example

Legend: BA = Bank Active (row decode); R/W = Read/Write (column decode); BP = Precharge

[Timing diagram: commands BA1, BP1, BP2, BA2 followed by R1, R2, R3, R4 on the command (address) bus, with Data1 to Data4 on the data bus; 10 cycles]

[Timing diagram: commands BA1, BP1, BP2, BA2, R1, R2 issued twice on the command (address) bus, with Data1 to Data4 on the data bus; 16 cycles]

Assumptions

- Harvard architecture: separate program and data memory
- Paging policy of the DRAM controller
  - Does not perform precharge after read/write
  - If the next access refers to a different page, perform precharge, followed by bank active, before read/write
  - As many pages can be open at once as there are banks
- Resource constraints

Function Unit   ALU       Multiplier  Divider  SDRAM    SDRAM
Supported Op    +,-,>,S   *           /        BA,BP    R,W
Clocks          1         2           4        2        3
Quantity        1         1           1        2 or 4   2 or 4
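The paging policy above can be sketched as a per-access cost function. This is a simplified model, not the scheduler from the slides: it assumes the commands of one access are issued back to back with no overlap across banks, and it uses the clock counts from the resource table (2 for BA and BP, 3 for R/W). The names `sdram_state`, `sdram_init`, and `access_cycles` are hypothetical.

```c
#include <assert.h>

enum { NUM_BANKS = 2, NO_OPEN_ROW = -1 };
enum { CLK_BA = 2, CLK_BP = 2, CLK_RW = 3 }; /* clocks from the resource table */

/* One open row (page) is tracked per bank. */
typedef struct {
    int open_row[NUM_BANKS];
} sdram_state;

void sdram_init(sdram_state *s)
{
    for (int b = 0; b < NUM_BANKS; b++)
        s->open_row[b] = NO_OPEN_ROW;
}

/* Cycles spent on one access when commands are not overlapped
 * (assumption): a page hit costs R/W only; a miss on an open bank
 * costs BP + BA + R/W; a bank with no open page costs BA + R/W. */
int access_cycles(sdram_state *s, int bank, int row)
{
    int cycles = CLK_RW;

    assert(bank >= 0 && bank < NUM_BANKS);
    assert(row >= 0);

    if (s->open_row[bank] != row) {
        if (s->open_row[bank] != NO_OPEN_ROW)
            cycles += CLK_BP;     /* close the page that is open   */
        cycles += CLK_BA;         /* activate the requested row    */
        s->open_row[bank] = row;  /* page stays open after R/W     */
    }
    return cycles;
}
```

Because the page is left open after each read or write, a second access to the same row of the same bank costs only the R/W clocks, while an access to the other bank does not disturb the page already open.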

Problem Definition

- Input: a data flow graph, the resource constraints, and the memory configuration
- Perform our bank allocation algorithm
- Schedule the operations with a static list scheduling algorithm considering SDRAM timing constraints
- Output: a schedule of operations, a bank allocation table, and the total cycle count

Bank Allocation Algorithm

- Calculate node distances
- Calculate array distances
- Give arrays with shorter distances higher priority
- Allocate arrays to different banks if possible
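The last two steps can be illustrated with a greedy heuristic over an array distance matrix. The slides do not give the exact procedure, so everything in this sketch is an assumption: the priority key (smallest distance to any other array), the placement rule (pick the bank whose closest resident is farthest away), the tie-breaking, and the names `allocate_banks` and `MAX_ARRAYS`.

```c
#include <assert.h>
#include <limits.h>

#define MAX_ARRAYS 16 /* hypothetical upper bound for this sketch */

/* dist is an n*n row-major matrix of non-negative array distances. */
static int nearest_other(const int *dist, int n, int i)
{
    int best = INT_MAX;
    for (int j = 0; j < n; j++)
        if (j != i && dist[i * n + j] < best)
            best = dist[i * n + j];
    return best;
}

/* Greedy placement (assumed heuristic, not the authors' algorithm). */
void allocate_banks(const int *dist, int n, int num_banks, int *bank_of)
{
    int placed[MAX_ARRAYS] = {0};

    assert(n > 0 && n <= MAX_ARRAYS && num_banks > 0);

    for (int step = 0; step < n; step++) {
        /* Highest priority: the unplaced array with the closest neighbour. */
        int pick = -1, pick_key = INT_MAX;
        for (int i = 0; i < n; i++) {
            if (placed[i])
                continue;
            int key = nearest_other(dist, n, i);
            if (pick < 0 || key < pick_key) {
                pick = i;
                pick_key = key;
            }
        }

        /* Choose the bank whose nearest resident is farthest from pick;
         * an empty bank counts as infinitely far. */
        int best_bank = 0, best_gap = -1;
        for (int b = 0; b < num_banks; b++) {
            int gap = INT_MAX;
            for (int j = 0; j < n; j++)
                if (placed[j] && bank_of[j] == b && dist[pick * n + j] < gap)
                    gap = dist[pick * n + j];
            if (gap > best_gap) {
                best_gap = gap;
                best_bank = b;
            }
        }
        bank_of[pick] = best_bank;
        placed[pick] = 1;
    }
}
```

For four arrays in which arrays 0 and 1 are close and arrays 2 and 3 are close, this sketch separates each close pair across two banks. It is not guaranteed to reproduce the allocations reported in the slides.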

Example: SOR

    main()
    {
        float a[N][N], b[N][N], c[N][N], d[N][N], e[N][N], f[N][N];
        float omega, resid, u[N][N];
        int j, l;

        for (j = 2; j < N; j++)
            for (l = 1; l < N; l += 2) {
                resid = a[j][l] * u[j+1][l] + b[j][l] * u[j-1][l]
                      + c[j][l] * u[j][l+1] + d[j][l] * u[j][l-1]
                      + e[j][l] * u[j][l] - f[j][l];
                u[j][l] -= omega * resid / e[j][l];
            }
    }

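[Figure: DFG of SOR (partial). Read (R) nodes for a, b, c, d and for u[j+1][l], u[j-1][l], u[j][l+1], u[j][l-1] feed four multiply nodes, whose products are combined by three add nodes. Node distance vectors shown in the figure:
  multiply nodes: {1,-,-,-,-,-,-,1,-}, {-,1,-,-,-,-,-,-,1}, {-,-,1,-,-,-,1,-,-}, {-,-,-,1,-,-,1,-,-}
  add nodes: {2,2,-,-,-,-,-,2,2}, {-,-,2,2,-,-,2,-,-}, {3,3,3,3,-,-,3,3,3}]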

Node Distance

- The distances between the current node and the nearest nodes that access arrays a, b, c, ..., shown in { }
- Ex. {1,-,-,-,-,-,-,1,-} means the distances to the nodes that access arrays a[j] and u[j+1] are both 1
- '-' means the distance is still unknown
- When propagated downstream, the distance increases

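[Figure: DFG fragment with Read (R) nodes for a, u[j+1][l], b, u[j-1][l], two multiply nodes with vectors {1,-,-,-,-,-,-,1,-} and {-,1,-,-,-,-,-,-,1}, and an add node with vector {2,2,-,-,-,-,-,2,2}; annotations mark the entries for the distance to a[j], to b[j], and to u[j-1]]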

Array Distance

- The distance between nodes that access arrays
- Calculated from the node distances of the corresponding arrays
- Take the minimum value
- Ex. AD(a[j], u[j+1]) = min(2, 4) = 2

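[Figure: the same DFG fragment, with multiply-node vectors {1,-,-,-,-,-,-,1,-} and {-,1,-,-,-,-,-,-,1} and add-node vector {2,2,-,-,-,-,-,2,2}. Annotations:
  at the first multiply node: AD(a[j], u[j+1]) = 1 + 1 = 2
  at the add node: AD(a[j], b[j]) = 2 + 2 = 4
  at the add node: AD(a[j], u[j-1]) = 2 + 2 = 4]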
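The array distance rule can be sketched as follows. The sum-then-minimum rule is inferred from the annotations on this slide (1 + 1 = 2, 2 + 2 = 4, then the minimum). Vector entries are assumed to follow the column order of the array distance table: a[j], b[j], c[j], d[j], e[j], f[j], u[j], u[j+1], u[j-1]. The names `node_dist` and `array_distance` are hypothetical.

```c
#include <assert.h>
#include <limits.h>

enum { NUM_ARRAYS = 9, UNKNOWN = -1 }; /* UNKNOWN stands for '-' */

/* Node distance vector attached to one DFG node. */
typedef struct {
    int d[NUM_ARRAYS];
} node_dist;

/* Array distance between arrays x and y: at every node where both
 * node distances are known, add them, then take the minimum over
 * all nodes. Returns INT_MAX if no node knows both distances. */
int array_distance(const node_dist *nodes, int num_nodes, int x, int y)
{
    int best = INT_MAX;

    assert(x >= 0 && x < NUM_ARRAYS && y >= 0 && y < NUM_ARRAYS);
    if (x == y)
        return 0; /* the distance table has 0 on its diagonal */

    for (int k = 0; k < num_nodes; k++) {
        int dx = nodes[k].d[x];
        int dy = nodes[k].d[y];
        if (dx != UNKNOWN && dy != UNKNOWN && dx + dy < best)
            best = dx + dy;
    }
    return best;
}
```

Fed with the three vectors of the DFG fragment, index 0 (a[j]) against index 7 (u[j+1]) gives 2, while index 0 against index 1 (b[j]) or index 8 (u[j-1]) gives 4, matching the corresponding entries of the array distance table.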

Example: SOR

         a[j]  b[j]  c[j]  d[j]  e[j]  f[j]  u[j]  u[j+1]  u[j-1]
a[j]     0     4     6     6     7     6     6     2       4
b[j]     4     0     6     6     7     6     6     4       2
c[j]     6     6     0     4     7     6     2     6       6
d[j]     6     6     4     0     7     6     2     6       6
e[j]     7     7     7     7     0     3     2     7       7
f[j]     6     6     6     6     3     0     3     6       6
u[j]     6     6     2     2     2     3     0     6       6
u[j+1]   2     4     6     6     7     6     6     0       4
u[j-1]   4     2     6     6     7     6     6     4       0

Array distance table of SOR

Bank allocation:

- Bank 0: c, d, e, f
- Bank 1: a, b, u

Experimental Characteristics

- We divided our benchmarks into two groups
  - First group: benchmarks access multiple 1-D arrays
    - Apply our algorithm to arrays
  - Second group: benchmarks access single 2-D arrays
    - Apply our algorithm to array rows
- Memory configurations
  - Multi-bank configuration: 2 banks / 4 banks
  - Multi-chip configuration: 2 chips / 4 chips
  - Multi-chip vs. multi-bank: relieves bus contention
  - Utilizing page mode access or not

Results of the first group (multiple array)

[Bar chart: normalized cycles (0% to 100%) per benchmark (dhrc, dequant, wiener, dct, mmult, leafcomp, fir, sor) for the configurations Coarse, 1Bank, 2Bank, 4Bank, 2Chips, 4Chips, 2Bank+P, 4Bank+P, 2Chips+P, 4Chips+P]

Results of the second group (single array)

[Bar chart: normalized cycles (0% to 100%) per benchmark (compress, laplace, sobel, lowpass, compress2, laplace2, sobel2, lowpass2) for the configurations Coarse, 1Bank, 2Bank, 4Bank, 2Chips, 4Chips, 2Bank+P, 4Bank+P, 2Chips+P, 4Chips+P]

Results compared to Panda's

[Bar chart: normalized cycles (0% to 100%) per benchmark (dhrc, dequant, mmult, leafcomp, sor, lowpass) for the configurations Coarse, 1Bank, 2Bank, 4Bank, 2Chips, 4Chips, 2Bank+P, 4Bank+P, 2Chips+P, 4Chips+P, Panda]

Experimental Results

- From the average results, we can see that
  - Scheduling using SDRAM with our bank allocation algorithm does improve the performance
  - Utilizing page mode access relieves the traffic on the address bus, so the use of multiple chips does not yield an obvious improvement

Configuration   1 Chip/2 Banks  1 Chip/4 Banks  2 Chips/1 Bank  4 Chips/1 Bank
W/O PageMode    70.20%          62.28%          64.93%          54.51%
W/ PageMode     53.38%          43.36%          52.52%          42.02%

Average schedule length of different configurations

Conclusions

- We presented a bank allocation algorithm incorporated in our scheduler to take advantage of SDRAM
- The scheduling results show a great improvement over the coarse one and beat Panda's work in some cases
- Our work is based on a common paging policy
- Several different memory configurations are exploited
- Scheduling results are verified and meet Intel's PC SDRAM spec

Future Work

- Extending our research to Rambus DRAM
- Grouping arrays to incorporate burst transfer
- Integration with other scheduling/allocation techniques