
Page 1

Paper Presentation

2013/01/14 Yun-Chung Yang

Energy-Efficient Trace Reuse Cache for Embedded Processors

Yi-Ying Tsai and Chung-Ho Chen, IEEE Transactions on Very Large Scale Integration (VLSI) Systems, Vol. 19, No. 9, September 2011

Page 2

Abstract
Related Work
Introduction
Proposed Method
Experiment Result
Conclusion
My Comment

Outline

Page 3

For an embedded processor, the efficiency of instruction delivery has attracted much attention, since instruction cache accesses consume a large portion of the whole processor's power dissipation. In this paper, we propose a memory structure called the Trace Reuse (TR) cache to serve as an alternative source for instruction delivery. Through an effective scheme that reuses the retired instructions from the pipeline back-end of a processor, the TR cache improves both performance and power efficiency. Experimental results show that a 2048-entry TR cache is able to provide 75% energy saving for an instruction cache of 16 kB, while boosting the IPC by up to 21%. The scalability of the TR cache is also demonstrated with the estimated area usage and energy-delay product. The results of our evaluation indicate that the TR cache outperforms the traditional filter cache under all configurations of the reduced cache sizes. The TR cache exhibits strong tolerance to the IPC degradation induced by smaller instruction caches, which makes it an ideal design option when trading cache size for better energy and area efficiency.

Abstract

Page 4

Related Work

(Taxonomy diagram) Prior work splits into branch prediction and instruction cache restructuring, the latter along energy and performance axes: trace cache work [7]-[9], [10]-[13]; filter cache work [15], [16]; references [1]-[6]; and the position of this paper.

Page 5

Goal: improvement in both performance and power efficiency.

Performance: improve instruction delivery to boost processor performance.

Power efficiency: the upper level of the memory hierarchy consumes less power per access (e.g., a filter cache), but performance suffers from the extra cache accesses on misses.

Prior work focuses on the front end of instruction delivery; those schemes need the correct program trace to reduce execution latency and energy consumption.

Introduction

Page 6

Page 7

Add D flip-flops and a History Trace Buffer (HTB) to the pipeline.

Page 8

Indicates how often the opportunity occurs to fetch the same instruction from the HTB.

HTB hit rate = Hk / Fk

where Hk is the hit count in the HTB and Fk is the total number of instructions in the program.

Hit ratio of the HTB
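A minimal way to see what this metric measures, sketched in Python (the instruction trace and the assumption that fetch order equals retire order are mine, not the paper's): keep the last k retired PCs in a FIFO and count how many fetches would find their PC already there.

from collections import deque

def htb_hit_rate(fetch_trace, k):
    # Estimate Hk / Fk for an HTB that holds the last k retired instructions.
    htb = deque(maxlen=k)          # FIFO of the most recently retired PCs
    hits = 0
    for pc in fetch_trace:
        if pc in htb:              # this fetch could be served from the HTB
            hits += 1
        htb.append(pc)             # the instruction retires into the HTB
    return hits / len(fetch_trace) if fetch_trace else 0.0

# Hypothetical example: a tight 8-instruction loop executed 100 times
trace = [0x100 + 4 * i for i in range(8)] * 100
print(htb_hit_rate(trace, k=2048))   # nearly all fetches after the first pass hit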

Page 9

Proposed Architecture

Page 10

Trace Reuse cache architecture:
HTB (History Trace Buffer) – a FIFO buffer that stores the instructions retired from the pipeline back-end.
TET (Trace Entry Table) – stores the PC value of a control-transfer instruction (for instance a branch) and the corresponding HTB index.

Updating the HTB and TET:
When an instruction retires from the pipeline, it is buffered in the HTB together with its PC value. If the retiring instruction is a branch, the TET is updated as well (see the sketch below).

Proposed Method
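A rough behavioural sketch of this update path in Python (structure sizes, field names, and the exact index recorded are my own reading of the slide, not the paper's design): every retired instruction is appended to the HTB FIFO, and a retiring branch additionally records its PC and HTB slot in the TET, so a later fetch of that PC can be redirected to the HTB instead of the instruction cache.

HTB_SIZE = 2048                      # entry count taken from the abstract's example

htb = [None] * HTB_SIZE              # circular FIFO of retired (pc, instruction) pairs
htb_tail = 0                         # next HTB slot to be written
tet = {}                             # branch PC -> HTB index of that branch's slot

def retire(pc, instruction, is_branch):
    # Called once for every instruction leaving the pipeline back-end.
    global htb_tail
    htb[htb_tail] = (pc, instruction)
    if is_branch:
        # remember where the trace headed by this branch starts in the HTB
        tet[pc] = htb_tail
    htb_tail = (htb_tail + 1) % HTB_SIZE

def fetch(pc):
    # On a TET hit, the front-end can stream instructions from the HTB
    # instead of accessing the instruction cache.
    if pc in tet:
        return 'HTB', tet[pc]
    return 'I-CACHE', None

How entries are dropped when the TET fills up is covered on the TET Implementation slides.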

Page 11

Operation

Page 12

The TET is checked every cycle, so its size and structure are important.

A replaced-by-invalidation policy is used for the TET: when the TET is full, the newly generated trace entry is discarded instead of replacing any existing TET entry (sketched below).

TET Implementation
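A tiny sketch of the replaced-by-invalidation idea in Python (my own pseudo-model with an assumed table size, not the paper's hardware): a new trace entry may only be written into an invalid slot, and when no slot is free the new entry is dropped rather than evicting an existing one.

class TETEntry:
    def __init__(self):
        self.valid = False
        self.pc = None
        self.htb_index = None

class TET:
    def __init__(self, n_entries=64):           # size is an assumption
        self.entries = [TETEntry() for _ in range(n_entries)]

    def insert(self, pc, htb_index):
        # Replaced-by-invalidation: only fill invalid slots, never evict.
        for e in self.entries:
            if not e.valid:
                e.valid, e.pc, e.htb_index = True, pc, htb_index
                return True
        return False                             # table full: the new trace entry is discarded

    def invalidate(self, pc):
        # Entries are freed only by invalidation, which is what makes room for new ones.
        for e in self.entries:
            if e.valid and e.pc == pc:
                e.valid = False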

Page 13

Fully associative
4-way set-associative
Direct-mapped

TET Implementation
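The three organizations trade lookup cost against conflicts; a direct-mapped TET probes only one entry per cycle, which is the cheapest option. A small indexing sketch in Python (entry count, index bits, and the dict layout are illustrative assumptions):

TET_ENTRIES = 64                       # assumed size; a power of two for simple indexing
tet = [None] * TET_ENTRIES             # each slot: None or {'pc': ..., 'htb_index': ...}

def tet_index(pc):
    # direct-mapped: low-order PC bits above the 4-byte instruction offset pick the slot
    return (pc >> 2) & (TET_ENTRIES - 1)

def tet_lookup(pc):
    entry = tet[tet_index(pc)]         # only one entry is read per cycle
    if entry is not None and entry['pc'] == pc:    # tag compare on the full PC
        return entry
    return None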

Page 14

Add a busy bit to each TET entry (a).
Add an invalidate flag and a taken/not-taken direction bit to each HTB entry (b).

Adjustment of TET and HTB
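One possible layout of the adjusted entries in Python (field names are mine; the slide only names the added bits):

from dataclasses import dataclass

@dataclass
class TETEntry:                  # (a) trace entry table entry
    valid: bool = False
    busy: bool = False           # the added busy bit
    pc: int = 0                  # PC of the control-transfer instruction
    htb_index: int = 0           # corresponding HTB index

@dataclass
class HTBEntry:                  # (b) history trace buffer entry
    pc: int = 0
    instruction: int = 0
    invalid: bool = False        # the added invalidate flag
    taken: bool = False          # the added taken/not-taken direction bit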

Page 15

Replaced-by-invalidation

Page 16

Two aspects are evaluated: impact on instruction cache accesses, and energy efficiency. MiBench is used as the benchmark workload.

Experiment Result

Page 17

Total number of instruction accesses for different TR cache (TRC) sizes.

Impact on Instruction Cache Access

Page 18

The CACTI tool is used for the energy calculation.

T_program-execution is the elapsed program execution time.

The energy-delay product (EDP) is calculated by multiplying the normalized E_total by T_program-execution (see the sketch below).

Energy Efficiency
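A trivial sketch of how these quantities combine, in Python with made-up numbers (the access counts and per-access energies below are illustrative only, not the paper's data):

def energy_delay_product(accesses, energy_per_access, t_exec, e_baseline, t_baseline):
    # accesses / energy_per_access: dicts keyed by structure, e.g. 'icache', 'htb', 'tet'
    e_total = sum(accesses[s] * energy_per_access[s] for s in accesses)
    # normalize energy and delay against the baseline configuration before multiplying
    return (e_total / e_baseline) * (t_exec / t_baseline)

edp = energy_delay_product(
    accesses={'icache': 2.0e8, 'htb': 6.0e8, 'tet': 8.0e8},
    energy_per_access={'icache': 0.50, 'htb': 0.08, 'tet': 0.02},   # nJ, hypothetical
    t_exec=0.95,                                                    # normalized run time
    e_baseline=8.0e8 * 0.50,    # baseline: all fetches served by the instruction cache
    t_baseline=1.00,
)
print(edp)    # below 1.0 means a better energy-delay product than the baseline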

Page 19

For an embedded system with a not-taken prediction scheme, the TR cache can boost the prediction rate up to 92%, together with a 21% performance gain.

The TR cache virtually expands the capacity of the conventional instruction cache.

This is achieved without the support of trace-prediction and trace-construction hardware.

It delivers instructions at a lower energy cost than the conventional instruction cache.

Conclusion

Page 20

This is the first time presenting a journal paper. The proposed idea is simple, but it brings a great improvement to the whole system. It makes me think about what data we should put into our tag architecture.

My Comment