
Motivation

•  Despite the proliferation of multi-core, multi-threaded systems, single-thread performance is still an important processor design goal

•  Modern programs do not lack instruction-level parallelism (ILP)

•  Real challenge: exploit implicit parallelism without undue costs

•  One effective approach: decoupled look-ahead

Baseline Decoupled Look-ahead Architecture

•  The look-ahead binary (skeleton) offers more parallelism because certain dependences are removed when the skeleton is sliced

•  Look-ahead is more error-tolerant because it has no correctness constraint

•  Can ignore occasional dependence violations

•  Little to no support needed, unlike in conventional TLS

•  Not all instructions are equally important and critical for the final outcome. Plenty of weak instructions are present in a typical program

•  Weak instructions can be removed safely from the look-ahead thread to speed up the look-ahead agent without degrading its quality

Look-ahead Acceleration via Weak Dependence Removal

Performance Benefits of Self-tuned Look-ahead

•  Speedup over baseline decoupled look-ahead: 1.16x

•  Speedup over single-thread baseline: 1.78x

Summary and Insights

•  Decoupled look-ahead can uncover significant implicit parallelism

•  Look-ahead thread often becomes a new bottleneck

•  Fortunately, look-ahead lends itself to various optimizations due to lack of hard correctness constraints

•  Speculative parallelization is more beneficial in look-ahead thread compared to main program thread due to increased parallelism

•  Weak instructions can be removed without affecting look-ahead quality

•  Intelligent look-ahead technique is a promising solution in the era of flat frequency and modest microarchitecture scaling


•  The look-ahead thread (skeleton) runs on a separate core and maintains its memory image in its local L1 cache, with no writeback to the shared L2

•  The look-ahead thread sends execution-based branch outcome hints through a FIFO queue; it also helps by prefetching into the shared L2 cache
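The FIFO hand-off described above can be illustrated with a small model. This is only a sketch under stated assumptions: the names `BranchQueue` and `simulate`, the queue capacity, and the per-cycle rates are all hypothetical and do not come from the poster. It shows a look-ahead producer that runs ahead of a consumer and stalls once the bounded queue fills up.

```python
from collections import deque


class BranchQueue:
    """Bounded FIFO of branch-outcome hints (capacity is an assumed parameter)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.hints = deque()

    def push(self, taken):
        """Try to enqueue a hint; return False if the queue is full."""
        if len(self.hints) >= self.capacity:
            return False  # the look-ahead thread has to stall
        self.hints.append(taken)
        return True

    def pop(self):
        """Dequeue the oldest hint, or None if the queue is empty."""
        return self.hints.popleft() if self.hints else None


def simulate(outcomes, capacity, lookahead_rate=2):
    """Toy cycle loop: look-ahead tries `lookahead_rate` hints per cycle,
    the main thread consumes at most one hint per cycle."""
    queue = BranchQueue(capacity)
    produced, stalls, max_lead = 0, 0, 0
    consumed = []
    while len(consumed) < len(outcomes):
        for _ in range(lookahead_rate):
            if produced < len(outcomes):
                if queue.push(outcomes[produced]):
                    produced += 1
                else:
                    stalls += 1
        max_lead = max(max_lead, len(queue.hints))
        hint = queue.pop()
        if hint is not None:
            consumed.append(hint)
    return consumed, stalls, max_lead
```

In this toy model the main thread sees the hints in program order, and the lead of the look-ahead thread never exceeds the queue capacity, which is one simple way to picture a bounded run-ahead distance.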

Figure: In the applications in the right half, a slower look-ahead thread is the bottleneck that slows down the overall decoupled look-ahead system. The number shown on top of each bar is the potential that could be achieved by speeding up the slow look-ahead thread

Figure: Speedup of baseline look-ahead and speculatively parallel look-ahead over single-thread baseline

•  The look-ahead thread is a self-reliant entity, independent of the main thread, which entails low management overhead on the main thread

•  No need for quick spawning and register communication support

•  Natural throttling to prevent runaway prefetching and cache pollution

Figure: Baseline Decoupled Look-ahead System. Diagram labels: Main Core (executes main thread), Look-ahead Core (executes look-ahead thread), Branch Queue, L1 caches and L2 cache, register state synchronization, prefetching hints, branch prediction. The figure also contains an assembly code listing that is not recoverable from the extraction.

Experimental Setup

Look-ahead Thread: A New Bottleneck

Practical Advantages of Decoupled Look-ahead

Raj Parihar, Michael C. Huang {parihar@ece, michael.huang@}rochester.edu Advanced Computer Architecture Laboratory (ACAL), University of Rochester, Rochester, NY 14627

Accelerating Decoupled Look-ahead to Exploit Implicit Parallelism

Figure: Hybrid Genetic Algorithm Framework. Diagram labels: Genetic Algorithm Program, Binary Parser, Fitness Evaluator; steps: Initial Seeds, Feed skeleton, Launch fitness test, Collect fitness score, Remove Genes, Notify HGA; Local Workstation (e.g., Acalsrv), High-End Server (e.g., Bluehive), Secure Shell (SSH, scp, qsub, etc.)

Look-ahead Acceleration via Speculative Parallelization

Acknowledgements NSF (Grant # CCF-0747324), NSFC (Grant # 61028004), Alok Garg

Self-tuned and Speculatively Parallel Look-ahead

•  In some cases, self-tuned and speculatively parallel look-ahead techniques are synergistic (ammp, art)

•  Self-tuned + speculative parallel look-ahead speedup: 1.84x over single-thread baseline; 1.20x over decoupled look-ahead baseline

Genetic Algorithm based Framework

•  Genetic algorithm based framework can be reliably used to identify and eliminate weak instructions from the look-ahead skeleton

•  Chromosome creation -> Crossover and mutation -> Natural selection
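The three stages listed above (chromosome creation, crossover and mutation, natural selection) can be sketched as a generic genetic-algorithm loop. This is an illustrative sketch, not the authors' supervisor program: the bit-vector encoding (one bit per skeleton instruction, 1 = keep, 0 = remove), the toy fitness function, the `STRONG` set, and all parameter values are assumptions. In the framework described on the poster, fitness would presumably come from running the look-ahead system rather than from a closed-form score.

```python
import random

# Hypothetical: indices of instructions whose removal hurts look-ahead quality.
STRONG = {0, 3, 5}


def toy_fitness(chromosome):
    """Made-up score: removing a weak instruction gains 1, removing a
    strong one costs 10, keeping a strong one gains 10."""
    score = 0
    for index, keep in enumerate(chromosome):
        if index in STRONG:
            score += 10 if keep else -10
        else:
            score += 0 if keep else 1
    return score


def evolve(fitness, n_genes, pop_size=20, generations=40,
           mutation_rate=0.05, seed=0):
    rng = random.Random(seed)
    # Chromosome creation: the unmodified skeleton plus random variants.
    population = [[1] * n_genes] + [
        [rng.randint(0, 1) for _ in range(n_genes)]
        for _ in range(pop_size - 1)
    ]
    for _ in range(generations):
        children = []
        for _ in range(pop_size):
            # Crossover: splice two parents at a random cut point.
            a, b = rng.sample(population, 2)
            cut = rng.randrange(1, n_genes)
            child = a[:cut] + b[cut:]
            # Mutation: flip each gene with a small probability.
            child = [g ^ 1 if rng.random() < mutation_rate else g
                     for g in child]
            children.append(child)
        # Natural selection: keep the fittest of parents and children.
        population = sorted(population + children,
                            key=fitness, reverse=True)[:pop_size]
    return max(population, key=fitness)
```

Because the best parents always survive selection, the returned chromosome is never worse than the unmodified skeleton under the chosen fitness function. For example, `evolve(toy_fitness, 8)` returns an 8-bit keep/remove mask.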

•  Program/binary and dependence analysis tool: based on ALTO

•  Simulator: based on heavily modified SimpleScalar, with look-ahead support

•  Genetic algorithm framework: a supervisor program written in C/C++

Software, Hardware and Runtime Support

Experimental Analysis and Results

•  Speedup of decoupled look-ahead over single-thread baseline: 1.53x

•  Speedup of speculative parallel look-ahead over single-thread: 1.73x
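As a quick sanity check, the speedups reported on this poster compose multiplicatively: a speedup over the decoupled look-ahead baseline times the baseline's 1.53x speedup over single-thread gives a speedup over single-thread. The sketch below assumes the reported averages can be compounded this way (exact for geometric means, only approximate otherwise); the helper name `compound` is hypothetical.

```python
# Speedups as reported on the poster.
BASELINE_OVER_SINGLE = 1.53          # decoupled look-ahead vs. single-thread
SPEC_PARALLEL_OVER_BASELINE = 1.13   # speculative parallel look-ahead
SELF_TUNED_OVER_BASELINE = 1.16      # weak dependence removal
COMBINED_OVER_BASELINE = 1.20        # self-tuned + speculative parallel


def compound(over_baseline, baseline_over_single=BASELINE_OVER_SINGLE):
    """Compose two relative speedups and round to two decimals."""
    return round(over_baseline * baseline_over_single, 2)


print(compound(SPEC_PARALLEL_OVER_BASELINE))  # 1.73
print(compound(SELF_TUNED_OVER_BASELINE))     # 1.77
print(compound(COMBINED_OVER_BASELINE))       # 1.84
```

The first and third products match the reported 1.73x and 1.84x. The second gives 1.77x against the reported 1.78x; a gap of this size is what one would expect from rounding in the two-decimal inputs.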

Figure: ILP limit study of SPEC 2000 INT applications for various instruction windows. The left three bars measure the ILP in an ideal system, whereas the right three bars measure it in the presence of realistic branch mispredictions and cache misses

Figure: Speedup of conventional TLS over single-thread, and speedup of speculatively parallel look-ahead over decoupled look-ahead. Speculative parallel look-ahead achieves higher speedup over a more aggressive baseline

Figure: Long-distance parallelism present in skeleton

•  Software support: coarse-grain dependence analysis, finding of target and spawn points, exploitation of loop-level parallelism

•  Hardware support: spawning support for a new thread, value communication through registers, partial cache versioning

•  Runtime support: squash spawned thread if dependence violation occurs
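The runtime rule above (squash the spawned thread on a dependence violation) can be pictured with a simple read-set check. This is a minimal sketch under assumptions: the class `SpawnedThread`, the function `older_thread_store`, and the address-granularity tracking are hypothetical and are not described on the poster.

```python
class SpawnedThread:
    """Tracks which addresses a speculatively spawned thread has consumed."""

    def __init__(self):
        self.exposed_reads = set()  # read before being written locally
        self.local_writes = set()
        self.squashed = False

    def load(self, addr):
        # A read of a locally written address does not depend on the
        # older thread, so only other reads are recorded as exposed.
        if addr not in self.local_writes:
            self.exposed_reads.add(addr)

    def store(self, addr):
        self.local_writes.add(addr)


def older_thread_store(addr, spawned):
    """Called when the older (spawning) thread writes `addr`.
    Returns True if this write triggers a squash of the spawned thread."""
    if not spawned.squashed and addr in spawned.exposed_reads:
        spawned.squashed = True
        return True
    return False
```

A usage example: after `t = SpawnedThread(); t.load(0x10)`, a call to `older_thread_store(0x10, t)` reports a violation, because the spawned thread consumed a value that the older thread later overwrote.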

Figure: Available parallelism for a 2-core/context system

Figure: Examples of weak instructions (dark) in application vpr

Figure: Distributions of weak and strong instructions

Weak Dependences: Opportunities and Challenges

•  Examples of weak instructions: inconsequential adjustments, load and store instructions that are (mostly) silent, dynamic NOP instructions

•  Challenges involved: weak instructions are context-based and hard to identify and combine, much like the game Jenga; they also interact with surrounding instructions
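One of the examples above, the (mostly) silent store, can be made concrete with a short trace analysis. A store is silent when it writes the value already held at that address. The sketch below assumes a hypothetical trace format of `(pc, address, value)` tuples for stores only, and reports the fraction of silent dynamic instances per static store; the function name and format are illustrative, not part of the poster's toolchain.

```python
from collections import defaultdict


def silent_store_fraction(store_trace):
    """Return {pc: fraction of dynamic stores at pc that were silent}."""
    memory = {}                 # last value written to each address
    total = defaultdict(int)
    silent = defaultdict(int)
    for pc, addr, value in store_trace:
        total[pc] += 1
        # Silent: the address already holds exactly this value.
        if addr in memory and memory[addr] == value:
            silent[pc] += 1
        memory[addr] = value
    return {pc: silent[pc] / total[pc] for pc in total}


trace = [
    (0x400, 0x10, 5), (0x400, 0x10, 5), (0x400, 0x10, 5), (0x400, 0x10, 5),
    (0x404, 0x20, 1), (0x404, 0x20, 2),
]
print(silent_store_fraction(trace))  # {1024: 0.75, 1028: 0.0}
```

Here the store at `0x400` is silent in three of its four instances, so it is a candidate weak instruction, while the store at `0x404` always changes memory. Whether removing such a candidate is actually safe depends on context, which is the challenge the second bullet describes.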

•  Speedup of conventional TLS over single-thread baseline: 1.07x

•  Speedup of speculative parallel look-ahead over decoupled look-ahead: 1.13x
