Motivation
• Despite the proliferation of multi-core, multi-threaded systems, single-thread performance is still an important processor design goal
• Modern programs do not lack instruction-level parallelism (ILP)
• Real challenge: exploit implicit parallelism without undue costs
• One effective approach: decoupled look-ahead
Baseline Decoupled Look-ahead Architecture
• The look-ahead binary (skeleton) offers more parallelism because certain dependencies are removed while slicing the program to build the skeleton
• Look-ahead is more error-tolerant because it has no correctness constraints
  – It can ignore occasional dependence violations
  – Little to no support is needed, unlike in conventional thread-level speculation (TLS)
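To make the slicing idea concrete, here is a minimal sketch of a backward slice over register dependences, assuming a straight-line toy program in which only branch conditions must be preserved. The `Inst` record and `backward_slice` function are hypothetical illustrations, not the ALTO-based tool used for this work; the real skeleton also keeps instructions needed for prefetching and deliberately drops certain dependences, neither of which this sketch models.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Inst:
    """One instruction in a toy straight-line program (hypothetical format)."""
    text: str
    dst: str | None = None        # register written, if any
    srcs: tuple[str, ...] = ()    # registers read
    is_branch: bool = False


def backward_slice(program: list[Inst]) -> list[int]:
    """Return indices of instructions that branch outcomes depend on.

    Walks the program backwards, tracking registers whose producers are
    still needed; an instruction is kept if it is a branch or if it writes
    one of those registers.
    """
    needed: set[str] = set()
    keep: list[int] = []
    for i in range(len(program) - 1, -1, -1):
        inst = program[i]
        if inst.is_branch or (inst.dst is not None and inst.dst in needed):
            keep.append(i)
            if inst.dst is not None:
                needed.discard(inst.dst)  # this write satisfies later readers
            needed.update(inst.srcs)      # its own inputs are now needed
    return sorted(keep)


if __name__ == "__main__":
    program = [
        Inst("r1 = r2 + r3", dst="r1", srcs=("r2", "r3")),
        Inst("r4 = r5 * r6", dst="r4", srcs=("r5", "r6")),  # unrelated to the branch
        Inst("r7 = r1 - 1", dst="r7", srcs=("r1",)),
        Inst("if r7 > 0 goto loop", srcs=("r7",), is_branch=True),
    ]
    print(backward_slice(program))  # [0, 2, 3]: the multiply is left out
```

Everything outside the slice (here the multiply) is work the look-ahead thread never has to perform, which is where the skeleton's extra speed and parallelism come from.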
• Not all instructions are equally important or critical for the final outcome; a typical program contains plenty of weak instructions
• Weak instructions can be safely removed from the look-ahead thread to speed up the look-ahead agent without degrading its quality
Look-ahead Acceleration via Weak Dependence Removal
Performance Benefits of Self-tuned Look-ahead
• Speedup over baseline decoupled look-ahead: 1.16x
• Speedup over single-thread baseline: 1.78x
Summary and Insights
• Decoupled look-ahead can uncover significant implicit parallelism
• The look-ahead thread often becomes a new bottleneck
• Fortunately, look-ahead lends itself to various optimizations due to its lack of hard correctness constraints
• Speculative parallelization is more beneficial in the look-ahead thread than in the main program thread, because the look-ahead thread exposes more parallelism
• Weak instructions can be removed without affecting look-ahead quality
• Intelligent look-ahead techniques are a promising solution in an era of flat clock frequency and modest microarchitecture scaling
• The look-ahead thread (skeleton) runs on a separate core and keeps its memory image in its local L1 cache, with no writeback to the shared L2
• The look-ahead thread sends execution-based branch outcome hints to the main thread through a FIFO queue; it also helps by prefetching into the shared L2 cache
Figure: For the applications in the right half, the slower look-ahead thread is the bottleneck that slows down the overall decoupled look-ahead system; the number on top of each bar is the potential speedup achievable by speeding up the slow look-ahead thread
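The bottleneck effect can be reproduced with a toy producer/consumer timing model. This is a sketch under strong simplifying assumptions (fixed work per branch in each thread, a FIFO with a hypothetical `capacity`, no mispredictions or recovery), not the simulator used for the figure; `simulate` and its parameters are illustrative names.

```python
def simulate(n_branches: int, lookahead_period: int, main_period: int,
             capacity: int) -> int:
    """Return the cycle in which the main thread consumes its last hint.

    Toy model (assumption): the look-ahead thread needs `lookahead_period`
    cycles per branch hint, the main thread needs `main_period` cycles per
    branch once a hint is waiting, and hints flow through a FIFO holding
    at most `capacity` entries.
    """
    if min(lookahead_period, main_period, capacity) < 1:
        raise ValueError("periods and capacity must be positive")
    queued = produced = consumed = 0
    la_work = main_work = 0  # cycles spent toward the next hint / next branch
    cycle = 0
    while consumed < n_branches:
        cycle += 1
        # Look-ahead thread stalls when the queue is full (natural throttling).
        if produced < n_branches and queued < capacity:
            la_work += 1
            if la_work == lookahead_period:
                la_work = 0
                queued += 1
                produced += 1
        # Main thread stalls when no hint is waiting (look-ahead is the bottleneck).
        if queued > 0:
            main_work += 1
            if main_work == main_period:
                main_work = 0
                queued -= 1
                consumed += 1
    return cycle


if __name__ == "__main__":
    for la_period in (3, 2, 1):
        print(la_period, simulate(10, la_period, 2, 4))
```

With the main thread needing 2 cycles per branch, a look-ahead thread needing 3 cycles per hint finishes 10 branches in 31 cycles; speeding it up to 2 cycles gives 21, and to 1 cycle gives 20, where the main thread becomes the limit and a full queue stalls the look-ahead thread. Past that point a faster look-ahead thread buys nothing, which is the intuition behind a bounded per-application potential.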
Figure: Speedup of the baseline look-ahead and the speculatively parallel look-ahead over the single-thread baseline
• The look-ahead thread is a self-reliant entity, independent of the main thread, so it imposes low management overhead on the main thread
• No need for quick spawning or register communication support
• Natural throttling prevents runaway prefetching and cache pollution
Figure: Baseline Decoupled Look-ahead System (labeled components: Main Core executing the main thread, Look-ahead Core executing the look-ahead thread, an L1 cache per core, a shared L2, and a Branch Queue; labeled interactions: branch prediction, prefetching hints, register state synchronization; example instruction sequences are also shown)
Experimental Setup
Look-ahead Thread: A New Bottleneck
Practical Advantages of Decoupled Look-ahead
Accelerating Decoupled Look-ahead to Exploit Implicit Parallelism
Raj Parihar, Michael C. Huang {parihar@ece, michael.huang@}rochester.edu
Advanced Computer Architecture Laboratory (ACAL), University of Rochester, Rochester, NY 14627
Figure: Hybrid Genetic Algorithm Framework (labeled elements: Genetic Algorithm Program, Binary Parser, Initial Seeds, Remove Genes, Feed skeleton, Fitness Evaluator, Launch fitness test, Collect fitness score, Notify HGA; a Local Workstation (e.g., Acalsrv) and a High End Server (e.g., Bluehive), connected via Secure Shell (SSH), scp, qsub, etc.)
Look-ahead Acceleration via Speculative Parallelization
Acknowledgements
NSF (Grant # CCF-0747324), NSFC (Grant # 61028004), Alok Garg
Self-tuned and Speculatively Parallel Look-ahead
• In some cases, the self-tuned and speculatively parallel look-ahead techniques are synergistic (ammp, art)
• Speedup of self-tuned + speculatively parallel look-ahead: 1.84x over the single-thread baseline; 1.20x over the decoupled look-ahead baseline
Genetic Algorithm-based Framework
• A genetic algorithm-based framework can reliably identify and eliminate weak instructions from the look-ahead skeleton
• Chromosome creation -> crossover and mutation -> natural selection
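The chromosome creation, crossover and mutation, and natural selection loop can be sketched as below, assuming each chromosome is a 0/1 mask over candidate weak instructions. The function name `evolve`, the population size, the rates, and the toy fitness function are all hypothetical; in the hybrid framework, fitness comes from actually running the modified skeleton, which this sketch replaces with an arbitrary Python function.

```python
from __future__ import annotations

import random
from typing import Callable


def evolve(n_genes: int, fitness: Callable[[list[int]], float],
           pop_size: int = 20, generations: int = 50,
           mutation_rate: float = 0.02, seed: int = 0) -> list[int]:
    """Toy genetic search for a good removal mask (hypothetical parameters).

    Gene i == 1 means "drop candidate weak instruction i from the skeleton".
    Higher fitness is better.
    """
    if n_genes < 2 or pop_size < 4:
        raise ValueError("need at least 2 genes and a population of 4")
    rng = random.Random(seed)
    # Chromosome creation: a random initial population.
    pop = [[rng.randint(0, 1) for _ in range(n_genes)] for _ in range(pop_size)]
    for _ in range(generations):
        # Natural selection: the fitter half survives unchanged (elitism).
        pop.sort(key=fitness, reverse=True)
        parents = pop[: pop_size // 2]
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n_genes)  # single-point crossover
            child = a[:cut] + b[cut:]
            # Mutation: flip each gene with probability `mutation_rate`.
            child = [g ^ 1 if rng.random() < mutation_rate else g for g in child]
            children.append(child)
        pop = parents + children
    return max(pop, key=fitness)


if __name__ == "__main__":
    weak = {1, 3, 5, 7}  # hypothetical ground truth for a 10-instruction toy

    def toy_fitness(mask: list[int]) -> int:
        # Reward removing weak instructions; heavily penalize removing strong ones.
        return sum(1 if i in weak else -5 for i, g in enumerate(mask) if g)

    print(evolve(10, toy_fitness))
```

Keeping the fitter half unchanged means the best score never decreases between generations; the heavy penalty for removing a strong instruction mirrors the intuition that dropping the wrong instruction degrades look-ahead quality.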
• Program/binary and dependence analysis tool: based on ALTO
• Simulator: based on a heavily modified SimpleScalar with look-ahead support
• Genetic algorithm framework: a supervisor program written in C/C++
Software, Hardware and Runtime Support
Experimental Analysis and Results
• Speedup of decoupled look-ahead over the single-thread baseline: 1.53x
• Speedup of speculatively parallel look-ahead over the single-thread baseline: 1.73x
Figure: ILP limit study of SPEC 2000 INT applications for various instruction window sizes; the left three bars measure ILP in an ideal system, whereas the right three bars measure it in the presence of realistic branch mispredictions and cache misses
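A toy version of such a limit study measures dataflow-limited ILP: the number of instructions divided by the cycles needed when each instruction waits only for its producers and for a free slot in a finite window. The sketch assumes unit latency, in-order retirement, and perfect branch prediction and caches (the "ideal system" case); the function name `ilp` and the trace format are hypothetical, and this is not the methodology behind the figure.

```python
from __future__ import annotations


def ilp(deps: list[list[int]], window: int) -> float:
    """Dataflow-limited ILP of a trace under a finite instruction window.

    deps[i] lists the earlier instructions whose results instruction i reads.
    Instruction i may enter the window only after instruction i - window
    has retired; retirement is in program order.
    """
    if window < 1:
        raise ValueError("window must be positive")
    n = len(deps)
    if n == 0:
        return 0.0
    complete = [0] * n  # cycle in which each instruction finishes
    retire = [0] * n    # cycle in which each instruction retires
    for i, producers in enumerate(deps):
        # Window constraint: a slot frees up when instruction i - window retires.
        enter = retire[i - window] if i >= window else 0
        # Dataflow constraint: wait for every producer to finish.
        ready = max((complete[j] for j in producers), default=0)
        complete[i] = max(enter, ready) + 1  # unit latency
        retire[i] = max(complete[i], retire[i - 1] if i > 0 else 0)
    return n / max(complete)


if __name__ == "__main__":
    independent = [[] for _ in range(8)]
    chain = [[]] + [[i - 1] for i in range(1, 8)]
    print(ilp(independent, 4), ilp(chain, 4))  # 4.0 1.0
```

Independent instructions are limited only by the window, while a dependence chain stays at an ILP of 1 regardless of window size; real traces fall in between, and mispredictions and cache misses (not modeled here) pull the numbers down further.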
Figure: Speedup of conventional TLS over single-thread execution, and speedup of speculatively parallel look-ahead over decoupled look-ahead. Speculatively parallel look-ahead achieves a higher speedup even though it is measured against a more aggressive baseline
Figure: Long-distance parallelism present in the skeleton
• Software support: coarse-grain dependence analysis, identification of spawn and target points, exploitation of loop-level parallelism
• Hardware support: support for spawning a new thread, value communication through registers, partial cache versioning
• Runtime support: squash the spawned thread if a dependence violation occurs
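The runtime rule in the last bullet can be sketched as read-set tracking: the spawned thread records every address it loads, and a later store by the logically earlier thread to one of those addresses means the spawned thread consumed a stale value and must be squashed. The class and function names are hypothetical, and the sketch omits how the spawned thread's own speculative writes are buffered and how it is restarted.

```python
from __future__ import annotations


class SpawnedThread:
    """Speculative thread that runs ahead of its parent (hypothetical model)."""

    def __init__(self) -> None:
        self.read_set: set[int] = set()
        self.squashed = False

    def load(self, memory: dict[int, int], addr: int) -> int:
        self.read_set.add(addr)  # remember every address read speculatively
        return memory.get(addr, 0)


def parent_store(memory: dict[int, int], addr: int, value: int,
                 child: SpawnedThread) -> None:
    """Store by the logically earlier thread, with violation detection."""
    memory[addr] = value
    if addr in child.read_set:
        # The child already consumed the old value: a dependence violation.
        child.squashed = True


if __name__ == "__main__":
    memory = {0x10: 1}
    child = SpawnedThread()
    child.load(memory, 0x10)              # child reads address 0x10 early
    parent_store(memory, 0x20, 7, child)
    print(child.squashed)                 # False: 0x20 was never read by the child
    parent_store(memory, 0x10, 2, child)
    print(child.squashed)                 # True: 0x10 was overwritten after the read
```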
Figure: Available parallelism for a system with 2 cores/contexts
Figure: Examples of weak instructions (dark) in the application vpr
Figure: Distributions of weak and strong instructions
Weak Dependences: Opportunities and Challenges
• Examples of weak instructions: inconsequential adjustments, load and store instructions that are (mostly) silent, dynamic NOP instructions
• Challenges: weakness is context-based; weak instructions are hard to identify and combine, much like the game Jenga, and they also interact with surrounding instructions
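As an illustration of the (mostly) silent stores listed among the weak instructions, the following sketch counts dynamic stores that write the value memory already holds, over a hypothetical trace of (op, address, value) tuples. It only detects silence after the fact in a single trace; deciding which static instructions are safe to drop from the skeleton is the harder, context-dependent problem the challenges bullet describes.

```python
from __future__ import annotations


def count_silent_stores(trace: list[tuple[str, int, int]]) -> int:
    """Count stores that leave memory unchanged (toy trace format)."""
    memory: dict[int, int] = {}
    silent = 0
    for op, addr, value in trace:
        if op == "store":
            if memory.get(addr) == value:
                silent += 1             # memory already holds this value
            memory[addr] = value
        elif op == "load":
            memory.setdefault(addr, value)  # a load reveals the current value
        else:
            raise ValueError(f"unknown op: {op!r}")
    return silent


if __name__ == "__main__":
    trace = [("store", 0, 5), ("store", 0, 5), ("load", 0, 5),
             ("store", 0, 7), ("store", 0, 7)]
    print(count_silent_stores(trace))  # 2
```

A store that is silent on most dynamic instances is a natural candidate for removal from the look-ahead thread, since the occasional non-silent instance only costs look-ahead accuracy, not correctness.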
• Speedup of conventional TLS over the single-thread baseline: 1.07x
• Speedup of speculatively parallel look-ahead over decoupled look-ahead: 1.13x
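Because every speedup here is a ratio of execution times, the figures reported for the look-ahead variants chain through the decoupled look-ahead (DLA) baseline. Writing $T$ for execution time and ST for the single-thread baseline, a quick consistency check using the reported 1.53x speedup of DLA over single-thread:

```latex
\[
S_{X/\mathrm{ST}} = \frac{T_{\mathrm{ST}}}{T_X}
  = \frac{T_{\mathrm{ST}}}{T_{\mathrm{DLA}}}\cdot\frac{T_{\mathrm{DLA}}}{T_X}
  = S_{\mathrm{DLA}/\mathrm{ST}}\cdot S_{X/\mathrm{DLA}}
\]
\[
\begin{aligned}
\text{speculatively parallel:} &\quad 1.53 \times 1.13 \approx 1.73\\
\text{self-tuned:} &\quad 1.53 \times 1.16 \approx 1.77 \quad (\text{reported } 1.78)\\
\text{self-tuned + speculatively parallel:} &\quad 1.53 \times 1.20 \approx 1.84
\end{aligned}
\]
```

The one-hundredth gap for the self-tuned case is within the rounding of the two-decimal inputs; the check assumes the reported averages compose like per-program ratios, which holds exactly only for geometric means.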