SS-SQ03-W: 1 March 11, 2003
Stanford Streaming Supercomputer (SSS) Winter Quarter 2002-2003 Wrapup Meeting
Bill Dally, Computer Systems Laboratory
Stanford University
March 11, 2003
Year 2 Overview
• Where we are today
  – First year goal was met: demonstrated feasibility on single node
  – Feedback from site visit team was very positive
  – Potential for a big impact on scientific computing
  – But still much to do!
• Key FY03 goals
  – Get long-term software infrastructure in place
    • Select approach, implement baseline Brook to SSS compiler
  – Multi-node versions that scale
    • Language, compiler, simulator
  – Tackle hard problems: 3-D, irregular neighborhoods/sparse matrix solve
    • Language support, numerics support, evaluate on simulator
  – Refine architecture
    • Cluster organization, aspect ratio, register organization, memory organization
  – Industrial Partner
    • Start serious discussions, outreach to build support, close partner in 04
Some concerns
• We’re doing a great job – but…
• Losing a bit of focus and momentum
  – Tooling on the detail
  – Need to take a step back and reexamine the big picture
• Need to raise our outside profile
  – Publish
    • Overview paper
    • Brook paper
  – Generate some more convincing evidence of advantages
    • Need a control for bandwidth measures
  – Update the web page
  – Visit the labs
Let’s review our overall goal
Exploit capabilities of VLSI to realize cost-effective scientific computing.
Review – What is the SSS Project About?
• Exploit streams to give 100x improvement in performance/cost for scientific applications vs. ‘cluster’ supercomputers
  – From 100 GFLOPS PCs to TFLOPS single-board computers to PFLOPS supercomputers
• Use layered programming system to simplify development and tuning of applications
  – Stream languages
  – Streaming virtual machine
• Demonstrated feasibility of streaming scientific computing in year 1
• Refine architecture and programming system in year 2
  – Demonstrate realistic applications (3D, irregular)
  – Build usable compiler
  – Resolve architecture questions – aspect ratio, conditional execution, sparse clusters, reg organization, memory system, etc…
• Build a prototype and demonstrate CITS applications in years 3-6
  – With industrial and government partners
  – Broaden our base of support
Industrial Partner Update
• Candidates
  – Cray, IBM, Sun, HP, SGI, Intel
• Initial discussion
  – Present SSS project and results to date
  – Discuss collaboration models
  – Identify next steps
• Met with Cray, Sun, and SGI
  – They listened politely, but little traction
  – Need more convincing evidence
  – Need to address programming issue
    • Have to provide a path for legacy codes
Outreach
• National Labs
  – Los Alamos
  – Livermore
  – Sandia
• Other Government
  – NASA
  – DARPA
  – DoD (Charlie Holland)
  – AFOSR
• User communities
Software Win 02 Goals
Brook
– Define carefully the semantics of the operators
  • No progress
– Work on “views of memory” abstraction
  • Proposed API – will write up for next SW meeting
– Support for partitioning, shared memory, naming, fitting into stream abstraction
  • Adopting UPC – will write up for next SW meeting
– Support for irregular neighborhoods
  • Failed to find an application
– Multithreaded version (Christos)
  • Have simple model for multi-node – written up
– (NEW) Preliminary Brooktran spec
– Concrete Winter goals [Ian/Frank]
  • Review of the language [Pat]
  • Partitioning (UPC)
  • Multi-node/Multi-threaded version
  • Irregular support – w/ application
  • PPoPP paper
  • MD on BRT
Brook Spring 03 Goals
• Refine semantics of operators
  – New version of spec
• Implement views of memory API (UPC)
• Find application for irregular structures
  – Dijkstra, incomplete LU
• Dynamic structure
• Start switching to new compiler
• Brooktran spec/implementation
  – Implemented in Open64
• Concern – have lost metacompiler support
Software Win 02 Goals
SVM
– Spec has evolved
  • Consensus between MIT, Texas, Stanford, USC
– Implement multinode version
  • No progress
– SVM to simulator path
  • No progress
– Multi-thread
SVM – Spring 03 Goals
• Spec is complete – and supports SSS
• Revise single-node simulator
• Multi-node simulator (prelim)
Software Win 02 Goals (3 of 3)
• Start regular meetings [Done]
• Compiler
  – Decide on flow from Brook->SVM->SSS [Mattan]
    • Done
  – Select base compiler [Jayanth]
    • ORC, Gnu, SUIF, Tendra, others…
    • Done
  – “Spike” a simple program from Brook->SSS [Mattan/Jayanth ++]
    • Started – modified front end – operating on WHIRL
  – Brook to Nvidia
  – Optimizations [Spring]
• Run time
  – Write a white paper
Compiler Spring 03 Goals
• Complete feasibility study
• Brook to C path
  – Parse Brook
  – Generate C
• Optimizations
  – See Mattan’s document
• Need to generate SVM code by mid summer
• Parse Brooktran [Alan, Fatica, Jayanth]
• Kernel scheduler MULADD [Das]
• SVM to SSS [Francois – long term – need plan]
Application Win 02 Goals
• StreamFLO [Fatica]
  – Base version is complete
  – Not running on simulator
  – Early start on 3D version – partitioning waiting on API def
• StreamFEM [Barth]
  – Waiting on spec for partitioning
  – 3D arithmetic kernels done
  – Tridiagonal in Brook
• StreamMD [Eric/student]
  – Ported GROMACS to the NV30 – benchmarks
    • Performance dependent on number of registers
    • Doesn’t work with CG compiler
• Model applications [Ron/Frank]
  – Started
• Look at Sierra, purple benchmarks: ppm, sweep3D [delay]
Application Spring 03 Goals
• StreamFLO [Fatica]
  – Parse Brooktran – F to WHIRL [Alan, Fatica]
  – Partitioned version – multi-node UPC
  – 3D version
• StreamFEM [Barth]
  – Simulate 3D
  – Sparse LUD
  – Partitioned version
• StreamMD [Eric/student]
  – Hand-tune NV30 assembly code – GROMACS in Brook
• Model applications [Ron/Frank]
  – C implementations of adaptive structures
• Look at Sierra, purple benchmarks: ppm, sweep3D [delay]
Architecture Win 02 Goals
• Single-Node Simulator [Jung-Ho, Knight]
  – 64-bit support, MULADD, Scalar Processor
  – Not yet
• Multi-Node Simulator [Jung-Ho, Abhishek]
  – Network model
  – Multi-node mechanisms
  – Not yet
• Point Studies
  – Aspect ratio
    • SSE vs VLIW
    • Planning
  – Conditional execution [Mattan/Ujval]
    • Started
  – Sparse clusters
  – SRF organization [Nuwan]
    • Complete
  – Cache alternatives [Jung Ho]
  – Add and store study [Jung Ho]
    • Started
  – I/O
  – Iterative operations [Francois]
    • Planned
Architecture Spring 03 Goals
• Multi-node simulator
• Point Studies
  – Aspect ratio [TIM]
  – Conditional execution [Mattan/Ujval]
  – Sparse clusters [Delay]
  – SRF organization [Nuwan]
    • Refine
  – Cache alternatives [Jung Ho]
  – Add and store study [Jung Ho]
  – I/O [?]
  – Iterative operations [Francois]
• 64-bit [delay]
• Scalar Processor [delay]
Special Win 02 Goals
• Fix website [Pat]
  – Public and private websites
• Name that computer
  – Mississippi
  – Axios
  – Submit names to Mattan
  – Bill, Pat, Bill to choose
• Project Party [Mattan – Pat’s house]
Name Resolution
• From now on, the SSS is called
Merrimac
Winter Quarter Meeting Schedule
• 4/1 Fedkiw Party
• 4/8 Alan, Fatica Brooktran
• 4/15 Kapasi Conditionals
• 4/22 Fatica StreamFLO update
• 4/29 Review Prep
• 5/6 Review Prep
• 5/13 Tim, Tim StreamFEM 3D
• 5/20 Ian, Pat Brook Specification
• 5/27 Mattan Bandwidth Comparison
• 6/3 Jayanth Compiler
• 6/10 Bill Wrapup
Papers
• Arch
  – Indexable SRFs (Nuwan)
  – Streaming Supercomputer Overview (Tim K.)
  – Streaming on conventional CPUs (Mattan)
  – Conditionals (Ujval)
  – Remote Ops (Jung Ho)
  – Aspect Ratio (?)
  – Data parallel (SSE) vs. ILP (VLIW)
• Software
  – Design of Brook (Ian)
  – Data parallel programming on graphics HW (Pat)
  – Brook to CG
• Compiler
• Apps
  – Gromacs
  – StreamFEM (Tim2)
• Overview (Bill and Pat)