reconfigurable hpc research at ornl using field-programmable gate arrays (fpgas)

11
Presented by Reconfigurable HPC Research at ORNL using Field-Programmable Gate Arrays (FPGAs) Olaf O. Storaasli Future Technologies Group Computer Science and Mathematics

Upload: reece-bruce

Post on 30-Dec-2015

34 views

Category:

Documents


5 download

DESCRIPTION

Reconfigurable HPC Research at ORNL using Field-Programmable Gate Arrays (FPGAs). Olaf O. Storaasli Future Technologies Group Computer Science and Mathematics. THE SUPERCOMPUTER COMPANY. Explore FPGAs for future ORNL HPC. High-performance computing vendors adopting FPGAs. - PowerPoint PPT Presentation

TRANSCRIPT

Page 1: Reconfigurable HPC Research at ORNL using Field-Programmable Gate Arrays (FPGAs)

Presented by

Reconfigurable HPC Research at ORNL using Field-Programmable Gate Arrays (FPGAs)

Olaf O. StoraasliFuture Technologies Group

Computer Science and Mathematics

Page 2: Reconfigurable HPC Research at ORNL using Field-Programmable Gate Arrays (FPGAs)

2 Storaasli_ReconfigHPC_0611

Contents Background: Why FPGAs?

ORNL experience: FPGA systems/compilers and applications

ORNL partners: Research Lab, , SRC,

,

THE SUPERCOMPUTER COMPANY

Explore FPGAs for future ORNL HPC

Virtex4 FPGA blades to “accelerate mission-critical applications > 100x”

Steve Scott, CTO HPCWire 24/3/0606

“After exhaustive analysis, Cray concluded that, although multi-core commodity processors will deliver some improvement, exploiting parallelism through a variety of processor technologies using scalar, vector, multithreading and hardware accelerators (e.g., FPGAs or ClearSpeed co-processors) creates the greatest opportunity for application acceleration.”

ORNL potential benefit: Exceed petaflop goal (reduce power)

High-performance computing vendors adopting FPGAs

Page 3: Reconfigurable HPC Research at ORNL using Field-Programmable Gate Arrays (FPGAs)

3 Storaasli_ReconfigHPC_0611

FPGA Logic slice

What’s an FPGA? “User-tailored chip”

Xilinx Virtex4 FPGA: 25K slices (miniCPUs)

Logic array: user-tailored to application

On-chip RAM, multipliers, and PowerPC CPUs

Mega-gigabit transceivers and digital signaling processing blocks

Supports 100–1000 operations/clock cycle

Page 4: Reconfigurable HPC Research at ORNL using Field-Programmable Gate Arrays (FPGAs)

4 Storaasli_ReconfigHPC_0611

0

100

200

300

Computation(GOPS)

Memory Bandwidth(GB/sec)

IO Bandwidth (Gbps)

Pentium

Virtex-4FPGAVirtex4

Pentium

Why FPGAs?

Performance—optimal silicon use (maximize parallel ops/cycle)

Rapid growth—gates, speed, I/O

Low power—1/10th CPUs

Flexible—tailor to application

0

100

200

300

400

500

600

700

2002 2004 2006 2008

Thousands

Logic Cell Capacity1000

800

600

400

200

0

Page 5: Reconfigurable HPC Research at ORNL using Field-Programmable Gate Arrays (FPGAs)

5 Storaasli_ReconfigHPC_0611

Cray XD1Bee2 (via Xilinx)

ORNL FPGA hardware/compilers SRC-6 (Carte), Digilent (Viva,

VHDL), Nallatech (Viva)

Cray XD1 (MitrionC, DSPlogic/Matlab): 6 FPGAs + 144 Opterons

SGI RASC-Altix/Virtex4s (MitrionC)

Bee2: 5 FPGAs and CHiMPS (via Xilinx), DRC (future)

Page 6: Reconfigurable HPC Research at ORNL using Field-Programmable Gate Arrays (FPGAs)

6 Storaasli_ReconfigHPC_0611

Find parallelism: 80% FFTs

More GF/$ GF/Watt

Goal

QuickTime™ and aTIFF (Uncompressed) decompressor

are needed to see this picture.

Profile

Model faster

Porting HPC code spectral transform shallow water model to FPGAs

FTRNDE

FTRNPE

FTTdd

UV FFT

SHTRNS FFT

COMP1

STEP

FTRNEX FTRNVX

8 calls in parallel

3 functions in parallel

2 calls in parallel

HLL compilerCHiMPS, Mitrion(FPGA Tools Inside) FPGA

speedup

HLL developerprofiles

Page 7: Reconfigurable HPC Research at ORNL using Field-Programmable Gate Arrays (FPGAs)

7 Storaasli_ReconfigHPC_0611

Viva: Graphical Icons—3-dimensional MitrionC: Text/flow—1-dimensional

Exploring programming options

Gauss matrix solverCompiler, simulator,

and debugger

+ Carte/SRC, CHiMPS-VHDL/Xilinx ,

Page 8: Reconfigurable HPC Research at ORNL using Field-Programmable Gate Arrays (FPGAs)

8 Storaasli_ReconfigHPC_0611

Simulate molecular dynamics: Compute Ewald particle mesh

(FORTRAN)

Speedup over Intel 2.8 GHz Xeon host

23,558 atoms 61,641 atoms

1st Version

+ memory tailored

1st Version

+ memory tailored

Compute only 3.30 3.30 3.97 3.97

Setup + compute 3.29 3.29 3.96 3.96

Compute + data transfer 0.64 3.20 0.69 3.83

Overall 0.60 3.19 0.60 3.82

Speedup increase with problem size

Amber speedup on SRC-6

Page 9: Reconfigurable HPC Research at ORNL using Field-Programmable Gate Arrays (FPGAs)

9 Storaasli_ReconfigHPC_0611

Speedup projections

0

200

400

600

800

1000

1200

psec/day

2.40E+04 6.20E+04 1.50E+05 2.50E+05 3.50E+05 4.50E+05

Number of atoms

SRC-6E200 MHz200 MHz + 5.6 GB/s500 MHz500 MHz + 5.6 GB/s

Page 10: Reconfigurable HPC Research at ORNL using Field-Programmable Gate Arrays (FPGAs)

10 Storaasli_ReconfigHPC_0611

Summary

ORNL FPGA research:

- Increasingly relevent to HPC

- FPGA Systems: Cray, SRC, Nallatech, Digilent, SGI, Bee2

- Compilers: Mitrion-C, Carte, Viva, CHiMPS, DSPlogic

- Application speedup: Amber, STSWM, BLAST, SW,…

- Partners: Xilinx HPC team, UT, Mitrion, Cray

Next: Evaluate application performance, test DRC

Acknowledgement:

This research is supported by the Office of Science of the U.S. Department of Energy under Contract No. DE-AC05-00OR22725

Page 11: Reconfigurable HPC Research at ORNL using Field-Programmable Gate Arrays (FPGAs)

11 Storaasli_ReconfigHPC_0611

Contact

Olaf StoraasliFuture Technologies [email protected]

11 Storaasli_ReconfigHPC_0611