Transcript
Page 1: High Performance Computing Infrastructure: Past, Present, and Future

High Performance Computing Infrastructure: Past, Present, and Future

By Clay Gloster, Jr., Ph.D., P.E.

Associate Professor

Department of Electrical & Computer Engineering

Howard University

THE RARE PROJECT

[email protected] 22, 2009

Page 2: High Performance Computing Infrastructure: Past, Present, and Future

Presentation Outline

• Introduction to Reconfigurable Computing

• The Bison Configurable Digital Signal Processor

• The BCDSP Design Flow

• Current Function Cores and Modules

• A Remote Reconfigurable Computer

• A Parallel and Configurable Computer

Page 3: High Performance Computing Infrastructure: Past, Present, and Future

Introduction to Reconfigurable Computing

Page 4: High Performance Computing Infrastructure: Past, Present, and Future

Problem Statement

• Given: An application that is computationally intensive or requires considerable CPU execution time, e.g., weather modeling, remote sensing, target recognition, precision targeting, gene sequencing.

• Find: A solution that significantly improves performance, requires acceptable development time, at a reasonable cost.

Page 5: High Performance Computing Infrastructure: Past, Present, and Future

Potential Solutions

• Cluster-based computing: The use of several general purpose computing systems, e.g., PCs. (Writing programs that execute on typical PCs/workstations.)

• Application-Specific Integrated Circuit (ASIC) Design: The use of special-purpose ICs or chips. (Designing a chip (hardware) that is highly optimized for the particular application.)

• Reconfigurable Computing: The merger of the two approaches. (Writing software to execute non-time-critical portions of the application on a PC while designing hardware to execute the time-critical portions of the application on an FPGA.)

Page 6: High Performance Computing Infrastructure: Past, Present, and Future

A Reconfigurable Computer is:

A PC attached to one or more Field Programmable Gate Arrays (FPGAs).

PC

Host

Page 7: High Performance Computing Infrastructure: Past, Present, and Future

An FPGA is:

A programmable integrated circuit.

At time t1, it can be programmed as X1 (personal data assistant).

At time t2, it can be programmed as X2 (calculator).

[Figure: FPGA structure showing programmable pins, configurable logic blocks, and programmable interconnect.]

Page 8: High Performance Computing Infrastructure: Past, Present, and Future

RC Systems Advantages

• Several applications have been implemented on reconfigurable computing systems with execution times an order of magnitude faster than the same applications implemented on a typical desktop computer.

• The same reconfigurable computing system hardware can be reused for diverse applications.

• With an RC system, a system can be deployed and subsequently reprogrammed with new hardware to perform functions that were not available at the time of deployment.

Page 9: High Performance Computing Infrastructure: Past, Present, and Future

RC Systems Disadvantages

• Developing an RC system requires a system designer who is knowledgeable in both hardware design and software design.

• Time required to design and implement an RC system that executes faster than a typical desktop computer can be several months.

Page 10: High Performance Computing Infrastructure: Past, Present, and Future

Research Objectives

• To obtain RC system implementations of several applications that achieve an order of magnitude speedup over executing the application on a typical desktop computer.

• To develop tools that reduce RC system development time from months to weeks or days and allow users without hardware design expertise to implement RC systems, gaining the potential benefits of increased system performance and system reuse.

• To develop a resource management system to efficiently utilize available reconfigurable computing resources located at remote sites.

Page 11: High Performance Computing Infrastructure: Past, Present, and Future

The Bison Configurable Digital Signal Processor

Page 12: High Performance Computing Infrastructure: Past, Present, and Future

A Configurable Digital Signal Processor

[Figure: BCDSP block diagram. The processor (BCDSP) contains a control unit and a data unit with a Function Core (FunCoreGen). Data memories M0, M1, M2, M3, ..., Mn-2, Mn-1 and an instruction memory Mn are attached.]

Page 13: High Performance Computing Infrastructure: Past, Present, and Future

Functional Cores

• Have one or more 32-bit inputs.

• Perform floating point vector operations.

• Have simple control.

• Can be built using other FunCores.

• Can include conditional units.

[Figure: FunCore block labeled R0, R1, ..., R7, ENABLE, and DONE.]

Page 14: High Performance Computing Infrastructure: Past, Present, and Future

2-D DCT Function Core

[Figure: 2-D DCT function core. Registers R0 through R15 feed eight multipliers (X); a tree of seven adders (+) sums the products to produce the output Z0.]
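The function core above appears to consist of eight multipliers feeding a tree of adders. The following Python sketch models that structure in software. The function name `mac_tree` and the way operands are paired are assumptions made for illustration; the slides do not specify how the registers map onto the multipliers.

```python
def mac_tree(a, b):
    """Multiply eight operand pairs, then sum the products with an adder tree."""
    if len(a) != 8 or len(b) != 8:
        raise ValueError("this sketch models exactly eight multipliers")
    # Stage 1: in hardware the eight multipliers would operate in parallel.
    level = [x * y for x, y in zip(a, b)]
    # Stages 2-4: 4 + 2 + 1 adders; each stage halves the number of partial sums.
    while len(level) > 1:
        level = [level[i] + level[i + 1] for i in range(0, len(level), 2)]
    return level[0]
```

One output coefficient of an 8-point DCT is a dot product of eight inputs with eight cosine coefficients, so a structure like this could plausibly produce one such value per pass.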

Page 15: High Performance Computing Infrastructure: Past, Present, and Future

Optimizing System Performance with the BCDSP

• Memory is 64 bits wide, allowing two single-precision floating point numbers to be fetched in a single memory access.

• There are N=4 data memories, hence multiple data items can be read/written in a single cycle. Theoretically, the number of memory accesses can be reduced by a factor of N=4. (This number can be increased to an upper bound of 2N=8 if we store two floating point values per location.)

• Multiple function cores can be used. For example, a typical processor may have 1 multiplier. In this case, K multiplies require K time units or clock cycles. With K multipliers, K multiplies can be executed in a single time unit or clock cycle.

• Pipelining and DMA accesses are used to increase system performance.

• The use of complex instructions reduces the program size and the number of memory accesses required during program execution.
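The memory and multiplier arithmetic in the bullets above can be checked with a small idealized model. This is only a sketch: the function names are hypothetical, and it assumes data is spread perfectly across the memories and that each multiplier finishes one multiply per cycle.

```python
import math


def memory_cycles(num_values, num_memories=1, values_per_word=1):
    """Cycles to fetch num_values floats if every memory delivers one word per cycle."""
    per_cycle = num_memories * values_per_word
    return math.ceil(num_values / per_cycle)


def multiply_cycles(num_multiplies, num_multipliers=1):
    """Cycles to perform num_multiplies with num_multipliers working in parallel."""
    return math.ceil(num_multiplies / num_multipliers)
```

For 64 values, `memory_cycles(64)` gives 64 cycles with a single memory, `memory_cycles(64, 4)` gives 16 with N=4 memories, and `memory_cycles(64, 4, 2)` gives 8 with two values per 64-bit location, matching the factors of N and 2N stated above.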

Page 16: High Performance Computing Infrastructure: Past, Present, and Future

BCDSP Software, Cores, and Processors

Page 17: High Performance Computing Infrastructure: Past, Present, and Future

Distinguishing Features of RCCT

• Placement and routing is performed off-line.

• The Hardware Module Library evolves continuously.

• Compiler can easily recognize new modules.

• As new modules are added, the Compiler has a better chance to improve performance for each user application.

[Figure: Traditional approach versus our approach. Traditional approach: original source code, special compiler, modified source code, high-level compiler, executable code, HDL, logic synthesis, placement & routing, bitstream. Our approach: original source code, module definition file, RCCT compiler, modified source code, session files, high-level compiler, executable code.]

Page 18: High Performance Computing Infrastructure: Past, Present, and Future

The Front-End Compiler

• The purpose of the compiler is to map user applications to FPGA-based reconfigurable computers (RC), e.g., the BISON reconfigurable computer.

• The compiler takes the original source code written in C/C++ and a module library and produces two outputs: the modified source code and a session file for each modified section.

[Figure: The RCCT compiler reads the original source code and the module library and writes the modified source code and session files. A programming language compiler then builds the new application executable, which calls the loader.]
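To make the front-end flow above concrete, here is a toy source-to-source pass in Python. It is a rough sketch, not the RCCT compiler: the `rc_load` loader call, the session file naming, and the idea of matching calls by function name are all assumptions introduced for illustration, since the slides do not describe the compiler at this level.

```python
import re


def front_end(source, module_library):
    """Return (modified_source, session_files) for calls found in the library."""
    session_files = []

    def replace(match):
        name = match.group(1)
        if name not in module_library:
            return match.group(0)  # leave ordinary calls untouched
        # Hypothetical naming scheme: one session file per modified call site.
        session = f"session_{len(session_files)}_{name}.txt"
        session_files.append(session)
        # Hypothetical loader call that takes the session file as first argument.
        return f'rc_load("{session}", '

    modified = re.sub(r"\b([A-Za-z_]\w*)\s*\(", replace, source)
    return modified, session_files
```

Given `y = dct2d(x);` and a library containing `dct2d`, the pass rewrites the call to go through the loader and records one session file. A real compiler would work on a parse tree rather than on text, and would handle calls without arguments, which this regular expression does not.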

Page 19: High Performance Computing Infrastructure: Past, Present, and Future

The BCDSP Processor Back-End Compiler

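[Figure: back-end compiler flow. dct.c is translated by c2hl into dct_hl.vhd, which hl2cudu processes to produce dct_cu.vhd and dct_du.vhd; PECORE.vhd also appears in the flow.]

hl2cudu consists of approximately 15 programs!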

Page 20: High Performance Computing Infrastructure: Past, Present, and Future

Execution Time for the 2D-DCT

Image Size    Software (ms), 2.97 GHz PC    Hardware (ms), 24 MHz BCDSP    Speedup
8x8           0.0400                        0.0112                         3.56
16x16         0.095                         0.0272                         3.48
32x32         0.264                         0.09150                        2.88
64x64         0.849                         0.3484                         2.43
128x128       3.080                         1.3746                         2.24
256x256       12.154                        5.478                          2.22
512x512       60.556                        21.8942                        2.76
1024x1024     185.754                       87.5560                        2.12

Reconfigurable hardware was 2.71 times faster on average!
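The speedup column and the average can be recomputed from the timings above. The sketch below uses the software and hardware times exactly as listed; the names `TIMINGS`, `speedups`, and `mean_speedup` are illustrative only.

```python
# (image size, software ms on the 2.97 GHz PC, hardware ms on the 24 MHz BCDSP)
TIMINGS = [
    ("8x8", 0.0400, 0.0112),
    ("16x16", 0.095, 0.0272),
    ("32x32", 0.264, 0.09150),
    ("64x64", 0.849, 0.3484),
    ("128x128", 3.080, 1.3746),
    ("256x256", 12.154, 5.478),
    ("512x512", 60.556, 21.8942),
    ("1024x1024", 185.754, 87.5560),
]


def speedups(timings):
    """Speedup per image size: software time divided by hardware time."""
    return {size: sw / hw for size, sw, hw in timings}


def mean_speedup(timings):
    """Arithmetic mean of the per-size speedups."""
    values = list(speedups(timings).values())
    return sum(values) / len(values)
```

The recomputed ratios can differ from the listed speedups in the last digit, and their mean comes out close to the 2.71 reported. For context, the PC clock is 2970 / 24, or roughly 124, times faster than the BCDSP clock.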

Page 21: High Performance Computing Infrastructure: Past, Present, and Future

A Remote Configurable Computer

Page 22: High Performance Computing Infrastructure: Past, Present, and Future

A Remote And Reconfigurable Environment (RARE)

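[Figure: RARE block diagram. Inputs: application (C, Java, ...), user parameters (power, size, weight, ...), and a processor library, feeding an automated tool set that produces a BCDSP. Remote environment resource bank: a resource controller and FPGA0, FPGA1, ..., FPGAm with memories M00...M0n, M10...M1p, ..., Mm0...Mmq.]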

Page 23: High Performance Computing Infrastructure: Past, Present, and Future

The RARE Project Infrastructure

The RARE software is developed using Java. The Java language is selected because it offers a number of advantages over other programming languages.

Java supports native methods, remote method invocation, and network security. The native method feature allows software routines written in other programming languages such as C/C++ to be called from Java applications. Remote method invocation and network security features make it possible to execute Java programs from a remote site.

[Figure: Client.java and Server.java, each with RMI links, connected over the Internet; NMI, Function.c, and FPGA board blocks also appear.]
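The client/server pattern described above can be illustrated with a small remote procedure call. Note the substitution: this sketch uses Python's standard `xmlrpc` modules as a stand-in for Java RMI, and a plain Python function as a stand-in for the native C routine that would drive the FPGA board. The function `vector_scale` and the helper names are hypothetical.

```python
import threading
from xmlrpc.client import ServerProxy
from xmlrpc.server import SimpleXMLRPCServer


def vector_scale(values, factor):
    """Stand-in for a native routine that would run on the FPGA board."""
    return [v * factor for v in values]


def start_server():
    """Start a local RPC server on a free port and return it."""
    # Port 0 asks the operating system for any free port.
    server = SimpleXMLRPCServer(("127.0.0.1", 0), logRequests=False)
    server.register_function(vector_scale, "vector_scale")
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    return server


def call_remote(port, values, factor):
    """Client side: invoke vector_scale on the server as if it were local."""
    with ServerProxy(f"http://127.0.0.1:{port}") as proxy:
        return proxy.vector_scale(values, factor)
```

A caller would start the server, read the port from `server.server_address[1]`, call `call_remote(port, [1.0, 2.5], 2.0)`, and finally call `server.shutdown()`. The client code never needs to know whether the work ran in software or on attached hardware, which is the property the RARE design relies on.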

Page 24: High Performance Computing Infrastructure: Past, Present, and Future

PNN Execution Times

Implementation Type    Local (ms)    Remote (ms)
Software (Java)        628.71        2887.74
Software (C++)         861.04        3116.17
Hardware               104.07        371.01

Remote hardware can be faster than local software!
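The claim above follows directly from the table. The sketch below recomputes the remote overhead and compares remote hardware with local software, using the times as listed; the dictionary layout and function names are illustrative choices.

```python
# PNN execution times in milliseconds, copied from the table above.
PNN_MS = {
    "Software (Java)": {"local": 628.71, "remote": 2887.74},
    "Software (C++)": {"local": 861.04, "remote": 3116.17},
    "Hardware": {"local": 104.07, "remote": 371.01},
}


def remote_overhead(times):
    """Extra milliseconds paid for running each implementation remotely."""
    return {name: t["remote"] - t["local"] for name, t in times.items()}


def remote_hw_vs_local_sw(times):
    """How many times faster remote hardware is than each local software run."""
    hw_remote = times["Hardware"]["remote"]
    return {
        name: t["local"] / hw_remote
        for name, t in times.items()
        if name != "Hardware"
    }
```

Remote hardware takes 371.01 ms, so it beats local Java (628.71 ms) by a factor of about 1.7 and local C++ (861.04 ms) by about 2.3, even after paying roughly 267 ms of remote overhead.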

Page 25: High Performance Computing Infrastructure: Past, Present, and Future

A Parallel and Configurable Computer System

Page 26: High Performance Computing Infrastructure: Past, Present, and Future

A Parallel and Configurable Computer

[Figure: Parallel CC with nodes CCN0 through CCN7; the detail for node CCNi shows PC2i with FPGABrd2i and PC2i+1 with FPGABrd2i+1.]

• NSF MRI Grant: A Parallel and Configurable Computer for Research in Engineering and the Computational Mathematical Sciences ($500K)

• Projects related to RFID, an Electronic Nose, PET Image Reconstruction, Image Compression, and Computer Vision are using this equipment to solve real world problems.

Page 27: High Performance Computing Infrastructure: Past, Present, and Future

Cluster Specifications

• 8 Compute Nodes
  – 1 x PCI-X dual port Infiniband 4X HCA card
  – 1 x 250GB SATA Hard Drive 7200RPM w/ 16MB Cache
  – 8 x 1GB PC3200 ECC Reg DDR (400MHz)
  – 1 x PNY nVidia Quadro FX 3000G w/ 8XAGP, 256MB DDR, Dual DVI/DVI
  – 2 x AMD Opteron Model 250 (2.4GHz)
  – 60-30-12921 1 x Dual Opteron S2885 EATX Motherboard w/ 8X AGP, gigE, SATA, audio, firewire, 4x 64-bit PCI

• 1 Head Node
  – 1 x PCI-X dual port Infiniband 4X HCA card
  – 8 x 1GB PC3200 ECC Reg DDR (400MHz)
  – 2 x AMD Opteron Model 250 (2.4GHz)
  – 1 x PNY nVidia Quadro FX 3000G w/ 8XAGP, 256MB DDR, Dual DVI/DVI
  – 1 x 10/100/1000 64bit PCI-X Gigabit Copper NIC

• 9 FPGA Coprocessors
  – 16 WS2P/XC2VP100-6P/48D/256 Wildstar II PRO PCI board with 2 ea P100-6 parts & 48 MB DDR SRAM and 256 MB DDR SDRAM

Page 28: High Performance Computing Infrastructure: Past, Present, and Future

RARE Project Past, Present, and Future

Page 29: High Performance Computing Infrastructure: Past, Present, and Future

AIST Program Space Based NRA Technologies

ESTO Earth Science Technology Office

Hierarchical Algorithms and their Embedded Computational Realization in Reconfigurable Hardware

PI: Clay Gloster/Howard University

Proposal No: AIST 0016-0044

[Figure: prototype RC testbed with a VLIW block, memories Mem1 through Mem5, processing elements PE1 through PE5, and FIFO1 through FIFO5 attached to the PCI bus; connections are labeled 61 and 34.]

Description and Objectives

This project addresses problems associated with developing data products for deployment in onboard RC systems. It involves the development of a compiler that reads algorithm descriptions written in C. The compiler will produce hardware and software components required for an RC implementation of typical NASA data products. The main objectives of this project are efficient algorithm development and fast, reconfigurable hardware implementations (10X-100X speedup).

Approach

Develop a compiler to translate nested loops into a sequence of floating point vector instructions. These instructions correspond to modules in a library that is to be developed as a part of this project. Hardware modules will perform complex instructions, e.g., matmult, vec-vecmult, FFT, etc.

Deliverables

- Prototype RC Testbed shown above

- Prototype Compiler

- Cloud Masking Data Product Demonstration

- Final Compiler

Co-I’s/Partners

Hamid Krim, Tom Conte, NC State University

Application/Mission

Cloud Cover Assessment Data Product Development for EOS/AM-1 Satellite
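The approach of translating nested loops into vector instructions can be illustrated with a matrix-vector product. In this Python sketch, `vec_vec_mult` is a hypothetical stand-in for a hardware library module; it is assumed here to be a dot product, although the slides do not define what vec-vecmult computes.

```python
def vec_vec_mult(a, b):
    """Stand-in for a hardware dot-product module (one vector instruction)."""
    return sum(x * y for x, y in zip(a, b))


def matvec_nested_loops(matrix, vector):
    """Original form: two nested scalar loops, as a programmer might write in C."""
    result = []
    for row in matrix:
        acc = 0.0
        for j in range(len(vector)):
            acc += row[j] * vector[j]
        result.append(acc)
    return result


def matvec_vector_instructions(matrix, vector):
    """Translated form: the inner loop becomes one vector instruction per row."""
    return [vec_vec_mult(row, vector) for row in matrix]
```

Both functions return the same values. The second form issues one library call per row instead of one multiply-add per element, which is the kind of reduction in instruction count that complex hardware modules are meant to provide.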

Page 30: High Performance Computing Infrastructure: Past, Present, and Future

High Performance Weather Forecast Modeling

WRF is an HPC next-generation mesoscale forecast model and assimilation system developed as a collaborative effort by the atmospheric science community. It is a massively parallel computing environment for both forecasting and research purposes.

WRF Architecture

3-Level Hierarchical Structure

• Driver: processor management, etc.

• Mediation: interface between Model and Driver

• Model: plug-in algorithms that compute actual models

Model layer includes

• Longwave radiation: RRTM

• Shortwave radiation: NASA/GSFC, MM5 (Dudhia)

• Cumulus: Kain-Fritsch, Betts-Miller-Janjic

• Explicit microphysics: Kessler, Lin et al., NCEP 3-class (Hong)

• PBL: MRF, MM5 (Slab)

WRF acknowledges the HPC problem, and is currently pursuing the standard solution.

RARE solution: replace physics plug-ins with BCDSP FPGA equivalents

Figure courtesy of http://www.wrf-model.org
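The idea of replacing physics plug-ins with hardware equivalents can be sketched as a registry of interchangeable routines. Everything in this Python sketch is hypothetical: the class and function names are invented for illustration, the arithmetic is a placeholder rather than any real radiation scheme, and the "BCDSP" version simply reuses the software path since no hardware is attached.

```python
def longwave_software(temperatures):
    """Placeholder physics routine; this is not the real RRTM scheme."""
    return [0.5 * t for t in temperatures]


def longwave_bcdsp(temperatures):
    """Stand-in for the same routine offloaded to a BCDSP on an FPGA."""
    # A real version would hand the data to the hardware and read results back.
    return longwave_software(temperatures)


class ModelLayer:
    """Holds the plug-in algorithms that the mediation layer would invoke."""

    def __init__(self):
        self.plugins = {"longwave": longwave_software}

    def replace_plugin(self, name, implementation):
        """Swap one plug-in for another with the same call signature."""
        if name not in self.plugins:
            raise KeyError(name)
        self.plugins[name] = implementation

    def step(self, state):
        """Run every registered plug-in on the current state."""
        return {name: fn(state) for name, fn in self.plugins.items()}
```

Because the replacement keeps the same call signature, the driver and mediation layers would not need to change when a plug-in moves to hardware.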

Page 31: High Performance Computing Infrastructure: Past, Present, and Future

A Reconfigurable and Open Architecture Module for Unmanned Systems

● Reconfigurable modules can be reused for various types of unmanned systems, each containing a diverse range of sensors, cameras, displays, GPS receivers, etc.

● Reconfigurable modules can provide capabilities during the mission that were unknown prior to the beginning of the mission.

● With these modules, computing resources on remote unmanned systems can be used from a ground station when the modules are idle.

● With reconfigurable modules, a fixed amount of hardware can be reconfigured to provide, in theory, an unlimited number of different capabilities.

● Because of the unpredictable nature of combat, reconfigurable systems provide the flexibility and performance needed to respond rapidly and effectively to unexpected threats.

● These systems can provide reconfigurable interfaces and interconnections. One system can accommodate any combination of interfaces: USB, Gigabit Ethernet, RS432, IR, wireless, FireWire, etc.

Page 32: High Performance Computing Infrastructure: Past, Present, and Future

Current System Specification

System Specifications

• Weight – 27 lbs

• Size – 6 x 7 x 8.5 in

• Power – 150 Watts

• Interface – Gbit Ethernet, camera link, LVDS, 422, USB, FireWire

• Image Formats – 4 Mb, 1080p, 720p, 480p, NTSC, RS-170, 1600 x 1200 IR, 360 HD-Visible and 640 IR, Stereo Capable

Completed

• Software Decoder (H.264), Hardware Encoder (H.264), IMU/GPS Interface, Imaging System, Targeting System Interface

Demonstrated Imagery (meta-data format; multiple streams (801.16), Trigger and Sync, Video-teleconferencing through the payload

FPGAs exploit parallelism to reach higher performance (sample rates, pixel or frame rates) with limited SWAP.

FPGA processing power can be combined and redistributed in real time to one or more particular sensors.

FPGA-based payload interfaces combined with a hardware Open Architecture approach can provide reconfigurable software interfaces and physical interconnections.

One system can accommodate any combination of interfaces: USB, Gigabit Ethernet, RS432, IR, wireless, FireWire, etc.

The SAME Reconfigurable Context Neutral Payload Interface can be reused to accommodate many different unmanned vehicles, ground stations -- each containing various sensors, cameras, radar systems, acoustics, LCD displays, GPS systems, etc. utilizing high-bandwidth connections to the interface.

Page 33: High Performance Computing Infrastructure: Past, Present, and Future

Page 34: High Performance Computing Infrastructure: Past, Present, and Future

Opening a Dialogue with Others

• Graduate Student Support
  – One way for us to work together with others is via graduate students.
  – These students can bridge the gap between other disciplines and computer engineering.

• Joint Proposals
  – One way for us to work together is to author joint proposals.
  – Alternatively, we can be supported under current funding. However, we would be willing to work with others even if there is no current support for our work, as long as there is potential for future support.

• Implementation of a small portion of a model to demonstrate potential speedup.

• There is a potential to publish results of this experiment in journals of other disciplines as well as in engineering journals.

