April 2005

High-Performance Computing and Networking

Overview of Research Challenges: from Embedded Systems to Supercomputers

Alan D. George, Ph.D.

Professor of Electrical and Computer Engineering

Founder and Director, HCS Research Laboratory

Funding Agencies

Outline

• Research emphases
• Research groups and activities
  • Reconfigurable computing
  • Advanced space systems and applications
  • HPC applications and services
  • High-performance networking
  • Advanced simulation and modeling
  • HPC performance analysis and optimization
• Research facilities
• Conclusions
• Appendix – HPC Initiative at UF

Research Emphases

Primary Research Areas
• high-performance computer architectures
• high-performance computer networks
• reconfigurable & fault-tolerant computing
• parallel and distributed computing

Research Methods
• modeling and simulation
• experimental testbed research
• software design & development
• hardware design & development

[Figure: research methods (modeling and simulation, experimental testbed) applied across architectures, networks, system software, and applications]

Reconfigurable Computing

• Exciting new technology for HPC
  • Enabled by new and emerging FPGA technologies (e.g. from Xilinx)
  • High concurrency potential with RC
  • Dynamic & partial hardware reconfiguration
  • Best of ASIC and CPU worlds
  • Multiparadigm computing for HPC/HPEC
• Many research challenges for RC
  • From hardware structures to middleware, application mapping, & system mgmt.
  • Offers potentially ideal solutions for many apps in both GP and embedded HPC
• Many activities in our RC group
  • Comprehensive Approach to Reconfigurable Management Architecture (CARMA)
  • Application mapping from HLL to hardware
  • RC middleware and API structures
  • Dynamic and partial reconfigurable structures
  • Application profiling & analysis as RC target
  • Computational kernel core libraries
  • RC/HPC resource monitoring for diagnosis, debug, and performance optimization
  • RC clusters and grids; network-attached RC
  • RC applications and benchmarks

[Figure: CARMA framework. Labels: Applications, User Interface, Algorithm Mapping, RC Cluster Management, Performance Monitoring, Middleware API, Data Network, Control Network, To Other Nodes; RC Node containing COTS Processor, RC Fabric API, and RC Fabric]

[Figure: "From Embedded Systems to Supercomputers". Performance vs. flexibility of General-Purpose Processors, Special-Purpose Processors (e.g. DSPs, NPs), Reconfigurable Computing (e.g. FPGAs), and ASICs]

Advanced Space Systems & Apps

• Broad range of activities
  • HPEC parallel applications
  • Multiprocessor architectures
  • Reconfigurable computing
  • Fault-tolerant computing
  • System interconnects
  • High-speed communications
• Testbed experiments and analyses
• Virtual and actual prototypes

[Figure: GMTI processing stages. Pulse Compression (PC), Doppler Processing (DP), Space-Time Adaptive Processing (STAP), Constant False Alarm Rate (CFAR); Receive Cube and Send Results]

[Figure: Corner turn of the incoming data cube (ranges × pulses) distributed to PE 1 through PE n. Panels: "Partitioned along range dimension", "Partitioned along pulse dimension"]

[Figure: Timeline of PE#1 to PE#4 across the PC, DP, STAP, and CFAR stages; time axis marked in units of 1 CPI]

[Figure: 7-Board System with 4-Switch Non-blocking Backplane. Backplane-to-Board 0, 1, 2, 3 connections; Backplane-to-Board 4, 5, 6, and Data Source/GM connections]

[Figure: "GMTI: System Scalability". Average Latency (ms) vs. Number of Ranges (32000 to 80000) for the 5-Board System, 6-Board System, and Baseline 7-Board System, with the Double-Buffered Deadline marked]
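The GMTI figures on this slide mention a corner turn between a data cube partitioned along one dimension (range or pulse) and the same cube partitioned along the other. A minimal sketch of that redistribution in plain Python follows. It assumes a 2-D pulses × ranges slice of the cube and assumes the data starts partitioned by pulse and ends partitioned by range (the direction is an assumption; the slide does not state it). The helper names `split_sizes`, `partition`, and `corner_turn` are hypothetical, and a real system would carry out the exchange as an all-to-all over the interconnect rather than inside one process.

```python
def split_sizes(n_items, n_parts):
    """Sizes of n_parts contiguous blocks covering n_items (sizes differ by at most 1)."""
    base, extra = divmod(n_items, n_parts)
    return [base + (1 if p < extra else 0) for p in range(n_parts)]


def partition(items, n_parts):
    """Split a list into n_parts contiguous blocks, one per processing element (PE)."""
    blocks, start = [], 0
    for size in split_sizes(len(items), n_parts):
        blocks.append(items[start:start + size])
        start += size
    return blocks


def corner_turn(blocks_by_pulse):
    """Redistribute pulse-partitioned data so that each PE owns a block of range bins.

    blocks_by_pulse[p] is the list of pulses held by PE p; each pulse is a list of
    range samples. Assumes every PE holds at least one pulse.
    Returns blocks_by_range, where blocks_by_range[q] is the list of range bins held
    by PE q; each range bin is a list of samples across all pulses.
    """
    n_pes = len(blocks_by_pulse)
    n_ranges = len(blocks_by_pulse[0][0])
    blocks_by_range, lo = [], 0
    for size in split_sizes(n_ranges, n_pes):
        hi = lo + size
        # "All-to-all" step: the destination PE collects its range segment of
        # every pulse from every source PE, in pulse order.
        segments = [pulse[lo:hi] for block in blocks_by_pulse for pulse in block]
        # Local transpose: one row per range bin, one column per pulse.
        blocks_by_range.append([list(col) for col in zip(*segments)])
        lo = hi
    return blocks_by_range


if __name__ == "__main__":
    # Toy cube slice: 4 pulses x 6 ranges, sample value = 10 * pulse + range.
    cube = [[10 * p + r for r in range(6)] for p in range(4)]
    by_pulse = partition(cube, 2)      # 2 PEs, 2 pulses each
    by_range = corner_turn(by_pulse)   # 2 PEs, 3 range bins each
    for pe, block in enumerate(by_range):
        print("PE", pe, block)
```

After the turn, each PE can process its range bins across all pulses without further communication, which is why the redistribution sits between stages that work along different dimensions of the cube.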

HPC Applications & Services

• Variety of interests and applications for HPC & HPEC
  • Computational acoustics and signal processing (ONR)
  • Computational biomechanics; bioinformatics (w/ MAE for NIH)
  • Space-based radar (Honeywell); cryptology (DOD)
  • Exploring additional critical applications for HPC, RC, etc.

[Figures: two plots with x-axis "Number of CPUs"]

High-Performance Networking

• Research on HPNs for general-purpose HPC
  • InfiniBand, 10GigE, SCI, Myrinet, Quadrics, etc.
  • Experimental testbeds
• Research on HPNs for embedded HPC systems
  • RapidIO, WDM LANs
• Many critical issues
  • Protocols and middleware
  • Multilevel performance traits
  • Scalability and reliability
  • Quality of service
  • From physical to transport

Advanced Simulation & Modeling

• FASE Project
  • Fast and accurate simulation environment; balanced model
  • HPC application profiling and scripting for speed with fidelity
  • Architecture, network, subsystem, & system models
  • HWIL and distributed simulations
• Rapid Virtual Prototyping
  • HPC & HPEC systems
  • Advanced space systems
  • Avionics & aerospace networks
  • Reconfigurable IPv6 routers
  • Mission assurance systems

[Figure: FASE workflow. Labels: Parallel program, Heterogeneous cluster, Trace Generator/Performance Statistics Generator (MPE), Script Generator, Single computer running MLDesigner, Simulation Results]

[Figure: Processing Raw Data in Heterogeneous Satellite System]

Performance Optimization

• GAS program models
  • UPC, SHMEM, etc.
  • Increasing emphasis at DOD, DOE, etc.
• Multilevel optimization
  • From hardware to apps
  • Multilevel performance monitoring and profiling
  • Fusion and integration of metrics for HPC designers
• Diverse target platforms
  • Support leading and emerging HPC systems

Lab Research Facilities

Computational research facilities
• Grid of 11 Intel/AMD Linux clusters
  • New (12th) cluster planned for Su’05
• 480 Pentium-compatible CPUs
  Opteron, Xeon, P4, P3, etc.
• 308 networked nodes, PCI to PCI-X
• 102 GB memory, 5.2 TB storage
• Also AlphaServer & Sun clusters
• Reconfigurable computing (RC) servers
  More details in following slides

Networking research facilities
• 10 Gb/s InfiniBand (4X) and 10GigE
• 5.3 Gb/s Scalable Coherent Interface
• 3.2 Gb/s Quadrics QsNet
• 1.28 Gb/s Myrinet
• 1.25 Gb/s Cluster LAN (cLAN)
• 1.0 Gb/s Gigabit Ethernet
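As a rough illustration of how the nominal link rates on this slide compare, the sketch below computes the ideal time to move a fixed payload at each rate. It assumes the listed figures are raw rates in Gb/s (10^9 bits per second) and ignores encoding, protocol overhead, and latency, so measured transfers would be slower. The function name `ideal_transfer_ms` and the 1 MB payload are illustrative choices, not taken from the slides.

```python
# Nominal link rates as listed on the slide, in Gb/s (1 Gb/s = 1e9 bits per second).
LINK_RATES_GBPS = {
    "InfiniBand (4X) / 10GigE": 10.0,
    "Scalable Coherent Interface": 5.3,
    "Quadrics QsNet": 3.2,
    "Myrinet": 1.28,
    "Cluster LAN (cLAN)": 1.25,
    "Gigabit Ethernet": 1.0,
}


def ideal_transfer_ms(payload_bytes, rate_gbps):
    """Lower bound on transfer time in milliseconds: payload bits / link rate."""
    bits = payload_bytes * 8
    return bits / (rate_gbps * 1e9) * 1e3


if __name__ == "__main__":
    payload = 1_000_000  # hypothetical 1 MB message
    for name, rate in LINK_RATES_GBPS.items():
        print(f"{name:30s} {ideal_transfer_ms(payload, rate):6.2f} ms")
```

These are bandwidth-only bounds. For small messages the per-message latency of each interconnect, which the slide does not list, would dominate instead.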

Primary RC Facilities in Lab

• RC1000: Dual Xeon, Single XCV2000E
• Content Packet Processor: Dual Xeon, Dual XC2V1000
• ADM-XRC: Pentium2, Single XCV1000E
• BenNUEY, BenBLUE-II: Dual Xeon, Triple XC2V6000
• Reconfigurable Application-Specific Computer (RASC): Altix350, Single XC2V6000
• (under construction) Dual Xeon, Dual Opteron, Single Athlon, or Single P3; Single XC2VP30

~100 nodes

Software:
• Xilinx ISE
• ChipScope
• System Generator
• Synplify Pro
• Handel-C (Celoxica SDK)
• Impulse-C
• Mitrion-C
• Genus-C
• Active-HDL

Embedded and Custom RC Facilities

Xilinx Development Boards
• Twelve HW-AFX-BG560
• Two HW-AFX-FF1152

Xilinx FPGAs for Prototyping
• Eight XCV812E
• Eight XCV1000E
• Eight XCV2000E
• Two XC2VP20

UF-developed NARC: Network-Attached RC
(ARM AT91RM9200, Ethernet, Xilinx FPGA)

Field-Programmable Object Array (FPOA)
(coming soon)

Total RC Resources (aggregate of all cards)
• 6,594,624 slices
• 192 embedded PowerPCs

CHREC – Proposed New Center

• Proposed new center via NSF I/UCRC Program
  • Center for High-Performance Reconfigurable and Embedded Computing (CHREC) – pronounced “Shreck”
  • I/UCRC = Industry/University Cooperative Research Center
  • Focus on both HPC and HPEC sides of RC research
• Receiving much interest from a variety of potential members in industry (e.g. Honeywell, Boeing, Smiths Aerospace, SGI, Xilinx, Cray), in government (e.g. NSA, AFRL), national labs (e.g. ORNL), and academia
• Steps Toward Goal
  • Letter of intent submitted to NSF in Dec’04, approved in Jan’05
  • Next step is planning grant proposal for center, due late Sep’05
  • Will be requesting strong letters of support and encouragement for this center in July/August (details to follow) from key industry members

Conclusions

• Wide range of research expertise in architectures, networks, services, systems, and applications in HPC
  • Focus on high-performance parallel, distributed, and reconfigurable computing and communications for critical applications
  • From embedded systems to clusters, supercomputers, and grids
• Focal points
  • Goal of “high performance” in terms of execution time, throughput, latency, quality of service, dependability, etc.
  • Research challenges in computer engineering in terms of both general-purpose and embedded HPC systems and applications
  • Close collaboration with emerging application domains of HPC
• Both simulative and experimental expertise to achieve distinct and interdependent goals requiring both basic and applied research
• World-class experimental and simulative research facilities in an academic setting

High-Performance Computing (HPC) Initiative at UF

Applications and Infrastructure Research with Advanced HPC Technologies

A. George, Chair, University HPC Committee

HPC Center and Grid

Phase I (#221 on Nov’04 Top500 list of most powerful systems on earth)

Notes:

• Phase I is a cluster of 200 dual-Xeon servers with 32TB of storage

• Phase II now under development (approx. twice size of Phase I)

• All phases and FLR supported by new campus research network of dual 10GigE funded by NSF/MRI grant

NSF-funded Research Network

[Figure: campus research network diagram. Labels: UF; College of Liberal Arts and Sciences; College of Engineering; HPC Center Phase I; HPC Center Phase II; CISE Lab; ACIS Lab; HCS Lab; QTP Slater Lab; Physics iVDGL Lab; 10GE Switch (Core); 10GE Switch (Site 1), (Site 2), (Site 3); Tributary Switches; 10GE storage servers; to Florida Lambda Rail (FLR) and National Lambda Rail (NLR) via Jacksonville. Link legend: 10GE (10 Gb/s), GE (1 Gb/s)]
