The New Era of Coprocessor in Supercomputing - A New Era of Coprocessing Applications in Parallel Computing




Page 1: The New Era of Coprocessor in Supercomputing - A New Era of Coprocessing Applications in Parallel Computing

© Supermicro 2013

The New Era of Coprocessor in Supercomputing

A New Era of Coprocessing Applications in Parallel Computing

5/07/2013 @ BAH! Oil & Gas - Rio de Janeiro, Brazil

Marc XAB, M.A. - 桜美林大学大学院
Country Manager

Super Micro Computer Inc.
Rua Funchal, 418. Sao Paulo – SP

www.supermicro.com/brazil

Page 2

Networking in Rio

Page 3

Company Overview

Fremont Facility

Revenues: FY10 $721 M, FY11 $942 M, FY12 $1 B
Global Footprint: >70 countries, 700 customers, 6800 SKUs
Production: US, EU and Asia production facilities
Engineering: 70% of workforce in engineering, SSI member
Market Share: #1 server channel
Corporate Focus: Leader in energy-efficient, HPC & application-optimized systems

San Jose (Headquarters)

Fortune 2012 100 Fastest-Growing Companies

Page 4

COPROCESSOR

A coprocessor is a computer processor used to supplement the functions of the primary processor (the CPU).

Operations performed by the coprocessor may be floating-point arithmetic, graphics, signal processing, string processing, encryption, or I/O interfacing with peripheral devices.

Math coprocessor – a computer chip that handles the floating-point operations and mathematical computations in a computer.

Graphics Processing Unit (GPU) – a separate card that handles graphics rendering and can improve performance in graphics-intensive applications, like games.

Secure crypto-processor – a dedicated computer on a chip or microprocessor for carrying out cryptographic operations, embedded in packaging with multiple physical security measures, which give it a degree of tamper resistance.

Network coprocessor.

…

Page 5

The Trend Indicated on Green500

Green500 Rank | MFLOPS/W | Site* | Computer* | Total Power (kW)
1 | 2,499.44 | National Institute for Computational Sciences/University of Tennessee | Beacon - Appro GreenBlade GB824M, Xeon E5-2670 8C 2.600GHz, Infiniband FDR, Intel Xeon Phi 5110P | 44.89
2 | 2,351.10 | King Abdulaziz City for Science and Technology | SANAM - Adtech ESC4000/FDR G2, Xeon E5-2650 8C 2.000GHz, Infiniband FDR, AMD FirePro S10000 | 179.15
3 | 2,142.77 | DOE/SC/Oak Ridge National Laboratory | Titan - Cray XK7, Opteron 6274 16C 2.200GHz, Cray Gemini interconnect, NVIDIA K20x | 8,209.00
4 | 2,121.71 | Swiss Scientific Computing Center (CSCS) | Todi - Cray XK7, Opteron 6272 16C 2.100GHz, Cray Gemini interconnect, NVIDIA Tesla K20 Kepler | 129.00
5 | 2,102.12 | Forschungszentrum Juelich (FZJ) | JUQUEEN - BlueGene/Q, Power BQC 16C 1.600GHz, Custom Interconnect | 1,970.00
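
The two numeric columns of the Green500 listing above are linked: MFLOPS/W multiplied by total power gives the performance each entry implies. A minimal Python sketch of that arithmetic, using only the five rows from the slide (the function name `implied_pflops` is an illustrative choice, not something from the presentation):

```python
# Rows copied from the Green500 listing: (computer, MFLOPS/W, total power in kW).
GREEN500_TOP5 = [
    ("Beacon", 2499.44, 44.89),
    ("SANAM", 2351.10, 179.15),
    ("Titan", 2142.77, 8209.00),
    ("Todi", 2121.71, 129.00),
    ("JUQUEEN", 2102.12, 1970.00),
]


def implied_pflops(mflops_per_watt: float, power_kw: float) -> float:
    """Performance implied by an efficiency figure and a power figure, in PFLOPS."""
    watts = power_kw * 1e3            # kW -> W
    mflops = mflops_per_watt * watts  # (MFLOPS/W) * W -> MFLOPS
    return mflops / 1e9               # 1 PFLOPS = 1e9 MFLOPS


for name, efficiency, power_kw in GREEN500_TOP5:
    print(f"{name:8s} {implied_pflops(efficiency, power_kw):8.3f} PFLOPS")
```

Titan works out to roughly 17.6 PFLOPS and Beacon to roughly 0.11 PFLOPS, which shows that the efficiency ranking is independent of machine size.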

Page 6

Case Study – Submerged Liquid Cooling

“Submerged Supermicro Servers Accelerated by GPUs”

Supermicro 1U (single CPU) with two coprocessors
No requirement for room-level cooling
Operates at PUE ~ 1.12
25 kilowatts per rack – the breakpoint per rack (between regular air cooling and submerged cooling)

Removed fans and heat sinks
Use SSD & updated BIOS
Reverse the handlers

[Chart: Cost Efficiency versus kW / rack, comparing air cooling with submerged liquid cooling; ~25 kW is marked]
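
To make the PUE figure concrete: PUE is total facility power divided by IT equipment power, so the cooling and distribution overhead follows directly from the two numbers on this slide. A small Python sketch, assuming the 25 kW rack figure is the IT load (the function names are illustrative):

```python
def facility_power_kw(it_power_kw: float, pue: float) -> float:
    """Total facility power, from the definition PUE = facility power / IT power."""
    return it_power_kw * pue


def overhead_kw(it_power_kw: float, pue: float) -> float:
    """Power spent on everything other than the IT equipment itself."""
    return facility_power_kw(it_power_kw, pue) - it_power_kw


RACK_IT_KW = 25.0  # per-rack figure quoted on the slide, assumed to be IT load
PUE = 1.12         # PUE quoted on the slide

print(f"facility power: {facility_power_kw(RACK_IT_KW, PUE):.1f} kW")
print(f"overhead:       {overhead_kw(RACK_IT_KW, PUE):.1f} kW")
```

Under that assumption a 25 kW rack draws about 28 kW at the facility level, so about 3 kW per rack goes to overhead.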

Page 7

Tesla: 2-3x Faster Every 2 Years

[Chart: DP GFLOPS per Watt (axis 2 to 16) by year, 2008 to 2014. Series labels: T10, Fermi, Kepler, Maxwell. Annotations: 512 cores; thousands of cores]
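
The headline claim compounds over time. As a rough illustration of what "2-3x every 2 years" implies across the 2008 to 2014 span of the chart, here is a short Python sketch; it only works out the headline rate and does not use data points read from the chart (the function name is hypothetical):

```python
def projected_gain(rate_per_period: float, years: float, period_years: float = 2.0) -> float:
    """Cumulative improvement factor if efficiency multiplies by `rate_per_period`
    once every `period_years` years."""
    return rate_per_period ** (years / period_years)


SPAN_YEARS = 2014 - 2008  # three two-year periods

for rate in (2.0, 3.0):
    print(f"{rate:.0f}x every 2 years over {SPAN_YEARS} years: "
          f"{projected_gain(rate, SPAN_YEARS):.0f}x")
```

Three two-year periods at 2x to 3x each give a cumulative factor between 8x and 27x.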

Page 8

GPU Supercomputer Momentum

[Chart: # of GPU Accelerated Systems on Top500 (axis 0 to 60) by year, 2008 to 2013. Annotations: First Double Precision GPU; Tesla Fermi Launched; June 2012 Top500; 52; 4x]

Page 9

Case Study – PNNL

Expects supercomputer to rank in world's top 20 fastest machines.

Research for climate and environmental science, chemical processes, biology-based fuels that can replace fossil fuels, new materials for energy applications, etc.

Supermicro FatTwin™ with 2x MIC 5110P per node

Page 10

Case Study – PNNL

Theoretical peak processing speed of 3.4 petaflops
42 racks / 195,840 cores
1440 compute nodes with conventional processors and Intel Xeon Phi "MIC" accelerators
128 GB memory per node
FDR Infiniband network
2.7 petabyte shared parallel file system (60 gigabytes per second read/write)

Supermicro FatTwin™ with 2x MIC 5110P per node
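
The system-level figures above can be broken down per node with simple division. The Python sketch below does that using only the numbers on the slide, plus one outside fact and one inference that are labeled as such: the Xeon Phi 5110P has 60 cores, and the remaining cores per node would be consistent with two 8-core host CPUs. The slides do not name the host CPU, so treat that last step as an assumption.

```python
# Figures quoted on the slide.
PEAK_PFLOPS = 3.4
TOTAL_CORES = 195_840
NODES = 1440
MEM_PER_NODE_GB = 128
PHIS_PER_NODE = 2

cores_per_node = TOTAL_CORES // NODES             # exact division
peak_per_node_tflops = PEAK_PFLOPS * 1000 / NODES
total_memory_gb = NODES * MEM_PER_NODE_GB

# Not on the slide: the Xeon Phi 5110P has 60 cores.
PHI_5110P_CORES = 60
# Inference, not stated in the source: whatever is left belongs to the host CPUs.
host_cpu_cores_per_node = cores_per_node - PHIS_PER_NODE * PHI_5110P_CORES

print(f"cores per node:        {cores_per_node}")
print(f"peak per node:         {peak_per_node_tflops:.2f} TFLOPS")
print(f"aggregate node memory: {total_memory_gb} GB")
print(f"host CPU cores/node:   {host_cpu_cores_per_node} (inferred)")
```

The core count divides evenly into 136 cores per node, of which 120 are accounted for by the two coprocessors.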

Page 11

Programming Paradigm

The Xeon Phi programming model and its optimization are shared across the Intel Xeon

CUDA (Compute Unified Device Architecture) – a parallel computing platform and programming model. CUDA provides developers access to the virtual instruction set and memory of the parallel computational elements in CUDA GPUs.

Made Easier

Not Complicated
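
One idea at the core of the CUDA programming model is that every thread derives the element it works on from its block and thread coordinates. The sketch below emulates that indexing in plain Python so it runs without a GPU; it is not CUDA code and calls no CUDA library. The names `saxpy_kernel` and `launch` are hypothetical, and the nested loops stand in for threads that would run in parallel on a real device.

```python
def saxpy_kernel(block_idx, block_dim, thread_idx, a, x, y, out):
    """One emulated thread of out = a * x + y."""
    # Same index computation a CUDA C kernel would write as:
    #   int i = blockIdx.x * blockDim.x + threadIdx.x;
    i = block_idx * block_dim + thread_idx
    if i < len(x):  # guard: the last block may have more threads than elements
        out[i] = a * x[i] + y[i]


def launch(kernel, grid_dim, block_dim, *args):
    """Run the kernel once per (block, thread) pair, sequentially."""
    for block_idx in range(grid_dim):
        for thread_idx in range(block_dim):
            kernel(block_idx, block_dim, thread_idx, *args)


x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [10.0] * len(x)
out = [0.0] * len(x)

block_dim = 4
grid_dim = (len(x) + block_dim - 1) // block_dim  # round up so every element is covered

launch(saxpy_kernel, grid_dim, block_dim, 2.0, x, y, out)
print(out)
```

With five elements and four threads per block, two blocks are launched and the bounds check keeps the three surplus threads in the second block from writing past the end of the array.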

Page 12

Keynotes

This is a new era of hybrid computing – heterogeneous architecture with PCI-E based coprocessors

Specialized (or application-optimized) design is required for GPU/MIC applications and future HPC scalability

There is more to come in the industry roadmap, with new technologies, power management and system architecture

Configurable cooling and power for energy efficiency and performance are increasingly critical

The trend towards heterogeneous architecture poses many challenges for system builders and software developers in making efficient use of the resources

The programming paradigm, and the investment it requires, is an important part of the selection criteria

Page 13

HPC Coprocessor Applications

Massively parallel architecture accelerates Scientific & Engineering Applications

Application areas shown: Computational Finance; Imaging and Computer Vision; Weather and Climate; Simulation & Creation; Design; Scientific; Oil and Gas/Seismic; Data Mining

•Options pricing
•Risk analysis
•Algorithmic trading

•Medical imaging
•Visualization & docking
•Filmmaking & animation

•Computational fluid dynamics
•Materials science
•Molecular dynamics
•Quantum chemistry

•Mechanical design & simulation
•Structural mechanics
•Electronic Design Automation

•Data parallel mathematics
•Extend Excel with OLAP for planning & analysis
•Database and data analysis acceleration

•Weather
•Atmospheric
•Ocean Modeling
•Space Sciences

•Seismic imaging
•Seismic Interpretation
•Reservoir Modeling
•Seismic Inversion

Page 14

Hybrid Computing

NVIDIA Kepler & Intel Xeon Phi support

[Timeline, 2008 to 2013. Stage labels: Where it started…; GPGPU; Hybrid Computing Pioneer; Mainstream; Density; Efficiency; Ultra High Efficiency]

Tesla S1070
PCI-E x16
1U Twin™: The most powerful PSC
The fastest 1U server in the world
1U 4-GPU standalone box
2U GPU w/ QDR IB onboard
2U Twin
2U 4-GPU
1U 3-GPU
7U GPU Blades: 20 CPUs + 20 GPUs
X9 (DP) 1U 4-GPU/MIC
X9 2U 6-GPU/MIC
X9 (UP) 1U 2-GPU/MIC
FatTwin™ 2-node: 8 GPUs or MICs per node
FatTwin™ 4-node: 3 GPUs or MICs per node
4 GPUs or MICs: Workstation / 4U

Page 15

Communication Between Coprocessors

[Diagram labels: IB, IB, IB Switch]

The model used by existing CPU-GPU heterogeneous architectures for GPU-GPU communication: data travels via the CPU, the Infiniband (IB) Host Channel Adapter (HCA) and a switch, or another proprietary interconnect.

Data transfer between cooperating GPUs in separate nodes in a TCA cluster, enabled by the PEACH2 chip.

Schematic of the PEARL network within a CPU/GPU cluster

Implementation Example

Source: Tsukuba University
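
The contrast between the two communication models can be sketched as a hop count. In the Python sketch below, the conventional path follows the slide's description (CPU, IB HCA, switch). The TCA path is a simplified assumption in which each node's PEACH2 chip links the GPUs directly; the slides only say that PEACH2 enables the transfer and give no further detail. No latency or bandwidth figures are implied.

```python
# Conventional model described on the slide: data is staged through the host
# CPU and crosses the Infiniband fabric.
CONVENTIONAL_PATH = [
    "GPU (node A)",
    "CPU / host memory (node A)",
    "IB HCA (node A)",
    "IB switch",
    "IB HCA (node B)",
    "CPU / host memory (node B)",
    "GPU (node B)",
]

# Assumed, simplified TCA model: one PEACH2 chip per node connects the GPUs.
TCA_PATH = [
    "GPU (node A)",
    "PEACH2 (node A)",
    "PEACH2 (node B)",
    "GPU (node B)",
]


def hop_count(path):
    """Number of links traversed between the first and last element of a path."""
    return len(path) - 1


for label, path in (("conventional", CONVENTIONAL_PATH), ("TCA (assumed)", TCA_PATH)):
    print(f"{label:14s} {hop_count(path)} hops: " + " -> ".join(path))
```

Under these assumptions the conventional route takes six hops and the direct route three, which is the structural point the slide makes about bypassing the host.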

Page 16

Designing GPU/MIC Optimized Systems

Performance: PCI-e lanes arrangement, PCB placement, interconnect

Mechanical design: mounting, location, space utilization

Thermal: air flow, fan speed control, location, noise control

Power support: PSU efficiency, wattage options, power management, number of power connectors (& location)

Page 17

Summary

Coprocessor and Applications
Performance and Efficiency
Top500 & Green500
Hybrid Computing & HPC
GPU/MIC Optimized Systems
Design Considerations: Performance, Mechanical Design, Thermal & Cooling, Power Support

Page 18

Thank You!

Marc [email protected]

Page 19

Conference Puzzle

How do you put an ELEPHANT in a Refrigerator ?

Page 20

Conference Puzzle