gpu systems
DESCRIPTION
Our unique 1U GPU servers allow you to use the latest GPUs (Tesla, GTX 285, Quadro FX 5800) for visualization or offloading processing in a small form factor. These are built on Intel's latest Nehalem processors.
TRANSCRIPT
GPU Systems
Advanced Clustering’s offerings for GPGPU computing
advanced clustering technologies
www.advancedclustering.com • 866.802.8222
what is GPU computing?
• The use of a GPU (graphics processing unit) to do general purpose scientific and engineering computing
• The model is to use a CPU and GPU together in a heterogeneous computing model
• The CPU runs the sequential portions of the application
• Parallel computation is offloaded onto the GPU
2
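The heterogeneous model above can be sketched in CUDA C. This is a minimal, illustrative example (the kernel name `scale` and the data are our own): the CPU runs the sequential setup, then offloads the parallel work to the GPU.

```cuda
#include <stdio.h>
#include <stdlib.h>
#include <cuda_runtime.h>

// Parallel portion: each GPU thread scales one array element.
__global__ void scale(float *data, float factor, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        data[i] *= factor;
}

int main(void)
{
    const int n = 1 << 20;
    size_t bytes = n * sizeof(float);

    // Sequential portion runs on the CPU: prepare the input data.
    float *h_data = (float *)malloc(bytes);
    for (int i = 0; i < n; i++)
        h_data[i] = 1.0f;

    // Offload the parallel computation onto the GPU.
    float *d_data;
    cudaMalloc(&d_data, bytes);
    cudaMemcpy(d_data, h_data, bytes, cudaMemcpyHostToDevice);
    scale<<<(n + 255) / 256, 256>>>(d_data, 2.0f, n);
    cudaMemcpy(h_data, d_data, bytes, cudaMemcpyDeviceToHost);

    // 2.0 when a CUDA device is present
    printf("h_data[0] = %f\n", h_data[0]);

    cudaFree(d_data);
    free(h_data);
    return 0;
}
```

Compile with `nvcc`; a CUDA-capable GPU is required at run time.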
history of GPUs
• GPUs were originally designed with fixed-function pipelines for real-time 3D graphics
• As the complexity of GPUs increased, they were made more programmable so new features could be implemented easily
• Scientists and engineers discovered that these purpose-built GPUs could also be re-programmed for General Purpose computing on a GPU (GPGPU)
3
history of GPUs - continued
• The nature of 3D graphics means GPUs have very fast floating-point units, which are also great for scientific codes
• GPUs were originally very difficult to program this way; vendors have since recognized another market for their products and developed specially designed GPUs and programming environments for scientific computing
• The most prominent is NVIDIA’s Tesla GPU and its CUDA programming environment
4
GPUs vs. CPUs
5
Quad-core CPU
240-core Tesla GPU
•Traditional x86 CPUs are available today with 4 cores; 6, 8, and 12 cores in the future
•NVIDIA’s Tesla GPU is shipping with 240 cores
GPUs vs. CPUs - continued
6
why use GPUs?
• Massively parallel design: 240 cores per GPU
• Nearly 1 teraflop of single precision floating-point performance
• Designed as an accelerator card to add into your existing system - does not replace your current CPU
• Maximum of 4GB of fast dedicated RAM per GPU
• If your code is highly parallel it’s worth investigating
7
why not use GPUs?
• Fixed RAM sizes on GPU - not upgradable or configurable
• Large power requirements of 188W
• Still requires a host server and CPU to operate
• Specialized development tools required, does not run standard x86 code
• Current development tools are specific to NVIDIA cards - no support for other manufacturers’ GPUs
• Your code may be difficult to parallelize
8
developing for GPUs
• Current development model: CUDA parallel environment
• The CUDA parallel programming model guides programmers to partition the problem into coarse sub-problems that can be solved independently in parallel.
• Fine grain parallelism in the sub-problems is then expressed such that each sub-problem can be solved cooperatively in parallel.
• Currently an extension to the C programming language - other languages are in development
9
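The partitioning the CUDA model encourages can be sketched as a kernel. In this minimal, illustrative example (the kernel name `block_sum` is our own), each thread block independently handles one coarse sub-problem - a chunk of the input array - while the threads within the block cooperate through shared memory to solve it in parallel.

```cuda
#include <cuda_runtime.h>

#define BLOCK 256

// Coarse level: each block independently sums one chunk of the input.
// Fine level: the threads of a block cooperate via shared memory.
__global__ void block_sum(const float *in, float *partial, int n)
{
    __shared__ float cache[BLOCK];
    int i = blockIdx.x * blockDim.x + threadIdx.x;

    cache[threadIdx.x] = (i < n) ? in[i] : 0.0f;
    __syncthreads();

    // Tree reduction within the block.
    for (int stride = blockDim.x / 2; stride > 0; stride /= 2) {
        if (threadIdx.x < stride)
            cache[threadIdx.x] += cache[threadIdx.x + stride];
        __syncthreads();
    }

    if (threadIdx.x == 0)
        partial[blockIdx.x] = cache[0];  // one result per sub-problem
}
```

The host then combines the per-block partial sums itself, or launches a second kernel over them - the coarse sub-problems never need to communicate during the first pass.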
NVIDIA GPUs
• All of NVIDIA’s recent GPUs support CUDA development
• Tesla cards designed exclusively for CUDA and GPGPU code (no graphics support)
• GeForce cards designed for graphics can be used for CUDA code as well
• Usually slower, with fewer cores or less RAM - but a great way to get started at low price points
• Development and testing can be done on almost any standard GeForce GPU and run on a Tesla system
10
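Before developing on a GeForce card, it is worth confirming what the system reports. This small sketch enumerates the CUDA-capable devices; the same code path works whether a GeForce or a Tesla card is installed.

```cuda
#include <stdio.h>
#include <cuda_runtime.h>

// List each CUDA-capable device with its name, memory, and
// compute capability.
int main(void)
{
    int count = 0;
    cudaGetDeviceCount(&count);
    for (int d = 0; d < count; d++) {
        struct cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, d);
        printf("Device %d: %s, %d MB RAM, compute capability %d.%d\n",
               d, prop.name, (int)(prop.totalGlobalMem >> 20),
               prop.major, prop.minor);
    }
    return 0;
}
```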
GeForce vs. Tesla
11
GPU future
• More products coming: AMD Stream processor line of products, similar to NVIDIA’s Tesla
• Standard, portable programming via OpenCL
• OpenCL (Open Computing Language) is the first open, royalty-free standard for general-purpose parallel programming. It lets developers create portable code for a diverse mix of multi-core CPUs, GPUs, Cell-type architectures, and other parallel processors such as DSPs.
• More info: http://www.khronos.org/opencl/
12
building GPU systems
• Building systems to house GPUs can be difficult:
• Requires lots of engineering and design work to power and cool them correctly
• GPUs were originally designed for visualization and gaming; size and form-factor were not as important
• When used for computation, data-center space is limited and expensive - need to find a way to implement GPUs in existing infrastructure
13
traditional GPU servers
14
•Large tower style cases
•Rackmount servers 4U or larger
•Either choice is not an efficient use of limited data center space
GPUs are large
15
1.5” deep
10.5” long
4.6” tall
The size of the GPU has limited its application
GPUs are power hungry
16
•GPU cards can use a lot of power - as much as 270W
•Lots of power equals lots of heat
•Difficult to put into a small space and cool effectively
GPU system options
17
Advanced Clustering has two solutions to the power, heat, and density problems:
NVIDIA’s Tesla S1070
Advanced Clustering’s 15XGPU nodes
NVIDIA’s tesla S1070
• The S1070 is an external 1U box that contains 4x Tesla C1060 GPUs
• The S1070 must be connected to one or two host servers to operate
• S1070 has one power supply and dedicated cooling for the 4x GPUs
• Only available with the C1060 GPU cards pre-installed
18
tesla S1070 - front view
19
tesla S1070 - rear view
20
tesla S1070 - inside view
21
host interface cards (HIC)
22
• The Host Interface Card (HIC) connects Tesla S1070 to Server
• Every S1070 requires 2 HICs
• Each HIC bridges the server to two of the four GPUs inside of the S1070
• HICs can be installed in 2 separate servers, or 1 server
• HICs are available in PCI-e 8x and 16x widths
tesla S1070 block diagram
23
Cables to HICs in Host System(s)
Tesla S1070
connecting S1070 to 2 servers
24
Tesla S1070
Server#1
Server#2
Most servers do not have enough PCI-e bandwidth, so the S1070 is designed to allow connecting to 2 separate machines.
connecting S1070 to 1 server
25
Tesla S1070
Server
If the server has enough PCI-e lanes and expansion slots, one Tesla S1070 can be connected to one server.
example cluster of S1070s
26
HIC #1
HIC #2
HIC #1
HIC #2
HIC #1
HIC #2
HIC #1
HIC #2
HIC #1
HIC #2
• 10x 1U compute nodes with 2x CPUs each
• 5 Tesla S1070 with 4x GPUs each
• Balanced system of 20 CPUs and 20 GPUs
• All in 15U of rack space
S1070s pros and cons
•Pros
• External enclosure holds the GPUs, so no special server design is required
• Easy to add GPUs to any existing system
• 4 GPUs in only 1U of space
• Multiple HIC card configurations including PCI-e 8x or 16x
• Thermally tested and validated by NVIDIA
•Cons
• Two GPUs share one PCI-e slot in the host server limiting bandwidth to the GPU card
• Most 1U servers have only 1x PCI-e expansion slot, which is occupied by the HIC - this limits the ability to use interconnects like InfiniBand or 10 Gigabit Ethernet
• Limited configuration options, only Tesla cards, no GeForce or Quadro options
27
S1070 - specifications
28
advanced clustering GPU nodes
• The 15XGPU line of systems is a complete two processor server and GPU in 1U
• Server fully configured with latest quad-core Intel Xeon processors, RAM, hard drives, optical, networking, InfiniBand and GPU card
• Flexible to support various GPUs, including:
• Tesla C1060 card
• GeForce series
• Quadro series
29
GPU node - front
30
GPU node - rear
31
GPU node - inside
32
GPU node - block diagram
33
Advanced Clustering 15XGPU node
Simplified design: the host server is completely integrated with the GPU, with no external components to connect to.
example cluster of GPU nodes
34
• 15x 1U compute nodes
• 2x CPUs each
• 1x GPU integrated in each node
• Entire system contains 30x CPUs and 15x GPUs
• All in 15U of rack space
GPU nodes - thermals
35
•System carefully engineered to ensure all components will fit in the small form factor
•Detailed modeling and testing to make sure the system components (CPU and memory) and the GPU are adequately cooled
GPU nodes pros and cons
•Pros
• Entire server and GPU all enclosed in a 1U package
• Flexibility in GPU choice: Tesla, GeForce, and Quadro supported
• Full PCI-e bandwidth to GPU
• Full-featured server with the latest quad-core Intel Xeon CPUs
• Can be used for more than computation, use the GPU for video output as well
•Cons
• Only 1x GPU per server
• Requires purchase of new servers, not an upgrade or add-on
• Not as dense of a solution as S1070 for 4x GPUs
36
GPU nodes
• The GPU node concept is unique to Advanced Clustering
• Only vendor shipping a 1U with integrated Tesla or high-end GeForce / Quadro card
• Available for order as the 15XGPU2
• Dual Quad-Core Intel Xeon 5500 series processors
• Choice of GPU
37
• Processor
• Two Intel Xeon 5500 Series processors
• Next generation "Nehalem" microarchitecture
• Integrated memory controller and 2x QPI chipset interconnects per processor
• 45nm process technology
• Chipset
• Intel 5500 I/O controller hub
• Memory
• 800MHz, 1066MHz, or 1333MHz DDR3 memory
• Twelve DIMM sockets supporting up to 144GB of memory
• GPU
• PCI-e 2.0 16x double height expansion slot for GPU
• Multiple options: Tesla, GeForce, or Quadro cards
• Storage
• Two 3.5" SATA2 drive bays
• Supports RAID levels 0-1 with Linux software RAID (with 2.5" drives)
• DVD+RW slim-line optical drive
• Management
• Integrated IPMI 2.0 module
• Integrated management controller providing iKVM and remote disk emulation
• Dedicated RJ45 LAN for management network
• I/O connections
• Two independent 10/100/1000Base-T (Gigabit) RJ-45 Ethernet interfaces
• Two USB 2.0 ports
• One DB-9 serial port (RS-232)
• One VGA port
• Optional ConnectX DDR or QDR InfiniBand connector
• Electrical Requirements
• High-efficiency power supply (greater than 80%)
• Output Power: 560W
• Universal input voltage 100V to 240V
• Frequency: 50Hz to 60Hz, single phase
15XGPU2 - specifications
38
availability
• Both the Tesla S1070 and 15XGPU GPU nodes are available and shipping now
• For price and custom configuration contact your Account Representative
• (866) 802-8222
• http://www.advancedclustering.com/go/gpu
39