gpu systems
DESCRIPTION
Our unique 1U GPU servers allow you to use the latest GPUs (Tesla, GTX 285, Quadro FX 5800) for visualization or offloading processing in a small form factor. These are built on Intel's latest Nehalem processors.
TRANSCRIPT
GPU Systems
Advanced Clustering’s offerings for GPGPU computing
advanced clustering technologies
www.advancedclustering.com • 866.802.8222
what is GPU computing?
• The use of a GPU (graphics processing unit) to do general purpose scientific and engineering computing
• The model is to use a CPU and GPU together in a heterogeneous computing model
• The CPU runs the sequential portions of the application
• Parallel computation is offloaded onto the GPU
2
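The heterogeneous model above can be sketched in CUDA C. This is a minimal, illustrative example (the kernel name `scale` and the data are our own): the CPU runs the sequential setup, then offloads the parallel work to the GPU.

```cuda
#include <stdio.h>
#include <stdlib.h>
#include <cuda_runtime.h>

// Parallel portion: each GPU thread scales one array element.
__global__ void scale(float *data, float factor, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        data[i] *= factor;
}

int main(void)
{
    const int n = 1 << 20;
    size_t bytes = n * sizeof(float);

    // Sequential portion runs on the CPU: prepare the input data.
    float *h_data = (float *)malloc(bytes);
    for (int i = 0; i < n; i++)
        h_data[i] = 1.0f;

    // Offload the parallel computation onto the GPU.
    float *d_data;
    cudaMalloc(&d_data, bytes);
    cudaMemcpy(d_data, h_data, bytes, cudaMemcpyHostToDevice);
    scale<<<(n + 255) / 256, 256>>>(d_data, 2.0f, n);
    cudaMemcpy(h_data, d_data, bytes, cudaMemcpyDeviceToHost);

    // 2.0 when a CUDA device is present
    printf("h_data[0] = %f\n", h_data[0]);

    cudaFree(d_data);
    free(h_data);
    return 0;
}
```

Compile with `nvcc`; a CUDA-capable GPU is required at run time.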
history of GPUs
• GPUs were originally designed with fixed-function pipelines for real-time 3D graphics
• As the complexity of GPUs increased, they were made more programmable so new features could be implemented easily
• Scientists and engineers discovered that these purpose-built GPUs could also be re-programmed for General Purpose computing on a GPU (GPGPU)
3
history of GPUs - continued
• The nature of 3D graphics means GPUs have very fast floating-point units, which are also great for scientific codes
• GPUs were originally very difficult to program this way; vendors have since recognized another market for their products and developed specially designed GPUs and programming environments for scientific computing
• The most prominent is NVIDIA’s Tesla GPU and its CUDA programming environment
4
GPUs vs. CPUs
5
Quad-core CPU
240-core Tesla GPU
•Traditional x86 CPUs are available today with 4 cores; 6, 8, and 12 cores in the future
•NVIDIA’s Tesla GPU is shipping with 240 cores
GPUs vs. CPUs - continued
6
why use GPUs?
• Massively parallel design: 240 cores per GPU
• Nearly 1 teraflop of single precision floating-point performance
• Designed as an accelerator card to add into your existing system - does not replace your current CPU
• Maximum of 4GB of fast dedicated RAM per GPU
• If your code is highly parallel it’s worth investigating
7
why not use GPUs?
• Fixed RAM sizes on GPU - not upgradable or configurable
• Large power requirements of 188W
• Still requires a host server and CPU to operate
• Specialized development tools required, does not run standard x86 code
• Current development tools are specific to NVIDIA cards - no support for other manufacturers’ GPUs
• Your code may be difficult to parallelize
8
developing for GPUs
• Current development model: CUDA parallel environment
• The CUDA parallel programming model guides programmers to partition the problem into coarse sub-problems that can be solved independently in parallel.
• Fine grain parallelism in the sub-problems is then expressed such that each sub-problem can be solved cooperatively in parallel.
• Currently an extension to the C programming language - other languages are in development
9
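The partitioning the CUDA model encourages can be sketched as a kernel. In this minimal, illustrative example (the kernel name `block_sum` is our own), each thread block independently handles one coarse sub-problem - a chunk of the input array - while the threads within the block cooperate through shared memory to solve it in parallel.

```cuda
#include <cuda_runtime.h>

#define BLOCK 256

// Coarse level: each block independently sums one chunk of the input.
// Fine level: the threads of a block cooperate via shared memory.
__global__ void block_sum(const float *in, float *partial, int n)
{
    __shared__ float cache[BLOCK];
    int i = blockIdx.x * blockDim.x + threadIdx.x;

    cache[threadIdx.x] = (i < n) ? in[i] : 0.0f;
    __syncthreads();

    // Tree reduction within the block.
    for (int stride = blockDim.x / 2; stride > 0; stride /= 2) {
        if (threadIdx.x < stride)
            cache[threadIdx.x] += cache[threadIdx.x + stride];
        __syncthreads();
    }

    if (threadIdx.x == 0)
        partial[blockIdx.x] = cache[0];  // one result per sub-problem
}
```

The host then combines the per-block partial sums itself, or launches a second kernel over them - the coarse sub-problems never need to communicate during the first pass.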
NVIDIA GPUs
• All of NVIDIA’s recent GPUs support CUDA development
• Tesla cards designed exclusively for CUDA and GPGPU code (no graphics support)
• GeForce cards designed for graphics can be used for CUDA code as well
• Usually slower, with fewer cores or less RAM - but a great way to get started at low price points
• Development and testing can be done on almost any standard GeForce GPU and run on a Tesla system
10
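Before developing on a GeForce card, it is worth confirming what the system reports. This small sketch enumerates the CUDA-capable devices; the same code path works whether a GeForce or a Tesla card is installed.

```cuda
#include <stdio.h>
#include <cuda_runtime.h>

// List each CUDA-capable device with its name, memory, and
// compute capability.
int main(void)
{
    int count = 0;
    cudaGetDeviceCount(&count);
    for (int d = 0; d < count; d++) {
        struct cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, d);
        printf("Device %d: %s, %d MB RAM, compute capability %d.%d\n",
               d, prop.name, (int)(prop.totalGlobalMem >> 20),
               prop.major, prop.minor);
    }
    return 0;
}
```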
GeForce vs. Tesla
11
GPU future
• More products coming: AMD Stream processor line of products, similar to NVIDIA’s Tesla
• Standard, portable programming via OpenCL
• OpenCL (Open Computing Language) is the first open, royalty-free standard for general-purpose parallel programming. It lets developers create portable code for a diverse mix of multi-core CPUs, GPUs, Cell-type architectures, and other parallel processors such as DSPs.
• More info: http://www.khronos.org/opencl/
12
building GPU systems
• Building systems to house GPUs can be difficult:
• Requires lots of engineering and design work to power and cool them correctly
• GPUs were originally designed for visualization and gaming; size and form-factor were not as important
• When used for computation, data-center space is limited and expensive - need to find a way to implement GPUs in existing infrastructure
13
traditional GPU servers
14
•Large tower style cases
•Rackmount servers 4U or larger
•Either choice is not an efficient use of limited data center space
GPUs are large
15
1.5” deep
10.5” long
4.6” tall
The size of the GPU has limited its application
GPUs are power hungry
16
•GPU cards can use a lot of power - as much as 270W
•Lots of power equals lots of heat
•Difficult to put into a small space and cool effectively
GPU system options
17
Advanced Clustering has two solutions to the power, heat, and density problems:
NVIDIA’s Tesla S1070
Advanced Clustering’s 15XGPU nodes
NVIDIA’s tesla S1070
• The S1070 is an external 1U box that contains 4x Tesla C1060 GPUs
• The S1070 must be connected to one or two host servers to operate
• S1070 has one power supply and dedicated cooling for the 4x GPUs
• Only available with the C1060 GPU cards pre-installed
18
tesla S1070 - front view
19
tesla S1070 - rear view
20
tesla S1070 - inside view
21
host interface cards (HIC)
22
• The Host Interface Card (HIC) connects Tesla S1070 to Server
• Every S1070 requires 2 HICs
• Each HIC bridges the server to two of the four GPUs inside of the S1070
• HICs can be installed in 2 separate servers, or 1 server
• HICs are available in PCI-e 8x and 16x widths
tesla S1070 block diagram
23
Cables to HICs in Host System(s)
Tesla S1070
connecting S1070 to 2 servers
24
Tesla S1070
Server#1
Server#2
Most servers do not have enough PCI-e bandwidth, so the S1070 is designed to allow connecting to 2 separate machines.
connecting S1070 to 1 server
25
Tesla S1070
Server
If the server has enough PCI-e lanes and expansion slots, one Tesla S1070 can be connected to one server.
example cluster of S1070s
26
HIC #1
HIC #2
HIC #1
HIC #2
HIC #1
HIC #2
HIC #1
HIC #2
HIC #1
HIC #2
• 10x 1U compute nodes with 2x CPUs each
• 5 Tesla S1070 with 4x GPUs each
• Balanced system of 20 CPUs and 20 GPUs
• All in 15U of rack space
S1070s pros and cons
•Pros
• External enclosure holds the GPUs, so no special server design is required
• Easy to add GPUs to any existing system
• 4 GPUs in only 1U of space
• Multiple HIC card configurations including PCI-e 8x or 16x
• Thermally tested and validated by NVIDIA
•Cons
• Two GPUs share one PCI-e slot in the host server limiting bandwidth to the GPU card
• Most 1U servers have only 1x PCI-e expansion slot, which is occupied by the HIC - this limits the ability to use interconnects like InfiniBand or 10 Gigabit Ethernet
• Limited configuration options, only Tesla cards, no GeForce or Quadro options
27
S1070 - specifications
28
advanced clustering GPU nodes
• The 15XGPU line of systems is a complete two processor server and GPU in 1U
• Server fully configured with latest quad-core Intel Xeon processors, RAM, hard drives, optical, networking, InfiniBand and GPU card
• Flexible to support various GPUs, including:
• Tesla C1060 card
• GeForce series
• Quadro series
29
GPU node - front
30
GPU node - rear
31
GPU node - inside
32
GPU node - block diagram
33
Advanced Clustering 15XGPU node
Simplified design: the host server is completely integrated with the GPU, with no external components to connect to.
example cluster of GPU nodes
34
• 15x 1U compute nodes
• 2x CPUs each
• 1x GPU integrated in each node
• Entire system contains 30x CPUs and 15x GPUs
• All in 15U of rack space
GPU nodes - thermals
35
•System carefully engineered to ensure all components will fit in the small form factor
•Detailed modeling and testing to make sure the system components (CPU and memory) and the GPU are adequately cooled
GPU nodes pros and cons
•Pros
• Entire server and GPU all enclosed in a 1U package
• Flexibility in GPU choice: Tesla, GeForce, and Quadro supported
• Full PCI-e bandwidth to GPU
• Full-featured server with the latest quad-core Intel Xeon CPUs
• Can be used for more than computation, use the GPU for video output as well
•Cons
• Only 1x GPU per server
• Requires purchase of new servers, not an upgrade or add-on
• Not as dense of a solution as S1070 for 4x GPUs
36
GPU nodes
• The GPU node concept is unique to Advanced Clustering
• Only vendor shipping a 1U with integrated Tesla or high-end GeForce / Quadro card
• Available for order as the 15XGPU2
• Dual Quad-Core Intel Xeon 5500 series processors
• Choice of GPU
37
• Processor
• Two Intel Xeon 5500 Series processors
• Next generation "Nehalem" microarchitecture
• Integrated memory controller and 2x QPI chipset interconnects per processor
• 45nm process technology
• Chipset
• Intel 5500 I/O controller hub
• Memory
• 800MHz, 1066MHz, or 1333MHz DDR3 memory
• Twelve DIMM sockets supporting up to 144GB of memory
• GPU
• PCI-e 2.0 16x double height expansion slot for GPU
• Multiple options: Tesla, GeForce, or Quadro cards
• Storage
• Two 3.5" SATA2 drive bays
• Supports RAID levels 0-1 with Linux software RAID (with 2.5" drives)
• DVD+RW slim-line optical drive
• Management
• Integrated IPMI 2.0 module
• Integrated management controller providing iKVM and remote disk emulation
• Dedicated RJ45 LAN for management network
• I/O connections
• Two independent 10/100/1000Base-T (Gigabit) RJ-45 Ethernet interfaces
• Two USB 2.0 ports
• One DB-9 serial port (RS-232)
• One VGA port
• Optional ConnectX DDR or QDR InfiniBand connector
• Electrical Requirements
• High-efficiency power supply (greater than 80%)
• Output Power: 560W
• Universal input voltage 100V to 240V
• Frequency: 50Hz to 60Hz, single phase
15XGPU2 - specifications
38
availability
• Both the Tesla S1070 and 15XGPU GPU nodes are available and shipping now
• For price and custom configuration contact your Account Representative
• (866) 802-8222
• http://www.advancedclustering.com/go/gpu
39