ScaleMP - Introduction
TRANSCRIPT
![Page 1: ScaleMP - Introduction](https://reader035.vdocuments.mx/reader035/viewer/2022062513/55618aebd8b42a96778b4cab/html5/thumbnails/1.jpg)
THE HIGH-END VIRTUALIZATION COMPANY SERVER AGGREGATION – CREATING THE POWER OF ONE
Aggregate. Scale. Simplify. Save.
Short Introduction
![Page 2: ScaleMP - Introduction](https://reader035.vdocuments.mx/reader035/viewer/2022062513/55618aebd8b42a96778b4cab/html5/thumbnails/2.jpg)
Aggregate. Scale. Simplify. Save. Confidential and Proprietary
COMPANY AND PRODUCTS
13-Sep-10 2
![Page 3: ScaleMP - Introduction](https://reader035.vdocuments.mx/reader035/viewer/2022062513/55618aebd8b42a96778b4cab/html5/thumbnails/3.jpg)
What We Do?
Virtualization software for aggregating multiple off-the-shelf systems into a single virtual machine, providing improved usability and higher performance
[Figure: N servers, each running its own OS (N x OS), aggregated into 1 VM running 1 OS]
![Page 4: ScaleMP - Introduction](https://reader035.vdocuments.mx/reader035/viewer/2022062513/55618aebd8b42a96778b4cab/html5/thumbnails/4.jpg)
Proven Track Record
Key Industry Partners
Over 150 Customers Worldwide
Federal
Higher Ed/Research
Commercial
Higher Ed./Research
Government
![Page 5: ScaleMP - Introduction](https://reader035.vdocuments.mx/reader035/viewer/2022062513/55618aebd8b42a96778b4cab/html5/thumbnails/5.jpg)
Virtualization Defined…
…abstracting the physical characteristics of a computing resource…
Partitioning (“Utilization”): Providing a virtual resource that is a subset of the physical resource. Single system only.
Disk Partitioning, VLANs, Server Virtualization
Hardware, Array-based, Software, Volume Mgmt, Switch-based, Stack-based, Mainframe, Hypervisor / VMM
Aggregation (“Management, Capability”): Providing a virtual resource that is a concatenation of several physical resources
Disk Concatenation, Link Aggregation, Server Aggregation
Volume Mgmt, Hardware, Array-based, OS-based, Switch-based, SMP, MPP, Software
![Page 6: ScaleMP - Introduction](https://reader035.vdocuments.mx/reader035/viewer/2022062513/55618aebd8b42a96778b4cab/html5/thumbnails/6.jpg)
Server Virtualization
PARTITIONING: Subset of the physical resource
[Figure: several virtual machines (each App + OS) on one hypervisor or VMM]
AGGREGATION: Concatenation of physical resources
[Figure: one virtual machine (App + OS) spanning several hypervisor or VMM instances]
![Page 7: ScaleMP - Introduction](https://reader035.vdocuments.mx/reader035/viewer/2022062513/55618aebd8b42a96778b4cab/html5/thumbnails/7.jpg)
vSMP Foundation Aggregation Platform
Cluster Manageability
SMP Cost Savings & Performance
Cloud Flexibility
Cost Savings
• Up to 5X cost savings
Performance
• Leveraging latest Intel processors
• Best x86 solution by SPEC CPU2006
• 7th best shared-memory system by STREAM (memory bandwidth)
Reliability
• Fault detection and component isolation
• Redundant backplane support
Capabilities
• Up to 16,384 CPUs and 64TB RAM
Flexibility
• On-the-fly aggregated VM provisioning and tear-down
• Scaling memory or CPU
Utilization
• Resource fragmentation reduction
• Supports any programming model (serial, throughput, multi-threaded, large-memory) without machine boundaries
Integration
• Network installation provides seamless integration with any grid management system
Manageability
• Single Operating System (OS) for up to 128 nodes
• OS driven job scheduling and resource management
Performance
• InfiniBand performance with zero management and no special know-how
Storage
• Built-in cluster file system
Installation
• Unboxing to production in less than 3 hours
![Page 8: ScaleMP - Introduction](https://reader035.vdocuments.mx/reader035/viewer/2022062513/55618aebd8b42a96778b4cab/html5/thumbnails/8.jpg)
TECHNOLOGY
![Page 9: ScaleMP - Introduction](https://reader035.vdocuments.mx/reader035/viewer/2022062513/55618aebd8b42a96778b4cab/html5/thumbnails/9.jpg)
How Does It Work?
Multiple Computers with Multiple Operating Systems
Multiple Computers with a Single Operating System
![Page 10: ScaleMP - Introduction](https://reader035.vdocuments.mx/reader035/viewer/2022062513/55618aebd8b42a96778b4cab/html5/thumbnails/10.jpg)
How Does It Work?
Multiple Computers with a Single Operating System
Bare Metal, Distributed Virtual Machine Monitor
• Loaded at boot time
– Supported boot devices: USB, IDE, CompactFlash or Network Image (PXE)
• Fabric probing and VM setup
• Loading the OS and maintaining I/O and memory coherency
– Software interception engine creates a uniform execution environment
– Creates the relevant BIOS environment to present a single system to the OS (and the SW stack above it)
– Exposes all available CPU, memory and I/O resources to the OS
– I/O resources unified into a single PCI hierarchy
Network Boot
Up to 16 servers (today), 128 servers (3Q2010)
![Page 11: ScaleMP - Introduction](https://reader035.vdocuments.mx/reader035/viewer/2022062513/55618aebd8b42a96778b4cab/html5/thumbnails/11.jpg)
How Does It Work?
Aggregated System: Multiple Computers with a Single Operating System
• System configurations can differ
– Aggregating systems with different boards, I/O configurations, processor speeds and memory configurations
– Only one type of CPU will be presented to the OS
• >10 different coherency mechanisms
• Aggregated hardware I/O compatibility list includes devices from Intel, Broadcom, LSI, ATI, Emulex, Adaptec and others
[Figure: eight systems with 1x, 4x, 1x, 4x, 1x, 1x, 1x and 1x RAM aggregated into a single system with 14 x RAM]
Up to 4TB aggregated (today), 64TB aggregated (3Q2010)
![Page 12: ScaleMP - Introduction](https://reader035.vdocuments.mx/reader035/viewer/2022062513/55618aebd8b42a96778b4cab/html5/thumbnails/12.jpg)
How Does It Work?
Memory Coherency: Multiple Computers with a Single Operating System
• Decision criteria:
– Access pattern and historical knowledge on the reference (node level)
– Memory block association (system level)
– Code scanning
– Virtual address access pattern
• Transfer size:
– 4K native cache line
– 128KB transfers for pre-fetch
– Instruction size: byte size for PINNED memory
• Ownership:
– No backstore or persistent memory location. Board-local caching.
– Memory migration and replication: Shared / Exclusive / Invalid
– Pre-fetch with persistent ownership (motivated by code-scanning or I/O access pattern)
[Figure: 14 x RAM presented as coherent memory]
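The Shared / Exclusive / Invalid states named on the memory coherency slide can be illustrated with a minimal sketch. It assumes a generic invalidation-based protocol tracked per 4K block and per board. The class, method and board names are hypothetical, and this is not a description of how vSMP Foundation actually implements coherency.

```python
from enum import Enum


class State(Enum):
    INVALID = "I"
    SHARED = "S"
    EXCLUSIVE = "E"


class Block:
    """State of one 4K memory block on every board (hypothetical model)."""

    def __init__(self, boards):
        # No board holds a valid copy until the block is first touched.
        self.state = {board: State.INVALID for board in boards}

    def read(self, board):
        # Replication: an exclusive owner is downgraded so that
        # several boards can hold read-only copies.
        for other, current in self.state.items():
            if current is State.EXCLUSIVE:
                self.state[other] = State.SHARED
        self.state[board] = State.SHARED

    def write(self, board):
        # Migration: all other copies are invalidated and the
        # writing board becomes the sole owner.
        for other in self.state:
            self.state[other] = State.INVALID
        self.state[board] = State.EXCLUSIVE


block = Block(["board0", "board1", "board2"])
block.write("board0")   # board0 becomes EXCLUSIVE
block.read("board1")    # board0 and board1 are both SHARED
block.write("board2")   # board2 becomes EXCLUSIVE, the others INVALID
```

In this simplified model a read replicates the block and a write migrates it, which mirrors the "migration and replication" wording on the slide without claiming anything about the real decision criteria.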
![Page 13: ScaleMP - Introduction](https://reader035.vdocuments.mx/reader035/viewer/2022062513/55618aebd8b42a96778b4cab/html5/thumbnails/13.jpg)
How Does It Work?
Resiliency: Multiple Computers with a Single Operating System
• Hardware resiliency (by the hardware vendor):
– Redundant power
– Redundant cooling
– Memory mirroring
• vSMP Foundation architecture resiliency:
– Active-active backplane
– Fault isolation – immediate restart without failed components
– Partitioning / containers support
• Remote management (by hardware vendor + ScaleMP):
– Hardware chassis management
– vSMP Foundation native Serial-over-LAN (SOL) support
![Page 14: ScaleMP - Introduction](https://reader035.vdocuments.mx/reader035/viewer/2022062513/55618aebd8b42a96778b4cab/html5/thumbnails/14.jpg)
How Does It Work?
Optional Complete Software Stack
• vSMP Foundation Productivity Pack: Automatic full software stack installer and tuning tool
• Real-time statistics counters and system profiler, designed to give the user information about system efficiency
– Efficiency information reveals the major application scalability issues that reduce system performance
– No changes required to the application code; does not affect timing
– Used to pinpoint scalability issues as well as for software tuning; not available on other architectures
• OS-level tuning: Enabled by a single kernel compile option
• Tuned libraries: MPI (MPICH), OpenMP (Intel), MKL (Intel)
• Application execution guidelines
[Figure: timeline of an actual case: project start, +3 days, +7 days, +10 days]
![Page 15: ScaleMP - Introduction](https://reader035.vdocuments.mx/reader035/viewer/2022062513/55618aebd8b42a96778b4cab/html5/thumbnails/15.jpg)
CUSTOMER USE CASES
![Page 16: ScaleMP - Introduction](https://reader035.vdocuments.mx/reader035/viewer/2022062513/55618aebd8b42a96778b4cab/html5/thumbnails/16.jpg)
3.1 DEPLOYMENT SCENARIOS
![Page 17: ScaleMP - Introduction](https://reader035.vdocuments.mx/reader035/viewer/2022062513/55618aebd8b42a96778b4cab/html5/thumbnails/17.jpg)
SMP (1)
![Page 18: ScaleMP - Introduction](https://reader035.vdocuments.mx/reader035/viewer/2022062513/55618aebd8b42a96778b4cab/html5/thumbnails/18.jpg)
SMP (2)
vSMP Foundation DC2:
![Page 19: ScaleMP - Introduction](https://reader035.vdocuments.mx/reader035/viewer/2022062513/55618aebd8b42a96778b4cab/html5/thumbnails/19.jpg)
SMP (3)
Ethernet Switch
![Page 20: ScaleMP - Introduction](https://reader035.vdocuments.mx/reader035/viewer/2022062513/55618aebd8b42a96778b4cab/html5/thumbnails/20.jpg)
Cluster
![Page 21: ScaleMP - Introduction](https://reader035.vdocuments.mx/reader035/viewer/2022062513/55618aebd8b42a96778b4cab/html5/thumbnails/21.jpg)
Cloud
10 STEPS in 10 MINUTES
1. User submits a job requiring 80 cores and 400GB RAM
2. Resource manager (RM) finds it needs 10 systems (48GB RAM each)
3. RM identifies 10 available nodes from the environment
4. RM requests a 10-node image from vSMP Foundation for Cloud
5. RM points the 10 nodes to boot the vSMP Foundation for Cloud image (using PXE)
6. All 10 nodes PXE-boot the image and aggregate to form a single virtual system
7. The aggregated virtual machine boots the Linux OS from either a local drive or PXE (using vendor tag ScaleMP)
8. The virtual system is now operational and ready to run jobs (up to 80 cores, ~400GB)
9. RM runs the job
10. Job finished: the 10 nodes return to the pool and are available for other jobs
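The sizing decision in step 2 can be sketched as a small calculation. This is only an illustration: the function name is hypothetical, and the figure of 8 cores per node is an assumption inferred from 80 cores being spread over 10 nodes (the slide states only the 48GB RAM per node).

```python
import math


def nodes_needed(job_cores, job_ram_gb, cores_per_node, ram_gb_per_node):
    """Smallest node count that satisfies both the core and the RAM request."""
    by_cores = math.ceil(job_cores / cores_per_node)
    by_ram = math.ceil(job_ram_gb / ram_gb_per_node)
    return max(by_cores, by_ram)


# Job from step 1: 80 cores and 400GB RAM.
# 48GB per node comes from step 2; 8 cores per node is an assumption
# inferred from 80 cores spread over 10 nodes.
print(nodes_needed(80, 400, cores_per_node=8, ram_gb_per_node=48))  # 10
```

Under these assumptions the core request (10 nodes) rather than the RAM request (9 nodes) determines the size of the aggregated system.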
![Page 22: ScaleMP - Introduction](https://reader035.vdocuments.mx/reader035/viewer/2022062513/55618aebd8b42a96778b4cab/html5/thumbnails/22.jpg)
Storage
![Page 23: ScaleMP - Introduction](https://reader035.vdocuments.mx/reader035/viewer/2022062513/55618aebd8b42a96778b4cab/html5/thumbnails/23.jpg)
3.2 SIMPLICITY
![Page 24: ScaleMP - Introduction](https://reader035.vdocuments.mx/reader035/viewer/2022062513/55618aebd8b42a96778b4cab/html5/thumbnails/24.jpg)
Customer Use Cases
ENGINEERING FACULTY
LARGE SCALE DEPLOYMENT WITHOUT THE COMPLEXITY
• Customer: Auburn University, School of Engineering
• Current platform: None. Just getting into HPC
• Problems:
– Compute requirements were growing as the number of users/students grew
– No in-house skills to run an x86 InfiniBand cluster
– Limited operational budget to hire additional sys-admin resources
• Applications: Commercial applications (mostly Fluent and MATLAB)
• Solution:
– 4 full blade chassis, each aggregated as a single system with 128 cores, 384 GB RAM and 5 TB of internal storage
– Total: 64 physical nodes, 512 cores, 20TB storage, running as a cluster of 4 ‘Fat-Nodes’
• Benefits:
– Low OPEX:
  • No additional IT required for day-to-day operations
  • The need to manage only 4 ‘Fat-Nodes’
  • Internal storage is embedded in each ‘Fat-Node’
– Simplicity: InfiniBand performance without the complexity of managing such a solution
![Page 25: ScaleMP - Introduction](https://reader035.vdocuments.mx/reader035/viewer/2022062513/55618aebd8b42a96778b4cab/html5/thumbnails/25.jpg)
Customer Use Cases
ENGINEERING SERVICES COMPANY
INNOVATION WITHOUT COMPLEXITY
• Customer: Mid-size Engineering Services Company
• Current platform: Multiple 2-socket workstations
• Problems:
– Existing models (Abaqus) grow fast and can’t fit on the engineers’ workstations
– Interested in running apps in batch at night
– No in-house skills to run an x86 InfiniBand cluster (although the application runs nicely on an InfiniBand cluster). Can’t afford RISC systems
• Solution:
– 4 Intel dual-processor Xeon systems to provide 128GB RAM, 8 sockets (16 cores) single virtual system running Linux with vSMP Foundation
• Benefits:
– Performance: Solution significantly faster than existing workstations. Performance is comparable to cluster performance (using vendor benchmarks).
– Low OPEX: No IT required for day-to-day operation
– Versatility: Batch mode at night. Daytime jobs are executed on the system while using the workstation for display only. Multi-user environment with perfect scaling, and sharing without performance degradation.
– Investment protection: Expected to expand the system by adding 4 more nodes (to a total of 256GB RAM, 32 cores)
![Page 26: ScaleMP - Introduction](https://reader035.vdocuments.mx/reader035/viewer/2022062513/55618aebd8b42a96778b4cab/html5/thumbnails/26.jpg)
Customer Use Cases
FINANCIAL SERVICES
SIMPLIFYING INTER-PROCESS COMMUNICATION
• Customer: Hedge Fund
• Current platform: Multiple 4-socket servers
• Problems:
– A single 4-socket server did not provide the performance required for the customer’s business targets
– Multiple 4-socket servers required complex decomposition and introduced challenges in transferring data between processes in a short and deterministic time (low latency and small jitter)
  • An Ethernet-based solution could not provide this; an IB solution is too complex to manage and program for
– Co-location at exchanges for a solution comprised of multiple systems is complicated
• Applications: KX, WOMBAT, home-grown code
• Solution:
– 16 Intel dual-processor Xeon systems to provide 0.5TB RAM, 32 sockets (128 cores) single virtual system running Linux with vSMP Foundation
• Benefits:
– Reduced latency and latency variance
– Simpler solution: Deployment and management of a single system
– Better utilization: Single system reduces resource fragmentation
– Simpler programming model: No need for specific InfiniBand programming
![Page 27: ScaleMP - Introduction](https://reader035.vdocuments.mx/reader035/viewer/2022062513/55618aebd8b42a96778b4cab/html5/thumbnails/27.jpg)
3.3 FLEXIBILITY
![Page 28: ScaleMP - Introduction](https://reader035.vdocuments.mx/reader035/viewer/2022062513/55618aebd8b42a96778b4cab/html5/thumbnails/28.jpg)
Customer Use Cases
HOSTED HPC RESOURCE PROVIDER
COST EFFECTIVE FLEXIBLE SOLUTION WITH HIGH UTILIZATION
• Customer: Hosted HPC resource provider
• Current platform: Clusters and SMP machines
• Problems:
– Need to run MPI as well as OpenMP (shared memory) codes
– Large shared memory jobs require dedicated proprietary hardware
– Low utilization on shared memory systems
• Applications: A variety of commercial codes
• Solution:
– Original: 4 systems, total of 8 sockets (32 cores) and 128GB RAM
– Solution was extended to 16 nodes
• Benefits:
– Utilization: Rely on standard commodity hardware
– Flexibility: Using the same system for both shared memory and cluster benchmarks, resulting in high utilization
![Page 29: ScaleMP - Introduction](https://reader035.vdocuments.mx/reader035/viewer/2022062513/55618aebd8b42a96778b4cab/html5/thumbnails/29.jpg)
Customer Use Cases
SUPER COMPUTER CENTER
ELASTIC VM SOLUTION AIMED AT DATA INTENSIVE COMPUTING
• Customer: San Diego Supercomputer Center (SDSC)
• Current platform: 8-socket AMD systems
• Problems:
– Require an infrastructure for data intensive computing
– Need a large memory system (TBs in size), depending on job need
– Require the ability to quickly access large amounts of storage
• Applications: A variety of data intensive codes (Astronomy, Genomics, Data Mining, etc.)
• Solution:
– Initial Deployment: 4 ‘Super Nodes’, each with 768GB RAM, 128 cores, 10TB internal storage
– Complete Deployment (2011): 1,024 servers with vSMP Foundation for Cloud. Can be aggregated into up to 32 ‘Super Nodes’ of 32 servers each, each with 2TB RAM and 8TB of SSDs
– On-demand allocation using web request and fast (<10 minutes) provisioning
• Benefits:
– Flexibility: Provision multiple ‘Super Nodes’ of various sizes according to need
– Performance: Extremely fast hierarchical memory solution: RAM → Aggregated RAM → Aggregated SSDs
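The figures quoted for the complete SDSC deployment can be cross-checked with a short calculation. This is a sketch only: the function name is hypothetical, and the per-server shares are derived values that assume an even split and 1 TB = 1024 GB (neither is stated on the slide).

```python
def super_node_plan(total_servers, servers_per_super_node,
                    ram_tb_per_super_node, ssd_tb_per_super_node):
    """Derive the super-node count and the per-server share of RAM and SSD."""
    super_nodes = total_servers // servers_per_super_node
    # Assumption: 1 TB = 1024 GB and resources are spread evenly.
    ram_gb_per_server = ram_tb_per_super_node * 1024 / servers_per_super_node
    ssd_gb_per_server = ssd_tb_per_super_node * 1024 / servers_per_super_node
    return super_nodes, ram_gb_per_server, ssd_gb_per_server


# 1,024 servers, 32 servers per 'Super Node', 2TB RAM and 8TB SSD per 'Super Node'.
print(super_node_plan(1024, 32, ram_tb_per_super_node=2, ssd_tb_per_super_node=8))
# (32, 64.0, 256.0)
```

Under these assumptions the 1,024 servers form 32 'Super Nodes', with each server contributing 64 GB of RAM and 256 GB of SSD.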
![Page 30: ScaleMP - Introduction](https://reader035.vdocuments.mx/reader035/viewer/2022062513/55618aebd8b42a96778b4cab/html5/thumbnails/30.jpg)
3.4 CAPABILITY
![Page 31: ScaleMP - Introduction](https://reader035.vdocuments.mx/reader035/viewer/2022062513/55618aebd8b42a96778b4cab/html5/thumbnails/31.jpg)
Customer Use Cases
GLOBAL ENERGY COMPANY
SINGLE INFRASTRUCTURE FOR HORIZONTAL AND VERTICAL APPLICATION SCALING – PLUG & PLAY
• Customer: Global Energy Company
• Current platform: x86 grid
• Problems:
– Using in-house single-threaded simulation tools in throughput mode. Each simulation’s memory footprint has grown over the years and sometimes (10%) exceeds 32GB.
– Application runs on x86 only
– Used to reschedule failed runs on large-memory systems
• Solution:
– 6 Intel dual-processor Xeon systems to provide 192GB RAM, 12 sockets (48 cores) single virtual system running Linux with vSMP Foundation
• Benefits:
– Versatility: Both large and small workloads run concurrently on the same system
– Utilization: Higher utilization compared to grid due to lower infrastructure fragmentation
– Investment protection: Solution expanded by 100% since initial installation
![Page 32: ScaleMP - Introduction](https://reader035.vdocuments.mx/reader035/viewer/2022062513/55618aebd8b42a96778b4cab/html5/thumbnails/32.jpg)
Customer Use Cases
FORMULA1 TEAM
SCALEUP AT SCALEOUT PRICING
• Customer: Formula1 team
• Current platform: Large-memory Itanium-based system
• Problems:
– Need to generate a large mesh as part of pre-processing of whole-car simulation (FLUENT TGrid)
– Mesh requirements are ~200GB in size
– Expected to grow significantly within 12 months after initial deployment
– Would like to standardize on x86 architecture due to lower costs and open standards
• Solution:
– 12 Intel dual-processor Xeon systems to provide 384GB RAM single virtual system running Linux with vSMP Foundation
• Benefits:
– Better performance: Solution evaluated and found to be faster than alternative systems (x86 and non-x86)
– Cost: Significant savings compared to alternative system
– Versatility: Also being used to run FLUENT (MPI) as part of a large cluster
– Investment protection: Solution can grow
![Page 33: ScaleMP - Introduction](https://reader035.vdocuments.mx/reader035/viewer/2022062513/55618aebd8b42a96778b4cab/html5/thumbnails/33.jpg)
Customer Use Cases
WEATHER FORECASTING SERVICE PROVIDER
SIMPLE AND FLEXIBLE COST EFFECTIVE SOLUTION
• Customer: Weather forecasting service provider
• Current platform: SGI Altix with 32 cores
• Problems:
– Need to run MPI as well as OpenMP codes
– System needs to be deployed remotely, and hence needs to be simple to manage
– Data processing flow is complex and requires transferring large amounts of data between steps
• Applications: MM5, WRF, MAWSIP, home-grown code for data transformation
• Solution:
– 4 Intel Nehalem dual-socket blades, total of 8 sockets (32 cores) and 192GB RAM
– Internal storage
– Solution was extended to 8 blades, total of 16 sockets (64 cores) and 384GB RAM
• Benefits:
– Performance: 2.5X better performance on the same number of cores (32)
– Simpler solution: Significantly reduced capital expense, which allowed the customer to have a higher number of cores
– Simplicity: Simple to manage by domain experts (weather forecast scientists)
– Dataflow remains within the system, leveraging internal storage
![Page 34: ScaleMP - Introduction](https://reader035.vdocuments.mx/reader035/viewer/2022062513/55618aebd8b42a96778b4cab/html5/thumbnails/34.jpg)
Customer Use Cases
SEMICONDUCTOR MANUFACTURER
FLEXIBLE SYSTEM FOR THROUGHPUT AND LARGE MEMORY JOBS
• Customer: Large European semiconductor manufacturer
• Current platform: Proprietary large shared memory system + compute grid
• Problems:
– Dedicated and proprietary systems were expensive
– Utilization and efficiency of the existing large memory system were low for throughput jobs
– The mix of throughput and large memory jobs required maintaining 2 separate environments
• Solution:
– 8 Intel dual-socket (quad-core) Xeon systems to provide 300GB RAM single virtual system running Linux with vSMP Foundation
• Benefits:
– The ability to run large memory jobs when required
– Flexibility: Switch between large memory and throughput jobs efficiently on the fly
– Consistency: Underlying hardware is aligned with standard hardware used for the rest of the compute grid
– Better utilization: Having a single system reduces resource fragmentation
– Performance: Leverage most recent Intel CPUs for large-memory jobs
![Page 35: ScaleMP - Introduction](https://reader035.vdocuments.mx/reader035/viewer/2022062513/55618aebd8b42a96778b4cab/html5/thumbnails/35.jpg)
Customer Use Cases
MEDICAL RESEARCH INSTITUTE
LARGE MEMORY FOR MULTI-THREADED PROGRAMMING
• Customer: Medical Research Institute
• Current platform: HP Superdome System
• Problems:
– Need to perform high performance image processing on very large MRI scans
– Scanned data for a single run is currently over 200GB. Memory requirements are expected to grow significantly with the introduction of full body scans with more sensors
– Would like to use commercial tools for faster development
– Would like to standardize on x86 architecture due to lower costs and open standards
• Applications: Siemens CT processing, MATLAB, BLAS, home-grown code, …
• Solution:
– 16 Intel dual-processor Xeon systems to provide 1TB RAM, 32 sockets (128 cores) single virtual system running Linux with vSMP Foundation
• Benefits:
– Better performance: Solution evaluated and found to be faster than any other alternative system
– Cost: Significant savings compared to alternative system (order of magnitude)
– Versatility: Also being used for MPI jobs as part of a large cluster
![Page 36: ScaleMP - Introduction](https://reader035.vdocuments.mx/reader035/viewer/2022062513/55618aebd8b42a96778b4cab/html5/thumbnails/36.jpg)
Customer Use Cases
ACADEMIC RESEARCH
SHARED MEMORY MULTI-THREADED PROGRAMMING
• Customer: RWTH Aachen - Polytechnic University
• Current platform: 4-socket x86 server
• Problems:
– Need to run auto-parallelized code (using OpenMP) with high core count
– Other solutions found not to work:
  • Cannot afford proprietary SMP
  • Cluster-OpenMP proved not to scale
– Has to use OpenMP for faster development (machine-generated Fortran code)
• Applications: Home-grown codes (SHEMAT Suite, FIRE, …)
• Solution:
– 13 Intel dual-processor Xeon systems to provide 26 sockets (104 cores)
– Single virtual system running Linux with vSMP Foundation
• Benefits:
– Performance: Solution scales at over 95% efficiency to 104 cores with OpenMP codes
– Cost: Significant savings compared to alternative solutions
– System scale: Largest x86 system available
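The scaling claim for the RWTH Aachen system can be put into numbers with the standard definition of parallel efficiency (speedup divided by core count). The sketch below is illustrative only: the function name is hypothetical, 0.95 is used as the lower bound implied by "over 95%", and the cores-per-socket value is derived rather than stated.

```python
def speedup(efficiency, cores):
    """Parallel efficiency is speedup / cores, so speedup = efficiency * cores."""
    return efficiency * cores


systems = 13
sockets = systems * 2                  # dual-processor systems
cores = 104                            # from the slide
cores_per_socket = cores // sockets    # derived, not stated on the slide

print(sockets, cores_per_socket)       # 26 4
print(round(speedup(0.95, cores), 1))  # 98.8
```

At 95% efficiency on 104 cores the OpenMP codes would run about 98.8 times faster than on a single core, so "over 95%" implies a speedup above that figure.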
![Page 37: ScaleMP - Introduction](https://reader035.vdocuments.mx/reader035/viewer/2022062513/55618aebd8b42a96778b4cab/html5/thumbnails/37.jpg)
THE HIGH-END VIRTUALIZATION COMPANY SERVER AGGREGATION – CREATING THE POWER OF ONE
Aggregate. Scale. Simplify. Save.
Shai Fultheim Founder and President [email protected], +1 (408) 480 1612